Fuel/Custom Datasets
From charlesreid1
Revision as of 02:53, 15 October 2017
Wrapping Custom Datasets with Fuel
Procedure:
- Define a fuel dataset (essentially a stub class, typically a subclass of H5PYDataset that points at the converted HDF5 file)
- Define a fuel downloader (a way of obtaining the raw data; if the data is already available locally, this step can be trivial)
- Define a fuel converter (code that iterates through the raw data and writes it to an HDF5 file)
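A minimal sketch of the converter step, assuming the raw data is already in two NumPy arrays, `features` and `targets`, ordered so that each split occupies a contiguous block of rows. The names `build_split_dict` and `convert_to_hdf5` are hypothetical; the HDF5 layout (one dataset per source, a `batch` label on the first axis, and a `split` attribute built with `H5PYDataset.create_split_array`) follows the pattern Fuel's documentation uses for new datasets.

```python
def build_split_dict(sources, split_sizes):
    """Map each split name to (start, stop) row ranges for every source.

    `split_sizes` is an ordered sequence of (split_name, n_examples) pairs;
    splits are laid out back to back in that order.
    """
    split_dict = {}
    start = 0
    for split_name, size in split_sizes:
        stop = start + size
        split_dict[split_name] = {source: (start, stop) for source in sources}
        start = stop
    return split_dict


def convert_to_hdf5(features, targets, split_sizes, output_path):
    """Hypothetical converter: write features/targets into one HDF5 file."""
    # Imported lazily so build_split_dict works without h5py/Fuel installed.
    import h5py
    from fuel.datasets.hdf5 import H5PYDataset

    n_examples = sum(size for _, size in split_sizes)
    if len(features) != n_examples or len(targets) != n_examples:
        raise ValueError("split sizes must cover every example exactly once")

    with h5py.File(output_path, mode="w") as f:
        f.create_dataset("features", data=features)
        f.create_dataset("targets", data=targets)
        # Fuel reads dimension labels as the axis labels of each source.
        f["features"].dims[0].label = "batch"
        f["targets"].dims[0].label = "batch"
        split_dict = build_split_dict(("features", "targets"), split_sizes)
        f.attrs["split"] = H5PYDataset.create_split_array(split_dict)
    # Returned as a tuple of written paths, like the built-in converters seem to.
    return (output_path,)
```

For example, `build_split_dict(("features", "targets"), [("train", 90), ("test", 10)])` assigns rows 0 to 90 to `train` and rows 90 to 100 to `test` for both sources. The resulting file can then be opened with `H5PYDataset("my_dataset.hdf5", which_sets=("train",))`, where `my_dataset.hdf5` is a placeholder name; the stub dataset class mostly just fixes that filename.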
Once those are defined, bringing your dataset into Fuel takes two commands:
- Run fuel-download <name-of-dataset> to download the data
- Run fuel-convert <name-of-dataset> to convert the data into Fuel format
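For `fuel-download <name-of-dataset>` to know about a dataset, it needs a downloader module. Fuel's built-in downloaders appear to each expose a `fill_subparser(subparser)` function that adds command-line options to an `argparse` subparser and returns the function to run; converters seem to follow the same pattern for `fuel-convert`. The sketch below follows that convention for data that is already on disk. `copy_local_files` and `simulate_download` are hypothetical, and `simulate_download` only mimics the dispatch (assuming the chosen function receives the target `directory` and a `clear` flag); it is not Fuel's own CLI code.

```python
import argparse
import os
import shutil


def copy_local_files(directory, source_paths, clear=False):
    """Hypothetical downloader for data that is already on disk.

    Copies each file in `source_paths` into `directory`; with clear=True it
    deletes previously copied files instead of copying.
    """
    targets = [os.path.join(directory, os.path.basename(p)) for p in source_paths]
    for source, target in zip(source_paths, targets):
        if clear:
            if os.path.exists(target):
                os.remove(target)
        else:
            shutil.copy(source, target)
    return [] if clear else targets


def fill_subparser(subparser):
    """Add this dataset's options and return the function to run."""
    subparser.add_argument("--source-paths", nargs="+", required=True,
                           help="local files to copy into the download directory")
    return copy_local_files


def simulate_download(argv):
    """Stand-in for the fuel-download dispatch; NOT Fuel's actual CLI code."""
    parser = argparse.ArgumentParser(description="sketch of fuel-download")
    parser.add_argument("--directory", default=os.getcwd())
    parser.add_argument("--clear", action="store_true")
    subparsers = parser.add_subparsers(dest="dataset", required=True)
    # Each registered dataset gets its own subcommand.
    functions = {"my_dataset": fill_subparser(subparsers.add_parser("my_dataset"))}
    args = vars(parser.parse_args(argv))
    function = functions[args.pop("dataset")]
    return function(**args)
```

Keeping the option handling in `fill_subparser` and the work in a plain function means the downloader can be called directly from Python as well as from the command line.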
Now a simple import makes the dataset available in your Python code (instantiating it and loading the data may take a while). For example, for the built-in One Billion Word dataset:
from fuel.datasets.billion import OneBillionWord
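The import only pays off once Fuel can find the converted files: Fuel's dataset classes look for them in the directories listed in the `FUEL_DATA_PATH` environment variable (or the `data_path` entry in `~/.fuelrc`). The sketch below re-implements that lookup to show the idea; `locate_converted_file` is a hypothetical helper, not part of Fuel, and it assumes multiple directories are separated by `os.pathsep`.

```python
import os


def locate_converted_file(filename, data_path=None):
    """Return the first existing `filename` under the data path directories.

    Illustrative re-implementation, not Fuel's code. Directories are taken
    from FUEL_DATA_PATH unless `data_path` is given explicitly.
    """
    if data_path is None:
        data_path = os.environ.get("FUEL_DATA_PATH", "")
    # Assumption: several directories may be listed, separated by os.pathsep.
    for directory in data_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(os.path.expanduser(directory), filename)
        if os.path.isfile(candidate):
            return candidate
    raise IOError("{} not found in the Fuel data path".format(filename))
```

For example, `locate_converted_file("my_dataset.hdf5")` (a placeholder filename) returns the full path of the first match, or raises `IOError` if no listed directory contains the file. If an import succeeds but loading fails, checking that the converted file sits in one of these directories is a good first step.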
Examples
LFW
A repository by GitHub user dribnet illustrates how to wrap a new dataset with Fuel: https://github.com/dribnet/lfw_fuel
Advantages:
- Only one command is needed to download the data and import it into Fuel
- After that, a single import of the wrapper library gives you the data already split into training/testing X and y
Disadvantages:
- One-size-fits-all: loading the data with load_data() can take a very long time, and it must be repeated every time you run the script (the data does not persist in memory between runs)
- Complicated to extend
- Hides some of Fuel's nicer options
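One way to soften the "must be done every time you run the script" drawback is to cache whatever `load_data()` returns on disk after the first run. This is a generic sketch, not part of lfw_fuel: `cached_load` is a hypothetical helper that pickles the result, and whether reading the pickle back is faster than calling `load_data()` again depends on where that function spends its time.

```python
import os
import pickle


def cached_load(load_fn, cache_path, **load_kwargs):
    """Call load_fn once and reuse its pickled result on later runs.

    The cache is keyed only by `cache_path`, so use a different path for
    each distinct set of load_kwargs (e.g. one per image format).
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    result = load_fn(**load_kwargs)
    # Write to a temporary file first so an interrupted run leaves no
    # half-written cache behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, cache_path)
    return result
```

With the LFW wrapper, this would look like `(X_train, y_train), (X_test, y_test) = cached_load(lfw.load_data, "lfw_deepfunneled.pkl", format="deepfunneled")`; delete the cache file to force a fresh load.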
Here is what the final payoff looks like:
from keras.models import Sequential
from lfw_fuel import lfw

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data(format="deepfunneled")

model = Sequential()
# (build the perfect model here, then compile it; show_accuracy was removed
#  in Keras 1.0, so request accuracy with compile(..., metrics=["accuracy"]))

model.fit(X_train, y_train, validation_data=(X_test, y_test))
score = model.evaluate(X_test, y_test, verbose=0)