Revision as of 23:08, 14 October 2017

Basics

Fuel is a library for creating machine learning data pipelines. It puts datasets behind a common interface and gives you several ways to iterate over them.

Find Fuel on GitHub here: https://github.com/mila-udem/fuel

Overview of how it works: https://fuel.readthedocs.io/en/latest/overview.html

Prerequisites

Fuel uses HDF5, so you will need the HDF5 header files installed locally. Use your package manager, or follow the HDF5 installation instructions. On a Mac with Homebrew:

$ brew install hdf5

Now you can install Fuel.

Install Fuel from Source

$ git clone git@github.com:mila-udem/fuel.git
$ cd fuel
$ python setup.py build 
$ python setup.py install

Basic Usage

See Fuel/Usage

Summary:

Datasets are the principal interface to data, but the base Dataset class is abstract
IterableDatasets allow sequential access to data, in a fixed order only
IndexableDatasets allow random access to data
Schemes control the order in which an IndexableDataset is iterated (batch, sequential, shuffled, etc.)
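
To make the split between "dataset" and "scheme" concrete, here is a minimal plain-Python sketch. It does not use Fuel at all: ToyIndexableDataset, sequential_batches and shuffled_batches are hypothetical names made up for illustration. In Fuel itself the corresponding roles are played by classes such as IndexableDataset, SequentialScheme and ShuffledScheme, whose exact usage is covered on the Fuel/Usage page.

```python
import random


class ToyIndexableDataset:
    """Hypothetical stand-in for an indexable dataset (not Fuel's class)."""

    def __init__(self, examples):
        self.examples = list(examples)

    def __len__(self):
        return len(self.examples)

    def get(self, indices):
        # Random access: return the requested examples, in the requested order.
        return [self.examples[i] for i in indices]


def sequential_batches(num_examples, batch_size):
    """Yield batches of indices in their natural order."""
    indices = list(range(num_examples))
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


def shuffled_batches(num_examples, batch_size, seed=0):
    """Yield batches of indices after shuffling them once."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


dataset = ToyIndexableDataset(["a", "b", "c", "d", "e"])

# The dataset only answers "give me these indices";
# the scheme decides which indices are requested, and in what order.
sequential = [dataset.get(batch) for batch in sequential_batches(len(dataset), 2)]
shuffled = [dataset.get(batch) for batch in shuffled_batches(len(dataset), 2)]
```

The point of the design is that the same dataset can be visited in different orders just by swapping the scheme, without touching the data-access code.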

Wrapping Custom Datasets with Fuel

Main page: Fuel/Custom Datasets

Wrapping a custom dataset with Fuel takes two steps:

Specify how the original data should be downloaded, processed, and turned into a Fuel dataset
Specify how the Fuel dataset should be loaded

The first step, defining how to turn the original data into Fuel data:

Create a download wrapper, which tells Fuel how to download the original data ("briq" download?)
Define a way to load a single piece of data (e.g., parameterized by name) and, optionally, paired/related pieces of data (e.g., two related images)
Write a convert function that extracts all the data and assembles it into an HDF5 file (removing the original data when finished)

The second step, specifying how the Fuel dataset should be loaded:

Create a Fuel dataset class (inheriting from, e.g., H5PYDataset)
Define a way for that data to be loaded (example: provide a universally available load_data function in a package specific to your dataset, as in lfw_fuel)
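
As a rough illustration of the last item, here is a plain-Python sketch of what a load_data helper might look like. Everything in it is an assumption made for the example: the SPLITS table, the in-memory features/targets lists, and the function signature are hypothetical, and this is not the lfw_fuel implementation. In a real wrapper the arrays would be read from the converted HDF5 file through an H5PYDataset subclass rather than passed in as lists.

```python
# Hypothetical split table: each subset maps to a (start, stop) row range.
SPLITS = {"train": (0, 4), "test": (4, 6)}


def load_data(features, targets, splits=SPLITS):
    """Return ((X_train, y_train), (X_test, y_test)) from parallel sequences."""

    def subset(name):
        start, stop = splits[name]
        # Features and targets are sliced with the same range so rows stay paired.
        return features[start:stop], targets[start:stop]

    return subset("train"), subset("test")


# Toy data standing in for the contents of a converted dataset file.
features = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5]]
targets = [0, 1, 0, 1, 0, 1]

(X_train, y_train), (X_test, y_test) = load_data(features, targets)
```

Keeping the split ranges in one table means the train/test boundary is defined once, at conversion time, instead of being re-derived in every script that loads the data.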

Flags

@@ Line 38: / Line 38: @@
 =Wrapping Custom Datasets with Fuel=
-Repo by github user dribnet illustrates how to wrap a new dataset using Fuel: https://github.com/dribnet/lfw_fuel
-Advantages:
-* Only takes one command to download the data and import it into fuel
-* Then it only takes one command to import the library that wraps the data, and be able to turn it into training/testing X and Y
-Disadvantages:
-* One-size-fits-all; importing data using load_data() can take a REALLY long time, and must be done every time you run the script (not persistent in memory)
-* Complicated to extend
-* Removes some of the nicer options of fuel
-Here is what the final payoff looks like:
-<pre>
-from keras.models import Sequential
-from lfw_fuel import lfw
-# the data, shuffled and split between train and test sets
-(X_train, y_train), (X_test, y_test) = lfw.load_data(format="deepfunneled")
-# (build the perfect model here)
-model.fit(X_train, Y_train, show_accuracy=True, validation_data=(X_test, Y_test))
-score = model.evaluate(X_test, Y_test, show_accuracy=True, verbose=0)
-</pre>
+Main page: [[Fuel/Custom Datasets]]
+Basically, the process of wrapping a custom data set with fuel looks like this:
+* Specify how the original data should be downloaded, processed, and turned into a fuel data set
+* Specify how the fuel data set should be loaded
+The first step - defining how to turn original data into fuel data:
+* Create a download wrapper - this tells fuel how to download the original data ("briq" download?)
+* Define a way to load a single piece of data (e.g., parameterized by name) and, optionally, paired/related pieces of data (e.g., two related images)
+* Convert function to extract all data and assemble it all into an HDF5 file (and remove original data when finished)
+The second step - specifying how the fuel data set should be loaded:
+* Create a fuel Datasets object (inheriting from, e.g., H5PYDataset)
+* Define a way for that data to be loaded (example: make a universally-available load_data method in a package specific to your data set, as in lfw_fuel)
 =Flags=

Fuel: Difference between revisions

From charlesreid1