Latest revision as of 21:43, 15 October 2017

Basics

Fuel is a library for building machine learning data pipelines. It provides a common interface to datasets, plus iteration schemes that control the order in which the data is visited.

Fuel is on GitHub: https://github.com/mila-udem/fuel

Overview of how it works: https://fuel.readthedocs.io/en/latest/overview.html

Prerequisites

Fuel uses HDF5, so the HDF5 header files must be installed locally. Use your package manager, or follow the HDF5 installation instructions. On a Mac with Homebrew:

$ brew install hdf5

Now you can install Fuel.

Install Fuel from Source

$ git clone git@github.com:mila-udem/fuel.git
$ cd fuel
$ python setup.py build
$ python setup.py install

Basic Usage

Main article: Fuel/Usage

Summary:

Datasets are the principal interface to data, but they are abstract classes
IterableDatasets (less powerful) allow sequential access to data, in a fixed order only
IndexableDatasets (more powerful) allow random access to data
Schemes allow iterating through IndexableDatasets in various orders (batch, sequential, shuffle, etc.)
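
The difference between the two access styles, and the role of a scheme, can be illustrated without Fuel installed. The sketch below is plain Python and is not Fuel's API: the class and function names (ToyIterableDataset, ToyIndexableDataset, sequential_batches, shuffled_batches) are hypothetical stand-ins chosen for this illustration. See Fuel/Usage for the real classes.

```python
import random


class ToyIterableDataset:
    """Sequential access only: examples come out in the stored order."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __iter__(self):
        return iter(self._examples)


class ToyIndexableDataset:
    """Random access: any example can be requested by its index."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __len__(self):
        return len(self._examples)

    def get(self, indices):
        return [self._examples[i] for i in indices]


def sequential_batches(num_examples, batch_size):
    """A 'scheme' that yields batches of indices in order."""
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))


def shuffled_batches(num_examples, batch_size, seed=0):
    """A 'scheme' that yields batches of indices in a shuffled order."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


data = ["a", "b", "c", "d", "e"]
iterable = ToyIterableDataset(data)
indexable = ToyIndexableDataset(data)

# The iterable dataset can only be walked front to back.
in_order = list(iterable)

# The indexable dataset serves whatever indices the scheme asks for.
seq = [indexable.get(batch) for batch in sequential_batches(len(indexable), 2)]
shuf = [indexable.get(batch) for batch in shuffled_batches(len(indexable), 2)]
```

The point of the design is that the scheme only produces indices, so the same indexable dataset can be visited in any order without being copied or rewritten.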

Wrapping Custom Datasets with Fuel

Main article: Fuel/Custom Datasets

Wrapping a custom dataset with Fuel takes two steps:

Specify how the original data should be downloaded, processed, and turned into a Fuel dataset
Specify how the Fuel dataset should be loaded

The first step, defining how to turn the original data into Fuel data:

Create a download wrapper, which tells Fuel how to download the original data ("briq" download?)
Define a way to load a single piece of data (e.g., parameterized by name) and, optionally, paired/related pieces of data (e.g., two related images)
Write a convert function that extracts all the data and assembles it into a single HDF5 file (and removes the original data when finished)
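
The load and convert pieces of this step might look roughly like the following. This is a minimal sketch that assumes h5py and NumPy are installed and does not use Fuel itself. The function names (load_example, convert), the dataset names ("features", "targets"), and the fake example data are all assumptions made for illustration. The download wrapper is skipped so the sketch runs offline, and any extra metadata that Fuel's H5PYDataset expects in the file is omitted; see Fuel/Custom Datasets for the actual layout.

```python
import os
import tempfile

import h5py
import numpy as np


def load_example(name, size=4):
    """Hypothetical stand-in for loading one raw item (e.g. an image) by name.

    A real loader would read a downloaded file. Here every pixel is set to
    the length of the name so the sketch needs no data on disk.
    """
    return np.full((size, size), len(name), dtype="uint8")


def convert(names, labels, output_path):
    """Load every example and assemble them all into one HDF5 file."""
    features = np.stack([load_example(name) for name in names])
    targets = np.asarray(labels, dtype="uint8").reshape(-1, 1)
    with h5py.File(output_path, "w") as h5file:
        h5file.create_dataset("features", data=features)
        h5file.create_dataset("targets", data=targets)
    # A real converter would delete the raw downloaded files at this point;
    # this sketch has none to remove.
    return output_path


output_path = os.path.join(tempfile.mkdtemp(), "toy_dataset.hdf5")
convert(["anna", "bob", "carol"], [0, 1, 0], output_path)

# Read the file back to confirm what was written.
with h5py.File(output_path, "r") as h5file:
    feature_shape = h5file["features"].shape
    target_values = h5file["targets"][:, 0].tolist()
```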

The second step, specifying how the Fuel dataset should be loaded:

Create a Fuel dataset class (inheriting from, e.g., H5PYDataset)
Define a way for that data to be loaded (example: provide a load_data function in a package specific to your dataset, as in lfw_fuel)
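
A load_data function of the kind mentioned in the last item could be sketched as below. This assumes h5py and NumPy, reads the HDF5 file directly rather than through a Fuel dataset class, and is not the lfw_fuel implementation. The signature load_data(path, num_train), the dataset names, and the convention of returning train and test pairs are assumptions for illustration. The sketch writes a tiny file first so that it runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np


def load_data(path, num_train):
    """Hypothetical loader returning (X_train, y_train), (X_test, y_test).

    The first num_train examples form the training set and the remainder
    form the test set.
    """
    with h5py.File(path, "r") as h5file:
        features = h5file["features"][...]
        targets = h5file["targets"][...]
    return ((features[:num_train], targets[:num_train]),
            (features[num_train:], targets[num_train:]))


# Write a small stand-in for an already converted dataset: 5 examples, 2 features.
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.hdf5")
with h5py.File(path, "w") as h5file:
    h5file.create_dataset(
        "features", data=np.arange(10, dtype="float32").reshape(5, 2))
    h5file.create_dataset(
        "targets", data=np.array([0, 1, 0, 1, 1], dtype="uint8"))

(X_train, y_train), (X_test, y_test) = load_data(path, num_train=3)
```

Putting the loader in the dataset's own package means callers only need one import and one call to get arrays ready for a model.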

Flags



From charlesreid1