From charlesreid1

 
Latest revision as of 21:43, 15 October 2017

Basics

Fuel is a library for building machine learning data pipelines. It wraps data in dataset objects and lets you iterate over them in various orders (sequential, shuffled, batched) through iteration schemes.

Find Fuel on GitHub here: https://github.com/mila-udem/fuel

Overview of how it works: https://fuel.readthedocs.io/en/latest/overview.html

Prerequisites

Fuel uses HDF5, so you will need the HDF5 header files installed locally. Use your package manager, or follow the HDF5 installation instructions. On a Mac:

$ brew install hdf5

Now you can install Fuel.

Install Fuel from Source

$ git clone git@github.com:mila-udem/fuel.git
$ cd fuel
$ python setup.py build
$ python setup.py install

Basic Usage

Summary:

  • Datasets are the principal interface to data, but are abstract classes
  • IterableDatasets (less powerful) allow sequential access to data in specified order only
  • IndexableDatasets (more powerful) allow random access to data
  • Schemes allow iterating through IndexableDatasets in various orders (batch, sequential, shuffle, etc.)
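
The difference between the two kinds of dataset, and the role of a scheme, can be sketched in plain Python. This is only an illustration of the idea, not Fuel's API: the class and function names (SequentialOnly, RandomAccess, sequential_batches, shuffled_batches) are hypothetical stand-ins.

```python
import random


class SequentialOnly:
    """Stand-in for an iterable dataset: examples can only be read in order."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __iter__(self):
        return iter(self._examples)


class RandomAccess:
    """Stand-in for an indexable dataset: any example can be requested by index."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __len__(self):
        return len(self._examples)

    def get(self, indices):
        # Return the examples at the requested positions, in the requested order.
        return [self._examples[i] for i in indices]


def sequential_batches(num_examples, batch_size):
    """Hypothetical scheme: yield batches of indices in their original order."""
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))


def shuffled_batches(num_examples, batch_size, seed=0):
    """Hypothetical scheme: yield batches of indices in a shuffled order."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


data = ["a", "b", "c", "d", "e"]
indexable = RandomAccess(data)

# The scheme decides which indices to visit; the dataset only serves requests.
in_order = [indexable.get(b) for b in sequential_batches(len(indexable), 2)]
shuffled = [indexable.get(b) for b in shuffled_batches(len(indexable), 2)]
```

A scheme only makes sense for the random-access case: the sequential-only dataset has no way to serve an arbitrary batch of indices, which is why it is the less powerful of the two.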

Wrapping Custom Datasets with Fuel

Wrapping a custom data set with Fuel takes two steps:

  • Specify how the original data should be downloaded, processed, and turned into a fuel data set
  • Specify how the fuel data set should be loaded

The first step - defining how to turn original data into fuel data:

  • Create a download wrapper - this tells fuel how to download the original data ("briq" download?)
  • Define a way to load a single piece of data (e.g., parameterized by name) and, optionally, paired/related pieces of data (e.g., two related images)
  • Write a convert function that extracts all the data and assembles it into a single HDF5 file (and removes the original data when finished)
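
The shape of the load-and-convert part of this step can be sketched as follows. This is a rough illustration under loud assumptions: the raw data is imagined as one text file per example, the function names (load_one, load_pair, convert) are hypothetical, the download step is left out, and the output is a JSON file rather than HDF5 so that the sketch has no dependencies. A real convert function would write an HDF5 file instead.

```python
import json
import os
import tempfile


def load_one(raw_dir, name):
    """Load a single raw example by name (assumes one text file per example)."""
    with open(os.path.join(raw_dir, name + ".txt")) as f:
        return [float(x) for x in f.read().split()]


def load_pair(raw_dir, name_a, name_b):
    """Load two related examples together, e.g. two images to compare."""
    return load_one(raw_dir, name_a), load_one(raw_dir, name_b)


def convert(raw_dir, output_path, remove_raw=False):
    """Gather every raw example into one file (JSON here, HDF5 in practice)."""
    names = sorted(n[:-4] for n in os.listdir(raw_dir) if n.endswith(".txt"))
    features = [load_one(raw_dir, n) for n in names]
    with open(output_path, "w") as f:
        json.dump({"names": names, "features": features}, f)
    if remove_raw:
        # Clean up the original data once it has been converted.
        for n in names:
            os.remove(os.path.join(raw_dir, n + ".txt"))
    return output_path


# Build a tiny fake "downloaded" data set and convert it.
raw_dir = tempfile.mkdtemp()
for name, values in [("alice", "1 2 3"), ("bob", "4 5 6")]:
    with open(os.path.join(raw_dir, name + ".txt"), "w") as f:
        f.write(values)

pair = load_pair(raw_dir, "alice", "bob")
out_path = convert(raw_dir, os.path.join(raw_dir, "dataset.json"), remove_raw=True)
with open(out_path) as f:
    converted = json.load(f)
```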

The second step - specifying how the fuel data set should be loaded:

  • Create a Fuel dataset class (inheriting from, e.g., H5PYDataset)
  • Define a way for that data to be loaded (example: provide a load_data function in a package specific to your data set, as in lfw_fuel)
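
A minimal sketch of what such a loading layer might look like, in plain Python. The names here (SplitDataset, the splits dictionary, and this particular load_data signature) are hypothetical; in Fuel itself the dataset class would inherit from something like H5PYDataset and read from the converted HDF5 file rather than from in-memory lists.

```python
class SplitDataset:
    """Stand-in for a dataset class that exposes one named split of the data."""

    def __init__(self, features, targets, splits, which_set):
        # Each split is stored as a (start, stop) range over the full data.
        start, stop = splits[which_set]
        self.features = features[start:stop]
        self.targets = targets[start:stop]


def load_data(features, targets, splits):
    """Convenience loader returning (train, test) pairs of features and targets."""
    train = SplitDataset(features, targets, splits, "train")
    test = SplitDataset(features, targets, splits, "test")
    return (train.features, train.targets), (test.features, test.targets)


# Toy data: four examples, the first three for training, the last for testing.
features = [[0.0], [1.0], [2.0], [3.0]]
targets = [0, 1, 0, 1]
splits = {"train": (0, 3), "test": (3, 4)}

(X_train, y_train), (X_test, y_test) = load_data(features, targets, splits)
```

Keeping the split boundaries next to the data means the caller never has to know how the examples are laid out; it only asks for a split by name.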

Flags