From charlesreid1

Revision as of 22:59, 14 October 2017 by Admin

Basics

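Fuel is a library for building machine learning data pipelines. It provides a common interface to datasets, schemes for iterating over them (for example in sequential or shuffled minibatches), and transformers for preprocessing data on the fly.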

Find Fuel on GitHub here: https://github.com/mila-udem/fuel

Overview of how it works: https://fuel.readthedocs.io/en/latest/overview.html

Prerequisites

Fuel uses HDF5, so the HDF5 library and its header files need to be installed locally. Use your package manager, or follow the HDF5 installation instructions. On a Mac with Homebrew:

$ brew install hdf5

Now you can install Fuel.

Install Fuel from Source

$ git clone git@github.com:mila-udem/fuel.git
$ cd fuel
$ python setup.py build 
$ python setup.py install

Basic Usage

See Fuel/Usage

Summary:

  • Datasets are the principal interface to data; Dataset itself is an abstract class
  • IterableDatasets allow sequential access only, in a fixed order
  • IndexableDatasets allow random access to data
  • Schemes determine the order in which an IndexableDataset is iterated (in batches, sequentially, shuffled, etc.)
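The split between an indexable dataset and a scheme can be sketched in plain Python. This is not Fuel's API: ToyIndexableDataset, sequential_scheme, shuffled_scheme and stream are hypothetical names made up for illustration. In Fuel itself these roles are played by classes such as IndexableDataset, ShuffledScheme and DataStream. The idea is that the dataset only answers "give me examples i, j, k", while the scheme decides which indices to ask for and in what order.

```python
import random


class ToyIndexableDataset:
    """Hypothetical stand-in for an indexable dataset (random access by index)."""

    def __init__(self, features, targets):
        assert len(features) == len(targets)
        self.features = features
        self.targets = targets
        self.num_examples = len(features)

    def get_data(self, indices):
        # Return the requested examples as a (features, targets) pair.
        return ([self.features[i] for i in indices],
                [self.targets[i] for i in indices])


def sequential_scheme(num_examples, batch_size):
    """Yield batches of indices in their original order."""
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))


def shuffled_scheme(num_examples, batch_size, seed=0):
    """Yield batches of indices in a random (but seeded) order."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


def stream(dataset, scheme):
    """Pair a dataset with a scheme: fetch whatever indices the scheme requests."""
    for batch_indices in scheme:
        yield dataset.get_data(batch_indices)


dataset = ToyIndexableDataset(
    features=[[i, i + 1] for i in range(5)],
    targets=[i % 2 for i in range(5)],
)
sequential_batches = list(
    stream(dataset, sequential_scheme(dataset.num_examples, batch_size=2)))
shuffled_batches = list(
    stream(dataset, shuffled_scheme(dataset.num_examples, batch_size=2)))
```

Swapping the scheme changes the iteration order without touching the dataset, which is the reason for keeping the two separate.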

Wrapping Custom Datasets with Fuel

A repo by GitHub user dribnet illustrates how to wrap a new dataset with Fuel: https://github.com/dribnet/lfw_fuel

Advantages:

  • It only takes one command to download the data and import it into Fuel
  • After that, importing the wrapper library and calling load_data() gives you training/testing X and Y

Disadvantages:

  • One-size-fits-all: load_data() can take a very long time, and must be rerun every time you run the script (the loaded data does not persist in memory between runs)
  • Complicated to extend
  • Hides some of Fuel's nicer options
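One possible workaround for the slow load is to cache the loaded result on disk, so that only the first run pays the full cost. The sketch below is not part of lfw_fuel or Fuel. cached_load is a hypothetical helper, and slow_loader is a made-up stand-in for an expensive call such as lfw.load_data(). It uses pickle to stay dependency-free; for large NumPy arrays something like numpy.savez would probably be a better fit.

```python
import os
import pickle
import tempfile


def cached_load(loader, cache_path):
    """Return loader()'s result, reusing a pickled copy at cache_path if present."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    data = loader()
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data


calls = []  # records how many times the expensive loader actually runs


def slow_loader():
    """Hypothetical stand-in for an expensive call such as lfw.load_data()."""
    calls.append(1)
    train = ([[0.0, 1.0], [2.0, 3.0]], [0, 1])
    test = ([[4.0, 5.0]], [1])
    return train, test


cache_path = os.path.join(tempfile.mkdtemp(), "lfw_cache.pkl")
first = cached_load(slow_loader, cache_path)   # runs the loader, writes the cache
second = cached_load(slow_loader, cache_path)  # served from the cache file
```

The cache has to be deleted by hand whenever the underlying data or the load_data() arguments change, since this sketch does no invalidation.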

Here is what the final payoff looks like:

from keras.models import Sequential
from lfw_fuel import lfw

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data(format="deepfunneled")

# (build the perfect model here: add layers, then call model.compile)
model = Sequential()

# show_accuracy is an argument from early Keras releases; newer versions
# expect metrics=['accuracy'] in model.compile() instead
model.fit(X_train, y_train, show_accuracy=True, validation_data=(X_test, y_test))
score = model.evaluate(X_test, y_test, show_accuracy=True, verbose=0)

Flags