Basics

Fuel is a library for building machine learning data pipelines. It provides a common interface to datasets, plus schemes for iterating over them in different orders.

Find Fuel on GitHub here: https://github.com/mila-udem/fuel

Overview of how it works: https://fuel.readthedocs.io/en/latest/overview.html

Prerequisites

Fuel uses HDF5, so you will need the HDF5 header files installed locally. Use your package manager, or follow the HDF5 installation instructions. On a Mac (using Homebrew):

$ brew install hdf5

Now you can install Fuel.

Install

$ git clone git@github.com:mila-udem/fuel.git
$ cd fuel
$ python setup.py build && python setup.py install

Basic Usage

See Fuel/Usage

Summary:

Datasets are the principal interface to data, but Dataset itself is an abstract class, so you work with its subclasses
IterableDatasets allow sequential access to data, in a fixed order only
IndexableDatasets allow random access to data
Schemes allow iterating through IndexableDatasets in various orders (batch, sequential, shuffle, etc.)
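
The relationship between an indexable dataset and an iteration scheme can be sketched in plain Python. This is an illustration of the idea only and does not use Fuel's API: the names ToyIndexableDataset, sequential_scheme, shuffled_scheme, and stream are hypothetical stand-ins invented for this example. The dataset offers random access by index, and a scheme decides which indices are requested and in what order.

```python
import random


class ToyIndexableDataset:
    """Hypothetical stand-in for an indexable dataset (not Fuel's class)."""

    def __init__(self, features, targets):
        assert len(features) == len(targets)
        self.features = features
        self.targets = targets
        self.num_examples = len(features)

    def get_data(self, indices):
        # Random access: return the examples at the requested indices.
        return ([self.features[i] for i in indices],
                [self.targets[i] for i in indices])


def sequential_scheme(num_examples, batch_size):
    """Yield batches of indices in their original order."""
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))


def shuffled_scheme(num_examples, batch_size, seed=0):
    """Yield batches of indices after shuffling them once."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


def stream(dataset, scheme):
    """Combine a dataset and a scheme into an iterator over batches."""
    for batch_indices in scheme:
        yield dataset.get_data(batch_indices)


dataset = ToyIndexableDataset(features=[10, 11, 12, 13, 14],
                              targets=[0, 1, 0, 1, 0])

sequential_batches = list(
    stream(dataset, sequential_scheme(dataset.num_examples, 2)))
shuffled_batches = list(
    stream(dataset, shuffled_scheme(dataset.num_examples, 2)))
```

Both streams visit every example exactly once per pass; only the order and grouping differ. An iterable-only dataset could not support the shuffled scheme, since it has no way to fetch an arbitrary index.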

Wrapping Custom Datasets with Fuel

A repo by GitHub user dribnet illustrates how to wrap a new dataset using Fuel: https://github.com/dribnet/lfw_fuel

Advantages:

It only takes one command to download the data and import it into Fuel
It then only takes one command to import the library that wraps the data and load it as training/testing X and Y

Disadvantages:

One-size-fits-all: importing data using load_data() can take a very long time, and must be repeated every time you run the script (the data is not persistent in memory)
Complicated to extend
Removes some of the nicer options of Fuel

Here is what the final payoff looks like:

from keras.models import Sequential
from lfw_fuel import lfw

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data(format="deepfunneled")

# (build the perfect model here)
model = Sequential()  # placeholder: add layers and compile before fitting

# show_accuracy is an argument from older (pre-1.0) Keras versions
model.fit(X_train, y_train, show_accuracy=True, validation_data=(X_test, y_test))
score = model.evaluate(X_test, y_test, show_accuracy=True, verbose=0)


From charlesreid1
