Fuel
From charlesreid1
Basics
Fuel is a library for building machine learning data pipelines. It provides a common interface to datasets and lets you iterate over them in different orders (sequentially, in batches, shuffled, etc.).
Find Fuel on GitHub here: https://github.com/mila-udem/fuel
Overview of how it works: https://fuel.readthedocs.io/en/latest/overview.html
Prerequisites
Fuel uses HDF5, so you will need the HDF5 header files installed locally. Use your package manager, or follow the HDF5 installation instructions. On a Mac:
$ brew install hdf5
Now you can install Fuel.
Install
$ git clone git@github.com:/mila-udem/fuel.git
$ cd fuel
$ python setup.py build && python setup.py install
Basic Usage
See Fuel/Usage
Summary:
- Datasets are the principal interface to data, but are abstract classes
- IterableDatasets allow sequential access to data in specified order only
- IndexableDatasets allow random access to data
- Schemes allow iterating through IndexableDatasets in various orders (batch, sequential, shuffle, etc.)
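The split between an indexable dataset and a scheme can be sketched in plain Python. This is an illustration of the idea only, not Fuel's API: the class ToyIndexableDataset and the functions sequential_batches and shuffled_batches are hypothetical names made up for this sketch. The scheme decides which indices to request and in what order; the dataset only answers index requests.

```python
import random


class ToyIndexableDataset:
    """Hypothetical stand-in for an indexable dataset (random access by index)."""

    def __init__(self, features, targets):
        assert len(features) == len(targets)
        self.features = features
        self.targets = targets
        self.num_examples = len(features)

    def get_data(self, indices):
        # Return the requested examples as a (features, targets) pair.
        return ([self.features[i] for i in indices],
                [self.targets[i] for i in indices])


def sequential_batches(num_examples, batch_size):
    """Hypothetical scheme: yield batches of indices in their original order."""
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))


def shuffled_batches(num_examples, batch_size, seed=0):
    """Hypothetical scheme: yield batches of indices in a shuffled order."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


dataset = ToyIndexableDataset(
    features=[[i, i + 1] for i in range(5)],
    targets=[i % 2 for i in range(5)],
)

# The same dataset, visited in two different orders.
sequential = [dataset.get_data(batch)
              for batch in sequential_batches(dataset.num_examples, 2)]
shuffled = [dataset.get_data(batch)
            for batch in shuffled_batches(dataset.num_examples, 2)]
```

Because the iteration order lives in the scheme rather than in the dataset, the same data can be reused for shuffled training batches and for an in-order evaluation pass.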
Wrapping Custom Datasets with Fuel
A repo by GitHub user dribnet illustrates how to wrap a new dataset using Fuel: https://github.com/dribnet/lfw_fuel
Advantages:
- It only takes one command to download the data and import it into Fuel
- It then only takes one command to import the library that wraps the data and turn the data into training/testing X and Y
Disadvantages:
- One-size-fits-all: loading data with load_data() can take a REALLY long time, and must be repeated every time you run the script (the loaded data does not persist between runs)
- Complicated to extend
- Removes some of the nicer options of Fuel
Here is what the final payoff looks like:
from keras.models import Sequential
from lfw_fuel import lfw

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data(format="deepfunneled")

# (build the perfect model here: add layers, then compile)
model = Sequential()

# show_accuracy is an argument from older Keras releases
model.fit(X_train, y_train, show_accuracy=True, validation_data=(X_test, y_test))
score = model.evaluate(X_test, y_test, show_accuracy=True, verbose=0)
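One possible workaround for the slow load_data() call noted under Disadvantages is to cache its result on disk, so that later runs of the script skip the expensive load. The sketch below is not part of lfw_fuel or Fuel: cached_load is a hypothetical helper, and slow_loader is a made-up stand-in for the real load_data() call so the example runs on its own. It assumes the loaded data can be pickled.

```python
import os
import pickle
import tempfile


def cached_load(loader, cache_path):
    """Return loader()'s result, reusing a pickled copy at cache_path if present."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = loader()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data


# Hypothetical slow loader standing in for lfw.load_data(format="deepfunneled").
calls = []


def slow_loader():
    calls.append(1)  # record each time the expensive load actually runs
    train = ([[0.0, 1.0], [1.0, 0.0]], [0, 1])
    test = ([[0.5, 0.5]], [1])
    return train, test


cache_file = os.path.join(tempfile.mkdtemp(), "lfw_cache.pkl")
first = cached_load(slow_loader, cache_file)   # runs the loader, writes the cache
second = cached_load(slow_loader, cache_file)  # reads the cache, loader not called
```

With the real library, the loader argument would presumably be something like lambda: lfw.load_data(format="deepfunneled"). Delete the cache file whenever the underlying data or the format option changes.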