From charlesreid1

Revision as of 04:49, 25 March 2017

For this, I used the waleedka/modern-deep-learning Docker image from Docker Hub.

Basics

Running the Deep Learning Container

Let's start with how we get this deep learning docker container up and running.

Start by installing Docker: Docker/Installing

Next, this deep learning container can run a Jupyter notebook server, which runs on port 8888 by default, so we'll pass the container's port 8888 through to the host machine's port 8888:

$ docker run -it -p 8888:8888 waleedka/modern-deep-learning

This is great, but unfortunately any changes we make or notebooks we create will disappear with our container, so we'll need to figure out data volumes.

For the time being, let's start by testing out the container and making sure the software components work.

Then we'll figure out a schema for data volumes, and how we get data into and out of our deep learning container.

Testing it out

To take this for a test drive, run the above command. This will give you a bash terminal on the docker container, where we can run a Jupyter notebook:

$ docker run -it -p 8888:8888 waleedka/modern-deep-learning
root@a944863bc1e6:~# jupyter notebook

Now on the host machine, we can navigate to localhost:8888 and see a Jupyter notebook server up and running. The server exposes the container's file system and any notebooks running in the container. This container runs Python 3 only.

Create a new Python notebook, and try importing a few libraries:

(Screenshot: DockerDeepLearningTest.png)

import numpy
import scipy
import sklearn
import theano
import tensorflow
import pandas
import matplotlib
import keras
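
If one of those imports fails, the notebook cell stops at the first error. A minimal sketch of a more forgiving check is below; it is not part of the image, and the helper name `check_imports` is made up for illustration. It tries each library in turn and records a version string, or None when the import fails.

```python
import importlib

# The same libraries that the notebook cell above imports.
LIBRARIES = ["numpy", "scipy", "sklearn", "theano", "tensorflow",
             "pandas", "matplotlib", "keras"]


def check_imports(names):
    """Return {name: version string, or None if the import failed}."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure to import counts as "not usable" for this check.
            results[name] = None
        else:
            # Not every module defines __version__, so fall back to a label.
            results[name] = getattr(module, "__version__", "unknown")
    return results
```

Calling `check_imports(LIBRARIES)` in a notebook cell returns a dictionary that shows at a glance which components of the container are usable.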

Data Volumes Strategy

Let's walk through a volumes strategy for deep learning models using Docker. The strategy we use depends on whether we're training deep learning models using data, or running deep learning models to make predictions.

Training

If you are training deep learning models using Docker containers, you want to be able to train one or more models on a given data set. That's the point of creating your data volume container: you can try different neural networks by spinning up different containers.

Note that training data may come from a variety of sources, but here we'll treat the training data as some files on disk.

Workflow

Here are a few things we know about the workflow of training deep learning models in parallel:

  • Each container loads the same data set for training; containers that need a different training set should get their data volume from a different data volume container.
  • Each container creates a unique model and needs to dump this model somewhere. (The models may be unique because they focus on different chunks of data, because they use different inputs and outputs (X's and Y's), or because they try different strategies, architectures, or model parameters.)
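
Since every container produces its own model, each run needs a name that will not collide with the others. One possible approach, sketched here under assumptions, is to derive the name from the configuration itself. The hyperparameter names in `configs` are hypothetical, and `run_name` is an illustrative helper, not something the image provides.

```python
import hashlib
import json


def run_name(config, prefix="model"):
    """Derive a stable, effectively unique name from a configuration dict."""
    # Sorting the keys makes the name independent of dict ordering.
    canonical = json.dumps(config, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "{}-{}".format(prefix, digest[:12])


# Hypothetical configurations, one per training container.
configs = [
    {"layers": 2, "learning_rate": 0.01},
    {"layers": 3, "learning_rate": 0.01},
]
names = [run_name(c) for c in configs]
```

The same name can then be used for both the container and the model file it writes, which makes it easy to trace a saved model back to the settings that produced it.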

Input

The input is the training data.

The training data volume should be a single volume mounted read-only from the data volume container.
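
As a rough sketch of the read-only mount, the helper below builds the argument list for a docker run command that attaches the volumes of a data volume container with `--volumes-from NAME:ro`. The container name `training-data` and the script path are assumptions for illustration; the data volume container is assumed to exist already (for example, one created with `docker create -v /data --name training-data ...`).

```python
def training_run_args(image, data_container, command, read_only=True):
    """Build the argv list for `docker run` with the data volume attached."""
    # `--volumes-from NAME:ro` mounts all of NAME's volumes read-only.
    source = data_container + (":ro" if read_only else "")
    return ["docker", "run", "--rm", "--volumes-from", source, image] + list(command)


args = training_run_args(
    image="waleedka/modern-deep-learning",
    data_container="training-data",            # hypothetical container name
    command=["python3", "/scripts/train.py"],  # hypothetical script path
)
```

Passing `args` to `subprocess.run(args, check=True)` on the host would launch the container. Building the list separately keeps the mount options easy to inspect before anything runs.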

Output

The output is the neural network or resulting model, which can be handled a few ways.

The easiest way is to mount a host directory in the container and dump completed models into that directory.
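
Here is a minimal sketch of the script side of that approach, assuming the host directory was mounted with something like `docker run -v /host/models:/models ...` (both paths are hypothetical) and that the model has already been serialized to bytes by whatever library trained it. The helper `save_model` is illustrative only.

```python
import os
import tempfile


def save_model(model_bytes, output_dir, name):
    """Write serialized model bytes to output_dir/name and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    final_path = os.path.join(output_dir, name)
    # Write to a temporary file in the same directory, then rename it, so a
    # half-written model never appears under its final name.
    fd, tmp_path = tempfile.mkstemp(dir=output_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(model_bytes)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

The rename step matters when several containers write into the same host directory: anything reading that directory sees only complete files.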

Another method is to run a database, possibly in another container, that stores the resulting models in whatever format they are exported (the export format is still an open question).
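
To make the database idea concrete, here is a sketch that stores each model as an opaque blob keyed by name. It uses SQLite purely as a stand-in, since SQLite is file-based rather than a server running in its own container; the table layout and function names are assumptions, and a real setup would apply the same pattern to whichever database server is chosen.

```python
import sqlite3
import time


def open_store(path):
    """Open (or create) the model store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models ("
        " name TEXT PRIMARY KEY,"
        " created REAL NOT NULL,"
        " payload BLOB NOT NULL)"
    )
    return conn


def put_model(conn, name, payload):
    """Store serialized model bytes under a name, replacing any older copy."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO models (name, created, payload)"
            " VALUES (?, ?, ?)",
            (name, time.time(), payload),
        )


def get_model(conn, name):
    """Return the stored bytes for a name, or None if it is not present."""
    row = conn.execute(
        "SELECT payload FROM models WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

Storing the model as bytes sidesteps the export format question: the database does not need to understand the payload, only to hand it back intact.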

Yet another possibility is to keep a persistent drive in a data volume container and have every other container mount and share that single volume. This seems complicated and inefficient, though, so mounting a host directory is probably easiest.

Flags