Revision as of 22:22, 29 April 2017

Notes on a Docker Pod for deep learning.

Setting Up Docker Deep Learning

We are looking for Docker images that can handle a couple of different deep learning technologies:

Python 3
Jupyter
Numpy, scipy, matplotlib, pandas
Scikit Learn/Scikit Image
Tensorflow
OpenCV
Keras

It would also be nice to be ready to use a GPU if it is available...

This may require a single Docker container, or it might require the use of multiple containers. Either way, we'll call it a Docker pod - a collection of related containers.

Docker container from dockerhub

To get various containers set up, we can use a container created by Github user waleedka:

Github: https://github.com/waleedka/modern-deep-learning-docker
Dockerhub: https://hub.docker.com/r/waleedka/modern-deep-learning/

This Github repo provides a Dockerfile that installs pretty much every item we wanted from the list above, plus a few other things (Java).

There is also the floydhub snake-oil salesman, who nonetheless has some interesting materials: https://github.com/floydhub/dl-docker

Setting Up Docker Container

Using CPU Based Platform

If you're just using a CPU, start by installing Docker on your platform of choice: Docker/Installing

Next, if you just want to use the deep learning image without any modifications, run this command to pull it from Dockerhub:

docker pull waleedka/modern-deep-learning

Now you can run the container using the following command:

docker run -it -p 8888:8888 -v ~/:/host waleedka/modern-deep-learning

Note that the -v ~/:/host flag mounts the host's home directory inside the container at /host. Files written there persist after the container exits, which allows getting data in and out of the container.

Using GPU Based Platform

Using a GPU is a little more complicated, since Docker containers have no built-in way of accessing GPU hardware from inside the container.

Nvidia-docker provides a CUDA image and a docker command line wrapper to allow the GPUs to be accessed by a Docker container when it is launched. To get nvidia-docker, you have to sign up for a free account with Nvidia: https://devblogs.nvidia.com/parallelforall/nvidia-docker-gpu-server-application-deployment-made-easy/

Once you do that and install the nvidia-docker utility, you can use it from the command line in place of docker. Here's what running a hello world container looks like with nvidia-docker:

nvidia-docker run --rm hello-world

Here are the steps that Nvidia suggests for any nvidia-docker project:

1. Set up and explore the development environment inside a container.

2. Build the application in the container.

3. Deploy the container in multiple environments.

Once you've done all of that, you can run the container as above (with the CPU case), but replacing docker with nvidia-docker:

nvidia-docker run -it -p 8888:8888 -v ~/:/host waleedka/modern-deep-learning

Customizing Docker Container

If you want to use the docker image as-is, you can just pull it from Dockerhub and go. However, if you are interested in modifying the image, you'll want to grab the Dockerfile directly from Github, edit it, and build the image yourself. Here is the link again.

https://github.com/waleedka/modern-deep-learning-docker

Training Workflow vs Prediction Workflow

The training workflow consists of providing a large number of containers the same training set, training many different models on the data set, and outputting the results of each model to a host directory or database. Containers take training data in, and dump machine models out.
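One way to picture "many different models on the same data set" is as a grid of parameter combinations, with one container launched per combination. The following is a minimal sketch of that idea; the parameter names (learning_rate, layers) and the run-NNN naming scheme are made up for illustration and do not come from the deep learning image.

```python
import itertools


def parameter_grid(options):
    """Yield one dict per combination of the values in `options`."""
    names = sorted(options)
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space; each combination would be handed to its own container.
search_space = {"learning_rate": [0.01, 0.1], "layers": [2, 4, 8]}

# Give every combination a run id so its output can be traced back to its parameters.
runs = {
    f"run-{i:03d}": params
    for i, params in enumerate(parameter_grid(search_space), start=1)
}

for run_id, params in runs.items():
    print(run_id, params)
```

Each run id could then be passed to a container (for example as an environment variable or a command line argument), so that every container trains on the same data with a different configuration.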

Outside of the Docker container workflow, there is a separate step in which each model's performance is evaluated, using whatever criterion is most appropriate. Maybe it is a single metric, maybe it is a weighted average of multiple metrics. Whatever it is, the giant pile of models that was generated in the training workflow is whittled down to one or a few models that are useful.
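As an illustration of the weighted-average case, here is a small sketch that ranks models by a weighted average of their metrics and keeps the top few. The model names, metric names, and numbers are invented for the example; real criteria will depend on the problem.

```python
def weighted_score(metrics, weights):
    """Weighted average of the metrics named in `weights`."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight


def select_best(models, weights, keep=1):
    """Return the names of the `keep` models with the highest weighted score."""
    ranked = sorted(
        models,
        key=lambda name: weighted_score(models[name], weights),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical metrics collected from three training containers.
results = {
    "model_a": {"accuracy": 0.90, "recall": 0.60},
    "model_b": {"accuracy": 0.85, "recall": 0.80},
    "model_c": {"accuracy": 0.70, "recall": 0.75},
}
weights = {"accuracy": 0.5, "recall": 0.5}

best = select_best(results, weights, keep=1)
print(best)
```

Changing the weights changes which model wins, which is the point: the selection step lives outside the containers and can be rerun without retraining anything.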

The prediction workflow is used once that final model has been picked. In the prediction workflow, each container loads up the same model, and applies it to different data sets. It is the opposite of the training workflow. Containers take a machine model in, and dump data (predictions) out.

Data Volumes Strategy

Let's walk through a volumes strategy for deep learning models using Docker. The strategy we use depends on whether we're training deep learning models using data, or running deep learning models to make predictions.

Training workflow: Data going into container

The data that needs to get from the host into the container includes training data, pre-prepared notebooks or scripts, and input files. This data can be shared across multiple Docker containers, and will probably not need to be modified by the container. This data should be mounted read-only.
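Docker's -v flag accepts a trailing :ro option to make a mount read-only. The sketch below builds such a docker run command as an argument list; the /data mount point inside the container and the /tmp/training-data host path are assumptions chosen for the example, not something the image requires.

```python
import os
import shlex


def build_run_command(image, data_dir, port=8888, executable="docker"):
    """Build a `docker run` argument list with a read-only data mount."""
    host_path = os.path.abspath(os.path.expanduser(data_dir))
    return [
        executable, "run", "-it",
        "-p", f"{port}:{port}",
        # ":ro" asks Docker to mount the host directory read-only.
        "-v", f"{host_path}:/data:ro",
        image,
    ]


# Hypothetical host directory holding the shared training data.
cmd = build_run_command("waleedka/modern-deep-learning", "/tmp/training-data")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to subprocess.run to actually start the container. Because the mount is read-only, several containers can share the same training data without any of them being able to alter it.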

Training workflow: Data going out of container

The data that needs to get from the container out to the host includes the output data, the resulting neural network, files with results, and the final exported model. We may be running multiple containers to try different algorithms, architectures, or parameters, so we need to be able to aggregate output from multiple containers.

While the most convenient way of doing this is to mount a host volume and have the container dump out files, it is also possible to create a Docker container to run a database, and have each training container connect to the database container and write its results there. For a large number of containers, complex workflows, or complicated parameter space explorations, the database approach works better, because each container deals with a standardized interface.
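For the simpler host volume approach, one way to keep output from many containers from colliding is to give each run its own subdirectory under the shared mount. The following is a minimal sketch under that assumption; the result.json file name and the fields inside it are hypothetical, and a temporary directory stands in for the mounted host volume.

```python
import json
import tempfile
from pathlib import Path


def write_result(output_root, run_id, params, score):
    """Run inside a container: write one result file into the run's own subdirectory."""
    run_dir = Path(output_root) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {"run_id": run_id, "params": params, "score": score}
    (run_dir / "result.json").write_text(json.dumps(record))


def collect_results(output_root):
    """Run on the host: gather every result.json under the shared directory."""
    paths = sorted(Path(output_root).glob("*/result.json"))
    return [json.loads(path.read_text()) for path in paths]


# A temporary directory stands in for the host directory mounted into each container.
with tempfile.TemporaryDirectory() as root:
    write_result(root, "run-001", {"learning_rate": 0.01}, 0.81)
    write_result(root, "run-002", {"learning_rate": 0.10}, 0.77)
    collected = collect_results(root)

print([record["run_id"] for record in collected])
```

Since every container writes only inside its own subdirectory, no locking is needed, and the host can aggregate results at any time by scanning the directory.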

Prediction workflow: Data going into container

For the prediction workflow, the data going into a container will be a single model or a small set of models (result of the training workflow; chosen from among many possible candidates). The data is loaded into the modeling framework. Loading and applying a model is much cheaper than training a model.

There will also be data that is unique to each container - presumably the prediction workflow will be applying the machine model to a large number of inputs. In this case, each container will need to ingest a stream of data and apply the model to it.

Prediction workflow: Data going out of container

The data leaving a container in the prediction workflow consists of the processed data. That can vary wildly, depending on the workflow and the data sets. It may be turning one set of images into another set, or extracting features from video frames, or even a simple YES/NO or POSITIVE/NEGATIVE categorization.
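To make the "model in, predictions out" shape concrete, here is a toy sketch of the POSITIVE/NEGATIVE case. It assumes the model is nothing more than a set of linear weights stored as JSON, which is a stand-in for whatever a real framework would export; the weights and inputs are invented.

```python
import json


def load_model(model_json):
    """Parse model parameters exported by the training workflow."""
    params = json.loads(model_json)
    return params["weights"], params["bias"]


def predict(model, features):
    """Label one input as POSITIVE or NEGATIVE with a linear decision rule."""
    weights, bias = model
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "POSITIVE" if score > 0 else "NEGATIVE"


def predict_stream(model, inputs):
    """Lazily apply the model to an iterable of inputs."""
    for features in inputs:
        yield predict(model, features)


# Hypothetical exported model and a small batch of inputs.
model = load_model('{"weights": [1.0, -2.0], "bias": 0.5}')
labels = list(predict_stream(model, [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]))
print(labels)
```

The model is loaded once and then applied to each input as it arrives, which matches the idea that loading and applying a model is cheap compared to training it.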

Testing Docker Deep Learning

CPU-Based Platform (Macbook Pro)

Stock image

Start out by running Docker.app in the Applications folder. This will run the Docker daemon in the background.

Now run docker pull to get the stock deep learning Docker container:

$ docker pull waleedka/modern-deep-learning

Now take it for a test drive. Start the container:

$ docker run -it -p 8888:8888 waleedka/modern-deep-learning
root@a944863bc1e6:~#

Now you can start the Jupyter notebook, and access it from the host at port 8888:

root@a944863bc1e6:~# jupyter notebook

Now on the host machine, we can navigate to localhost:8888 and see a Jupyter notebook server up and running. This is exposing the container's file system and any notebooks running in the container. This container runs Python 3 only.

Create a new Python notebook, and try importing a few libraries:

import numpy
import scipy
import sklearn
import theano
import tensorflow
import pandas
import matplotlib
import keras
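If one of these imports fails, the notebook cell stops at the first error and says nothing about the rest. A small sketch like the following tries each library in turn and reports which ones loaded; it uses only the standard library's importlib, and the "unknown" fallback is there because not every package exposes a __version__ attribute.

```python
import importlib


def check_imports(names):
    """Map each module name to its version string, or None if the import failed."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing or broken install: record it and keep checking the rest.
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


LIBRARIES = [
    "numpy", "scipy", "sklearn", "theano",
    "tensorflow", "pandas", "matplotlib", "keras",
]

if __name__ == "__main__":
    for name, version in check_imports(LIBRARIES).items():
        print(f"{name}: {version if version is not None else 'MISSING'}")
```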

GPU-Based Platform

On a GPU-based platform, you can test out the deep learning image as follows.

First, make sure the NVIDIA CUDA driver for the GPU card is installed. cuDNN (CUDA toolbox for deep learning/neural networks) is included with the deep learning docker image provided by waleedka, so you don't need to install cuDNN.

Next, install Docker, followed by Nvidia-docker.

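The deep learning image is run using the same command as above, but with nvidia-docker instead of regular old docker, and with the gpu tag of the image. The extra -p 6006:6006 mapping publishes port 6006, which is the default port for TensorBoard.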

nvidia-docker run -it -p 8888:8888 -p 6006:6006 -v ~/:/host waleedka/modern-deep-learning:gpu
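To be ready for a GPU when one is available, a launcher script could pick between the two commands automatically. The sketch below treats the presence of nvidia-docker on the PATH as a rough proxy for a usable GPU setup; that is an assumption, since the utility can be installed on a machine whose driver is missing or broken.

```python
import shutil


def pick_runtime(which=shutil.which):
    """Return (executable, image): the GPU pair if nvidia-docker is on the PATH, else the CPU pair."""
    if which("nvidia-docker"):
        return "nvidia-docker", "waleedka/modern-deep-learning:gpu"
    return "docker", "waleedka/modern-deep-learning"


executable, image = pick_runtime()
print(executable, image)
```

The which parameter is only there so the lookup can be swapped out when testing the function on a machine without nvidia-docker.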

From charlesreid1