Latest revision as of 19:14, 23 September 2017

Notes on a Docker Pod for deep learning.

Setting Up Docker Deep Learning

We are looking for Docker images that can handle several different deep learning technologies:

Python 3
Jupyter
Numpy, scipy, matplotlib, pandas
Scikit Learn/Scikit Image
Tensorflow
OpenCV
Keras

It would also be nice to be ready to use a GPU if it is available...

This may require a single Docker container, or it might require the use of multiple containers. Either way, we'll call it a Docker pod - a collection of related containers.

Docker container from dockerhub

To get various containers set up, we can use a container created by Github user @waleedka:

Github: https://github.com/waleedka/modern-deep-learning-docker
Dockerhub: https://hub.docker.com/r/waleedka/modern-deep-learning/

This Github repo provides a Dockerfile that installs pretty much every item we wanted from the list above, plus a few other things (Java).

There is also the @floydhub snake-oil salesman, who nonetheless has some interesting materials: https://github.com/floydhub/dl-docker

Setting Up Docker Container

Using CPU Based Platform

If you're just using a CPU, start by installing Docker on your platform of choice: Docker/Installing

Next, if you just want to use the deep learning container without any modifications, run this pair of commands to pull the Docker image and start a container from it:

$ docker pull waleedka/modern-deep-learning
$ docker run -it -p 8888:8888 -v ~/:/host waleedka/modern-deep-learning

Note that the -v ~/:/host flag mounts the host's home directory inside the container at /host. Anything written there stays on the host after the container exits, which allows getting data in and out of the container.
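As a rough sketch of what the flags in the docker run command above mean, the command can be assembled piece by piece in Python. The helper name docker_run_command and its parameters are made up for illustration; only the docker flags themselves (-it, -p, -v) come from the command above.

```python
import os
import shlex


def docker_run_command(image, host_dir="~", mount_point="/host", port=8888,
                       launcher="docker"):
    """Build the argv list for an interactive `docker run` with one bind mount."""
    # Expand "~" so the host side of the mount is an absolute path.
    host_dir = os.path.abspath(os.path.expanduser(host_dir))
    return [
        launcher, "run", "-it",
        "-p", "{0}:{0}".format(port),                 # host port : container port
        "-v", "{}:{}".format(host_dir, mount_point),  # host dir : container dir
        image,
    ]


# Only print the command; nothing is launched here.
cmd = docker_run_command("waleedka/modern-deep-learning")
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the resulting list to subprocess.call from a terminal would launch the container; here it is only printed.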

Using GPU Based Platform

Using a GPU is a little more complicated, since Docker containers have no built-in way of accessing the host's GPU hardware.

Nvidia-docker provides a CUDA image and a docker command line wrapper to allow the GPUs to be accessed by a Docker container when it is launched. To get nvidia-docker, you have to sign up for a free account with Nvidia: https://devblogs.nvidia.com/parallelforall/nvidia-docker-gpu-server-application-deployment-made-easy/

Once you do that and install nvidia-docker, you will have a command line utility that is used in place of the docker command. Here's what running a hello world container looks like with nvidia-docker:

$ nvidia-docker run --rm hello-world

Here are the steps that Nvidia suggests for any nvidia-docker project:

1. Set up and explore the development environment inside a container.

2. Build the application in the container.

3. Deploy the container in multiple environments.

Once you've done all of that, you can run the container as above (with the CPU case), but replacing docker with nvidia-docker:

$ nvidia-docker run -it -p 8888:8888 -v ~/:/host waleedka/modern-deep-learning
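Since the only difference between the CPU and GPU cases is the name of the command, a script could pick the launcher automatically. This is a minimal sketch, not part of nvidia-docker itself: the function name pick_launcher is hypothetical, and finding nvidia-docker on the PATH is only assumed to indicate a working GPU setup.

```python
import shutil


def pick_launcher(which=shutil.which):
    """Return "nvidia-docker" if it is on the PATH, otherwise plain "docker"."""
    # `which` is injectable so the choice can be exercised without either tool installed.
    return "nvidia-docker" if which("nvidia-docker") else "docker"


print(pick_launcher())
```

The returned name can then be used as the first word of the run command.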

Customizing Docker Container

Using the Docker image as-is just requires building it from the Dockerfile on Github or pulling it from Docker Hub.

To customize it, I created a git repository: https://git.charlesreid1.com/docker/d-deep-learning

Data Volumes Strategy

Volumes strategy for deep learning models using Docker:

Training and testing are going to happen in one go. There is a single data set, split into training and testing data, and training and testing both happen in the same notebook.

Data from host to container includes:

Entire data set - split into training and testing for the two different steps.
Pre-prepared notebooks or scripts
Input files

Data from container to host includes:

Exported trained model
Files with results (e.g., image style transfer or generated text)
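A minimal sketch of this flow, as it might look inside a notebook in the container. The function names, the 80/20 split, and the results.json file name are assumptions for illustration; in the container, the output directory would be somewhere under the mounted /host volume rather than a temporary directory.

```python
import json
import os
import random
import tempfile


def split_dataset(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def export_results(results, out_dir, name="results.json"):
    """Write a results dictionary as JSON under out_dir and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, name)
    with open(path, "w") as handle:
        json.dump(results, handle, indent=2)
    return path


# Demo with a stand-in data set and a temporary output directory.
train, test = split_dataset(range(10))
path = export_results({"n_train": len(train), "n_test": len(test)},
                      tempfile.mkdtemp())
print(path)
```

Fixing the seed keeps the split reproducible across containers that load the same data set.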

We may be running multiple containers with different algorithms, architectures, or parameters.

The output from those containers will need to be aggregated.

For GPU or other expensive instances, we need to get the data off of the machine as soon as training is finished, instead of waiting.

Optimal: have one or more shared disks with persistent storage.

Alternative: have a secure file transfer happen when training is finished.
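One way the aggregation could work with a shared disk, sketched under assumptions: each container writes a results.json file into its own subdirectory of the shared directory, and a small script gathers them up afterwards. The directory layout, file name, run names, and function name are all hypothetical.

```python
import glob
import json
import os
import tempfile


def aggregate_results(shared_dir, filename="results.json"):
    """Collect <shared_dir>/<run_name>/<filename> into {run_name: results}."""
    combined = {}
    pattern = os.path.join(shared_dir, "*", filename)
    for path in sorted(glob.glob(pattern)):
        run_name = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            combined[run_name] = json.load(handle)
    return combined


# Demo: fake two runs in a temporary directory standing in for the shared disk.
shared_dir = tempfile.mkdtemp()
for run_name in ("run_a", "run_b"):
    os.makedirs(os.path.join(shared_dir, run_name))
    with open(os.path.join(shared_dir, run_name, "results.json"), "w") as handle:
        json.dump({"run": run_name}, handle)  # placeholder contents only
print(aggregate_results(shared_dir))
```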

Next step beyond this would be to put machine learning models into production.

Testing Docker Deep Learning

CPU-Based Platform (Macbook Pro)

Stock image

Start out by running Docker.app in the Applications folder. This will run the Docker daemon in the background.

Now pull the Docker image, run it, and start a notebook.

$ docker pull waleedka/modern-deep-learning

$ docker run -it -p 8888:8888 waleedka/modern-deep-learning

root@a944863bc1e6:~# 

root@a944863bc1e6:~# jupyter notebook

Now on the host machine, we can navigate to localhost:8888 and see a Jupyter notebook server up and running. This is exposing the container's file system and any notebooks running in the container. This container runs Python 3 only.

Create a new Python notebook, and try importing a few libraries:

import numpy
import scipy
import sklearn
import theano
import tensorflow
import pandas
import matplotlib
import keras
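The same check can be scripted so that a missing library is reported instead of stopping at the first failed import. This is a small sketch; the function name check_imports is made up, and the module list is the one above.

```python
import importlib


def check_imports(names):
    """Try to import each module; map name -> version string, or None on failure."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken install can raise more than ImportError, so catch broadly.
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


libraries = ["numpy", "scipy", "sklearn", "theano", "tensorflow",
             "pandas", "matplotlib", "keras"]
for name, version in check_imports(libraries).items():
    print(name, version if version else "NOT AVAILABLE")
```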

Custom image

Or, use the custom image:

$ git clone https://charlesreid1.com:3000/docker/d-deep-learning.git
$ cd d-deep-learning

Build it:

$ docker build -t deep_learning .

Run it:

$ docker run -it -p 8888:8888 deep_learning

GPU-Based Platform

On a GPU-based platform, you can test out the deep learning image as follows.

First, make sure the NVIDIA CUDA driver for the GPU card is installed on the host. cuDNN (NVIDIA's CUDA library for deep neural networks) is included with the deep learning docker image provided by waleedka, so you don't need to install cuDNN.

Next, install Docker, followed by Nvidia-docker.

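The deep learning image is run using nearly the same command as above, but with nvidia-docker instead of regular old docker, the gpu tag of the image, and an extra port mapping for 6006 (the default TensorBoard port).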

$ nvidia-docker run -it -p 8888:8888 -p 6006:6006 -v ~/:/host waleedka/modern-deep-learning:gpu
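To confirm that the GPU is actually visible from inside the container, one option is to call nvidia-smi from Python. This sketch assumes nvidia-smi is on the PATH inside the container (nvidia-docker normally arranges this, but it is not guaranteed for every image); the function name list_gpus is hypothetical.

```python
import shutil
import subprocess


def list_gpus():
    """Return the lines printed by `nvidia-smi -L`, or [] if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        output = subprocess.check_output(["nvidia-smi", "-L"],
                                         universal_newlines=True)
    except (subprocess.CalledProcessError, OSError):
        return []
    # One non-empty line per GPU that nvidia-smi reports.
    return [line for line in output.splitlines() if line.strip()]


print(list_gpus())
```

An empty list means either that no GPU was found or that nvidia-smi is not available, so it is worth checking both before blaming the driver.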

Flags


Docker/Pods/Deep Learning: Difference between revisions

From charlesreid1