Docker/Jupyter PySpark
From charlesreid1
Revision as of 02:09, 27 September 2017
Install
To run PySpark in a Jupyter notebook using Docker, we use the jupyter/pyspark-notebook image, one of the Docker images that the Jupyter project maintains in jupyter/docker-stacks.
Link to jupyter/pyspark-notebook on Dockerhub: https://hub.docker.com/r/jupyter/pyspark-notebook/
Link to jupyter/pyspark-notebook on Github: https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook
Link to jupyter/docker-stacks on Github: https://github.com/jupyter/docker-stacks
Get the docker container
The short version: get the docker image using docker pull:
$ docker pull jupyter/pyspark-notebook
That's it. There is no long version.
To run it, we need to forward traffic from port 8888 on our machine to port 8888 in the Docker container:
$ docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
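Once the container is running, a quick way to confirm that the port mapping works is to try a TCP connection to port 8888 from the host. The following is a minimal sketch using only the Python standard library; the helper name port_is_open is made up for illustration and is not part of Docker or Jupyter.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    port_is_open is a hypothetical helper; 8888 is the host port
    published by the -p 8888:8888 flag in the docker run command.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...)
        # when nothing is accepting connections on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open() on the host while the container is up should return True. A False result usually means the container is not running or the -p flag was left out.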
Fire it up
Fire up the Docker container with the command above:
$ docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
This prints the URL of the Jupyter notebook server. If you want to allow others to access the notebook, you can also pass in a custom certificate. The available options are detailed in the jupyter/pyspark-notebook README, under the section "Notebook Options": https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook#notebook-options
docker run -d -p 8888:8888 \
-v /some/host/folder:/etc/ssl/notebook \
jupyter/pyspark-notebook start-notebook.sh \
--NotebookApp.keyfile=/etc/ssl/notebook/notebook.key \
--NotebookApp.certfile=/etc/ssl/notebook/notebook.crt
Test it out
Once you have your notebook open, execute the following Python code to confirm that PySpark works:
import pyspark
sc = pyspark.SparkContext('local[*]')
# do something to prove it works
rdd = sc.parallelize(range(1000))
rdd.takeSample(False, 5)
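Because takeSample returns a random sample, its output differs from run to run. For a check with a known answer, one option is to compare a Spark computation against a closed-form result. The sketch below assumes the SparkContext sc created above is still active and reuses it, since starting a second SparkContext in the same notebook raises an error. The function names expected_sum and spark_sum_matches are made up for illustration.

```python
def expected_sum(n):
    """Closed-form sum of 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


def spark_sum_matches(sc, n=1000):
    """Sum range(n) with Spark and compare it to the closed form.

    sc is assumed to be an existing pyspark.SparkContext, such as
    the one created in the snippet above.
    """
    rdd = sc.parallelize(range(n))
    # RDD.sum() adds up the elements across all partitions.
    return rdd.sum() == expected_sum(n)
```

In the notebook, spark_sum_matches(sc) should return True if Spark distributes and aggregates the data correctly.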