In this branch we will cover the starting steps of creating a GPU-accelerated Docker image for DSMLP. It's recommended to follow the steps in the "master" branch before continuing.
As of Fall 2020, there are 15 GPU nodes on the DSMLP cluster available for classroom use, and each node has up to 8 NVIDIA GPUs installed. A GPU is dynamically assigned to a container on start-up when requested and stays attached until that container is deleted, meaning the GPU remains occupied even when the container is idle.
The graphics driver is installed automatically in the container on start-up. The current driver version is 418.88. Because of this, the latest CUDA Toolkit supported on DSMLP is version 10.1, according to NVIDIA.
GPU Model | VRAM | Count | Nodes |
---|---|---|---|
NVIDIA 1080 Ti | 11 GB | 80 | n01 through n12, except n09 and n10 |
NVIDIA 2080 Ti | 11 GB | 32 | n18, n21, n22, n24 |
NVIDIA 1070 Ti | 8 GB | 7 | n10 |
It's advised to use the ETS-provided image `ucsdets/datahub-base-notebook` for a DataHub-like experience. However, you can use any image from the Docker Hub community (or even other public container registries). We will use the `jupyter/scipy-notebook` image from Jupyter Docker Stacks. For `ucsdets/datahub-base-notebook`, ETS uses `jupyter/datascience-notebook` as the base image and installs additional software. `jupyter/scipy-notebook`, being the base image of `jupyter/datascience-notebook`, is smaller but has less functionality and fewer libraries.
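A minimal Dockerfile skeleton starting from this image might look like the following. The tag and the placeholder comments are assumptions; pin whichever release works for your class:

```dockerfile
# Sketch: build on the scipy-notebook image from Jupyter Docker Stacks
FROM jupyter/scipy-notebook:latest

# Switch to root for system-level package installs, then back to the
# notebook user ($NB_UID is defined by the Jupyter Docker Stacks images)
USER root
# ... apt-get installs would go here ...
USER $NB_UID
```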
Choosing the right version of CUDA is important because some legacy codebases rely on specific old versions of CUDA and their compatible software in order to run. We will use CUDA 10.1 for this example.
The following command lets conda install CUDA Toolkit 10.1 along with a deep learning acceleration library (cuDNN) and a device communication library (NCCL).

```dockerfile
RUN conda install -y cudatoolkit=10.1 cudnn nccl
```

In the example Dockerfile, the above command is followed by `conda clean --all -f -y`, which cleans up the unnecessary cache. The two commands are chained with `&&` in between so they run in a single `RUN` instruction, reducing the overall size of that layer.
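Put together, the chained instruction looks like this (a sketch of the pattern described above; each `RUN` produces one image layer, so cleaning the cache in the same instruction keeps the cached files out of the final image):

```dockerfile
# Install CUDA Toolkit 10.1, cuDNN, and NCCL, then remove conda's
# package cache within the same layer to keep the image small
RUN conda install -y cudatoolkit=10.1 cudnn nccl && \
    conda clean --all -f -y
```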
There are two major versions of the TensorFlow API and they cannot coexist in the same environment. Look into the Dockerfile for the commands. Installing `tensorflow` will get the latest `2.*` version.
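For example (a sketch; the 1.x pin shown is illustrative, and you should install only one of the two):

```dockerfile
# Unpinned install resolves to the latest 2.* release
RUN pip install --no-cache-dir tensorflow

# For legacy 1.x code, pin an old release INSTEAD (never alongside 2.x):
# RUN pip install --no-cache-dir tensorflow-gpu==1.15
```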
Installing PyTorch requires you to go to their website, select the appropriate specifications for your system, and paste in the generated command. Remember to add `--no-cache-dir` after `pip install` to reduce image size.
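For a pip install on Linux with CUDA 10.1, the site generates a command along these lines. The exact version pins here are an example and may differ from what the site gives you:

```dockerfile
# Example PyTorch install for CUDA 10.1; check pytorch.org for current pins
RUN pip install --no-cache-dir torch==1.7.1+cu101 torchvision==0.8.2+cu101 \
    -f https://download.pytorch.org/whl/torch_stable.html
```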
To install a new kernel that can be selected within a Jupyter notebook, you can look into creating a second conda environment and using `nb_conda_kernels` to register it.
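A sketch of that approach is below. The environment name and Python version are assumptions; `nb_conda_kernels` goes in the environment running Jupyter (here, `base`), while the new environment needs `ipykernel` so it can be exposed as a kernel:

```dockerfile
# Hypothetical second environment, selectable as a notebook kernel
RUN conda create -y -n myenv python=3.8 ipykernel && \
    conda install -y -n base nb_conda_kernels && \
    conda clean --all -f -y
```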