---
layout: page
title: Run Containers on GPUs
tagline: Utilizing GPUs on the TACC supercomputer through containerized environments.
---


#### Choosing the Right System

You can register your app to ANY system at TACC, depending on your needs.

| System | Cores/Node | Pros | Limitations |
|--------|------------|------|-------------|
| Stampede 2 Phase 1 | 68 | Thousands of nodes, KNL processors | Slow for serial code |
| Stampede 2 Phase 2 | 48 | Thousands of nodes, Skylake processors | Coming soon, high demand |
| Lonestar 5 | 24 | Compute, GPU, and large-memory nodes | UT only, slow external network |
| Wrangler | 24 | SSD filesystem for fast I/O, hosted databases, Hadoop, HDFS | Low node count |
| Jetstream | 24 | Long-running instances, root access | Limited storage |
| Chameleon | Variable | GPUs, bare metal, VMs, software-defined networking | Difficult to configure |
| Catapult | 16 | FPGAs | Windows-only |

You can learn about all of these choices in the TACC Systems Overview; detailed specifications can be found in each system's User Guide.

If you already have an application configured on a non-TACC system, you can register that system with the SD2E Agave tenant.

After registration, you can not only run applications but also access data on that system. Just remember that applications will run as YOUR user when you share them with others.
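As a rough sketch, registering a system with the Agave CLI looks like the following; `my-system.json` is a hypothetical system definition you would author yourself.

```
# Sketch: register (or update) an execution system with the Agave CLI
# my-system.json is a hypothetical Agave system definition
systems-addupdate -F my-system.json
# Confirm the new system shows up for your account
systems-list
```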


#### Containers @ TACC

TACC supports containerized compute environments through Singularity, which encapsulates environments without allowing privilege escalation to root. Singularity provides the following functionality:

- Environment encapsulation
- Image-based containers (single file)
- Devices and interconnects are passed into the container
  - InfiniBand (host and container MPI libraries must be compatible)
  - GPGPUs (host and container CUDA libraries must match)
- No abnormal privilege escalation allowed
- No root daemons
- Containers are read-only when not running as root
- Pass in filesystems and directories your user has access to (see the sketch after this list)
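As a sketch of that last point, Singularity's `-B` flag bind-mounts host paths into the container; the paths and image name here are hypothetical. Note that on kernels without overlayfs support (see below), the destination directory must already exist inside the image.

```
# Bind-mount a host directory into the container at /data
# (/scratch/mydata and my-container.img are hypothetical)
singularity exec -B /scratch/mydata:/data my-container.img ls /data
```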

Since version 2.3, Singularity has supported the following two workflows.


##### Local Container Development

Create a Singularity container from scratch.

1. Create an image of a specific size
2. Bootstrap the image (requires `sudo`)
3. Done

http://singularity.lbl.gov/archive/docs/v2-3/bootstrap-image
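A minimal sketch of this workflow with the Singularity 2.3 command line; the image name and definition file are hypothetical, and `--size` is in MiB.

```
# Create an empty 2 GiB image file
singularity create --size 2048 ubuntu.img
# Bootstrap it from a definition file you have written (requires root)
sudo singularity bootstrap ubuntu.img ubuntu.def
```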


##### Docker Import

Utilize your knowledge of Docker to create Singularity images.

1. Pull the Docker image
2. Run the Docker image

http://singularity.lbl.gov/archive/docs/v2-3/docs-docker
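For example, pulling a public image from Docker Hub and running it might look like this (the image is just an illustration; Singularity names the resulting file `<name>-<tag>.img`):

```
# Pull an image from Docker Hub and convert it to a Singularity image
singularity pull docker://ubuntu:latest
# Run the resulting image
singularity run ubuntu-latest.img
```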

To help with this area of development, TACC maintains a set of base containers for each system, with libraries for the specialized hardware, and a simple base image that will work with Singularity on all TACC systems. This means you can re-use your Docker knowledge for Singularity containers at TACC.


##### Running the container

These containers run without root privileges, so you simply use one of three commands:

- `run` - Run the container's default action, which can take arguments
- `exec` - Execute a specific command inside the container, and then exit
- `shell` - Enter the container and run commands interactively
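Assuming a hypothetical `my-container.img`, the three invocations look like:

```
# run: invoke the container's default action, passing arguments through
singularity run my-container.img arg1 arg2
# exec: run one specific command inside the container, then exit
singularity exec my-container.img cat /etc/os-release
# shell: drop into an interactive shell inside the container
singularity shell my-container.img
```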

#### GPU containers

Since Singularity began supporting Docker containers, it has been fairly simple to utilize GPUs for machine learning code like TensorFlow. From TACC's GPU system:

```
# Work from a compute node
idev -m 60
# Load the singularity module
module load tacc-singularity
# Pull your image
singularity pull docker://nvidia/caffe:latest
# Run caffe's device query on GPU 0 inside the container
singularity exec --nv caffe-latest.img caffe device_query -gpu 0
```

Please note that the `--nv` flag is what passes the GPU drivers into the container. If you leave it out, as in the following command, the GPU will not be detected:

```
singularity exec caffe-latest.img caffe device_query -gpu 0
```
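A quick sanity check is to run `nvidia-smi` inside the container; this is a sketch and assumes `nvidia-smi` is present in the image (or bound in by `--nv`, depending on your Singularity version).

```
# Should list the node's GPUs when --nv is supplied
singularity exec --nv caffe-latest.img nvidia-smi
```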

For TensorFlow, you can no longer use the images directly from their Docker repository, since they are now built against CUDA 9. For this purpose, I have created the tacc-maverick-ml container, which includes:

- TensorFlow v1.6.1
- Keras
- PyTorch
- Anaconda Python 3

```
# Change to your $WORK directory
cd $WORK

# Get the software
git clone https://github.com/tensorflow/models.git ~/models

# Pull the image
singularity pull docker://gzynda/tacc-maverick-ml:latest

# Is the GPU detected?
singularity exec --nv tacc-maverick-ml-latest.img python -c "import tensorflow as tf; tf.test.is_gpu_available()"

# Run the code
singularity exec --nv tacc-maverick-ml-latest.img python $HOME/models/tutorials/image/mnist/convolutional.py
```

You probably noticed that we check out the models repository into your $HOME directory. This is because your $HOME and $WORK directories are only available inside the container if the root folders /home and /work exist inside the container. In the case of tensorflow-latest-gpu.img, the /work directory does not exist, so any files there are inaccessible to the container.
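A quick way to check whether a given image can see these directories (using the image named above):

```
# /work does not exist inside this image, so this command should fail
singularity exec tensorflow-latest-gpu.img ls /work
# /home does exist, so $HOME is mounted and visible
singularity exec tensorflow-latest-gpu.img ls $HOME
```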

You may be thinking, "what about overlayfs?" The Linux kernel on Maverick2 does not support overlayfs, so it had to be disabled in our Singularity install.


#### Prototyping

You can also prototype your application from inside your container with the `shell` command.

```
# Enter the container (--nv passes in the GPU drivers)
singularity shell --nv tacc-maverick-ml-latest.img
# Inside the container: the container's (Anaconda) Python
python --version
# Leave the container
exit
# Back on the host: the system Python
python --version
```
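Once a command behaves as expected in the interactive shell, the same invocation can be run non-interactively with `exec` (a sketch, reusing the image above), which is what you would put in a script or batch job:

```
# The prototyped command, now runnable from a script or batch job
singularity exec --nv tacc-maverick-ml-latest.img python --version
```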

#### Build your APP

You can then use these methods in your next Agave app.
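As a rough sketch with the Agave CLI, registering and testing an app might look like the following; `my-app.json` and `job.json` are hypothetical definition files you would write yourself.

```
# Sketch: register an Agave app from a JSON definition (hypothetical file)
apps-addupdate -F my-app.json
# Submit a test job against the new app (job.json is also hypothetical)
jobs-submit -F job.json
```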


Return to the API Documentation Overview