layout | title | tagline |
---|---|---|
page |
Run Containers on GPUs |
Utilizing GPUs on the TACC supercomputer through containerized environments.
#### Choosing the Right System
You can register your app to ANY system at TACC, depending on your needs.
System | Cores/Node | Pros | Limitations |
---|---|---|---|
Stampede 2 Phase1 | 68 | Thousands of nodes, KNL processors | Slow for serial code |
Stampede 2 Phase2 | 48 | Thousands of nodes, Skylake processors | Coming Soon, High Demand |
Lonestar 5 | 24 | Compute, GPUs, Large-mem | UT only, slow external network |
Wrangler | 24 | SSD Filesystem for fast I/O, Hosted Databases, Hadoop, HDFS | Low node-count |
Jetstream | 24 | Long running instances, root access | Limited storage |
Chameleon | Variable | GPUs, bare metal VM, software defined networking | Difficult to configure |
Catapult | 16 | FPGAs | Windows-only |
You can learn about all choices at the TACC Systems Overview. Detailed specifications can be found in the User Guide of each system.
If you have an application already configured on a non-tacc system, you can register that system to the SD2E agave tenant.
After registration, you can not only run applications, but access data as well. Just remember that applications will run as YOUR user when you share them with others.
#### Containers @ TACC
TACC supports containerized compute environments through Singularity, which provides environment encapsulation without privilege escalation (root). Singularity provides the following functionality:
- Environment encapsulation
- Image based containers (single file)
- Devices and interconnects are passed into container
- Infiniband (host and container MPI libs much be compatible)
- GPGPUs (host and container CUDA libs must match)
- No abnormal privilege escalation allowed
- No root daemons
- Containers are read-only when not root
- Pass in filesystems and directories your user has access to
Since version 2.3, Singularity has supported the two following workflows
##### Local Container Development
Create a Singularity container from scratch.
- Create image of specific size
- (sudo) bootstrap image
- (sudo) add content through definition file
- (sudo) manually install software
- Done
http://singularity.lbl.gov/archive/docs/v2-3/bootstrap-image
##### Docker Import
Utilize your knowledge of Docker to create Singularity images.
- Pull docker image
- Run docker image
http://singularity.lbl.gov/archive/docs/v2-3/docs-docker
To help with this area of development, TACC has a set of base containers for each system
with libraries for the specialize hardware, and a simple base image that will work with Singularity on all TACC system.
- All of TACC - https://hub.docker.com/r/gzynda/tacc-base/
This means you can re-use your Docker knowledge for Singularity containers at TACC.
##### Running the container
These containers are run without root, so you simply
- run - Run the default functionality of the container, which takes in arguments
- exec - Execute a specific command inside the container, and then exit
- shell - Enter the container and interactively run commands
#### GPU containers
Since Singularity supported docker containers, it has been fairly simple to utilize GPUs for machine learning code like TensorFlow. From TACC's GPU system:
# Work from a compute node
idev -m 60
# Load the singularity module
module load tacc-singularity
# Pull your image
singularity pull docker://nvidia/caffe:latest
#
singularity exec --nv caffe-latest.img caffe device_query -gpu 0
Please note that the --nv
flag specifically passes the GPU drivers into the container. If you leave it out, the GPU will not be detected.
singularity exec caffe-latest.img caffe device_query -gpu 0
For TensorFlow, you can no longer directly use the images directly from their Docker repository since they now build them against cuda 9. I have created the tacc-maverick-ml, which includes
- tensorflow v1.6.1
- keras
- pytorch
- anaconda python 3
for this purpose.
# Change to your $WORK directory
cd $WORK
#Get the software
git clone https://github.com/tensorflow/models.git ~/models
# Pull the image
singularity pull docker://gzynda/tacc-maverick-ml:latest
# Is the GPU detected?
singularity exec --nv tacc-maverick-ml-latest.img python -c "import tensorflow as tf; tf.test.is_gpu_available()"
# Run the code
singularity exec --nv tacc-maverick-ml-latest.img python $HOME/models/tutorials/image/mnist/convolutional.py
You probably noticed that we check out the models repository into your $HOME
directory. This is because your $HOME
and $WORK
directories are only available inside the container if the root folders /home
and /work
exist inside the container. In the case of tensorflow-latest-gpu.img
, the /work
directory does not exist, so any files there are inaccessible to the container.
You may be thinking "what about overlayfs??". The Linux kernel on Maverick2 does not support overlayfs, so it had to be disabled in our singularity install.
#### Prototyping
You can also prototype your application from inside your container with the shell
command.
singularity shell --nv tacc-maverick-ml-latest.img
python --version
exit
python --version
#### Build your APP
You can then use these methods in your next Agave app.
Return to the API Documentation Overview