Singularity for running deep learning codes on GPU #10

Open
guchchcug opened this issue Jul 23, 2019 · 17 comments

@guchchcug

I'd like to learn how to use Singularity to run deep learning code on GPUs on XSEDE, and on multiple CPUs on Jetstream. I have at least MNIST code ready for testing.

@agladstein

Hi @guchchcug, I've used multiple CPUs on Jetstream, transferred data to Bridges, and run deep learning on a GPU on Bridges in a Singularity container. I could come by during lunch or an evening to go over it.

Do you have an XSEDE GPU resource to play on?

  1. I created a Singularity image (tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg) from this Singularity recipe:
Bootstrap: docker
From: nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

%post
#Updating and getting required packages
apt-get -y update
apt-get -y upgrade
apt-get install -y wget git vim python3 python3-pip python3-tk
ln -s /usr/bin/python3 /usr/bin/python

#Download and install Anaconda (not added to PATH here; the pip3 commands below use the system Python 3)
CONDA_INSTALL_PATH="/usr/local/anaconda3-4.2.0"
wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
chmod +x Anaconda3-4.2.0-Linux-x86_64.sh
./Anaconda3-4.2.0-Linux-x86_64.sh -b -p $CONDA_INSTALL_PATH

export LC_ALL=C

#Install Tensorflow
pip3 install tensorflow-gpu==1.12.0
pip3 install tensorflow-probability

#Install other python modules
pip3 install scipy
pip3 install matplotlib==2.2.4
pip3 install scikit-learn
pip3 install pandas
pip3 install Pillow
pip3 install livelossplot
pip3 install hyperas
pip3 install GPy
pip3 install GPyOpt
pip3 install blinker
pip3 install psutil
pip3 install spacy

#Install Keras
pip3 install keras

#Run the command given on the command line
%runscript
exec "$@"
  2. Transferred the Singularity image to Bridges.

  3. Started an interactive GPU-AI job (https://www.psc.edu/bridges/user-guide/running-jobs#ai):

interact -p GPU-AI -A mc5phjp --gres=gpu:volta16:1

where mc5phjp is my project ID.

  4. Loaded the modules and started a shell in the container:
module load cuda/9.0
module load singularity/3.0.0
singularity shell --nv -B $SCRATCH tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg
  5. Tested the deep learning code:
python my_deep_learning_code.py

Alternatively, use similar steps to submit a batch job with Slurm.
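A minimal sketch (not from the original steps) of what that batch job could look like, generated from Python. The helper name `make_sbatch_script` and the one-hour walltime are hypothetical; the partition, account, and `--gres` values are copied from the `interact` command above. It uses `singularity exec` instead of `singularity shell` so the command runs without an interactive prompt. Check the Bridges user guide for any other options the GPU-AI partition requires.

```python
# Sketch: generate a Slurm batch script that mirrors the interactive steps.
# Hypothetical: the helper name, the default walltime, and the example values.


def make_sbatch_script(account, image, command,
                       partition="GPU-AI", gres="gpu:volta16:1",
                       walltime="01:00:00"):
    """Return the text of a batch script that runs `command` inside `image`."""
    lines = [
        "#!/bin/bash",
        "#SBATCH -p {}".format(partition),
        "#SBATCH -A {}".format(account),
        "#SBATCH --gres={}".format(gres),
        "#SBATCH -t {}".format(walltime),  # walltime limit (hypothetical default)
        "",
        # Same modules as in the interactive session.
        "module load cuda/9.0",
        "module load singularity/3.0.0",
        "",
        # 'exec' runs one command and exits; 'shell' would open a prompt.
        "singularity exec --nv -B $SCRATCH {} {}".format(image, command),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_sbatch_script(
        account="mc5phjp",
        image="tensorflow1.12.0-py3-cuda9.0-ubuntu16.04.simg",
        command="python my_deep_learning_code.py",
    ))
```

Save the printed text as, e.g., `dl_job.sh` and submit it with `sbatch dl_job.sh`. `str.format` is used instead of f-strings so the script also runs on Python 3.5.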

@guchchcug
Author

guchchcug commented Jul 24, 2019 via email

@guchchcug
Author

guchchcug commented Jul 24, 2019 via email

@agladstein

Hmm, I just double-checked on Bridges and it worked for me. Did you load the singularity module? Which version of Singularity did you build the image with? Did you test the Singularity image wherever you originally built it?
Also, I can just pass you my image...

@guchchcug
Author

guchchcug commented Jul 24, 2019 via email

@agladstein

Did you test your image in the environment where you originally created it?

@guchchcug
Author

guchchcug commented Jul 24, 2019 via email

@agladstein

agladstein commented Jul 24, 2019 via email

@guchchcug
Author

guchchcug commented Jul 24, 2019 via email

@guchchcug
Author

guchchcug commented Jul 24, 2019 via email

@agladstein

You can also try a Singularity image from Singularity Hub.
I just tried a random image tagged tensorflow on Singularity Hub (https://singularity-hub.org/search).

On the Bridges login node, load the Singularity module:

module load singularity/3.0.0

Pull a TensorFlow Singularity image:

singularity pull shub://belledon/tensorflow-keras

Start an interactive job:

interact -p GPU-AI -A mc5phjp --gres=gpu:volta16:1

Load the necessary modules:

module load singularity/3.0.0
module load cuda/9.0

Enter the Singularity container:

singularity shell --nv -B $SCRATCH tensorflow-keras_latest.sif

Test whether Keras imports:

Singularity tensorflow-keras_latest.sif:~> python
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.

I did not verify that it uses the GPU, but I suspect it will.
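One way to check, sketched here as an assumption rather than something tested in this thread: with `--nv`, the host's `nvidia-smi` is available inside the container, so a short Python script can list the GPUs it reports. The function names `parse_nvidia_smi_list` and `visible_gpus` are hypothetical.

```python
# Sketch: list the GPUs visible from inside the container.
# Assumes the container was started with --nv so nvidia-smi is on PATH.
import os
import subprocess


def parse_nvidia_smi_list(output):
    """Return the GPU description lines from `nvidia-smi -L` output."""
    return [line.strip() for line in output.splitlines()
            if line.strip().startswith("GPU ")]


def visible_gpus():
    """Run `nvidia-smi -L`; return [] if it is missing or exits with an error."""
    try:
        result = subprocess.run(["nvidia-smi", "-L"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_nvidia_smi_list(result.stdout)


if __name__ == "__main__":
    # Slurm usually sets this variable for jobs that requested --gres=gpu.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    gpus = visible_gpus()
    print("\n".join(gpus) if gpus else "No GPUs visible to nvidia-smi")
```

If the list is empty, the `--nv` flag and the `--gres` request are the first things to check. From inside TensorFlow 1.x, `tf.test.is_gpu_available()` gives a similar answer; the `Using TensorFlow backend.` message above only says which backend Keras picked.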

@guchchcug
Author

guchchcug commented Jul 26, 2019 via email

@agladstein

Did you try just importing tensorflow or keras from python in the container? Did that work?
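A minimal sketch (not from the thread) of that check as a script: run it with the container's Python and it tries to import the main modules from the recipe above, reporting any that fail. The pip-name to module-name mapping is written by hand and is an assumption; `try_imports` is a hypothetical helper.

```python
# Sketch: import the key packages from the recipe and report failures.
# Run with the container's Python, e.g. inside `singularity shell --nv ...`.
import importlib

# pip package name -> module name used in `import` (hand-written assumption)
PACKAGES = {
    "tensorflow-gpu": "tensorflow",
    "tensorflow-probability": "tensorflow_probability",
    "keras": "keras",
    "scikit-learn": "sklearn",
    "Pillow": "PIL",
    "scipy": "scipy",
    "pandas": "pandas",
}


def try_imports(module_names):
    """Import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ImportError for a missing module or CUDA library
            results[name] = "{}: {}".format(type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    for name, error in sorted(try_imports(PACKAGES.values()).items()):
        print("{:<25} {}".format(name, "OK" if error is None else error))
```

It really imports each module instead of only searching for it, because a TensorFlow build that cannot load its CUDA libraries only fails at import time.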

@guchchcug
Author

guchchcug commented Jul 26, 2019 via email

@agladstein

Ah, okay! In that case, the container works for what I thought you wanted. It looks like it can't connect to the Amazon S3 bucket to get the MNIST data. I'm not sure why that's not working; I would need to look at your code. @julianpistorius probably has an idea.
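A quick way to tell a network restriction apart from a bug in the code is to try opening a TCP connection from the login node and again from inside an `interact` session. A minimal sketch; `can_connect` is a hypothetical helper, and `s3.amazonaws.com` on port 443 is assumed as the target because the MNIST download comes from an S3 bucket.

```python
# Sketch: test whether this node can open an outbound TCP connection.
# Run once on the login node and once on a compute node, then compare.
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, DNS failure, and timeout
        return False


if __name__ == "__main__":
    print("s3.amazonaws.com:443 reachable:", can_connect("s3.amazonaws.com", 443))
```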

@julianpistorius
Member

Figured it out. It looks like outbound network access is blocked on those nodes. Pre-downloading mnist.npz into the ~/.keras/datasets directory worked.
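That workaround can be scripted. A minimal sketch, assuming (1) `MNIST_URL` is the origin that Keras 2.2.x's `mnist.load_data()` downloads from (check `keras/datasets/mnist.py` in your installed version), and (2) your home directory is visible on the compute nodes and inside the container (Singularity binds `$HOME` by default). `ensure_cached` is a hypothetical helper; run it on the login node, where outbound access works.

```python
# Sketch: pre-fetch mnist.npz into ~/.keras/datasets on a node with network access.
# Assumption: MNIST_URL matches the origin used by your Keras version.
import os
import urllib.request

MNIST_URL = "https://s3.amazonaws.com/img-datasets/mnist.npz"


def ensure_cached(url, cache_dir=None, fetch=urllib.request.urlretrieve):
    """Download `url` into cache_dir (default ~/.keras/datasets) unless present.

    Returns the local path. `fetch(url, path)` can be swapped out for testing.
    """
    if cache_dir is None:
        cache_dir = os.path.join(os.path.expanduser("~"), ".keras", "datasets")
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(path):
        tmp_path = path + ".part"
        fetch(url, tmp_path)        # download under a temporary name first
        os.replace(tmp_path, path)  # then move it into place
    return path
```

On the login node, `python3 -c "from prefetch import *; print(ensure_cached(MNIST_URL))"` (with the code saved as a hypothetical `prefetch.py`) fetches the file once; `mnist.load_data()` on the compute node should then find `~/.keras/datasets/mnist.npz` and skip the download.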

@guchchcug
Author

guchchcug commented Jul 26, 2019 via email


3 participants