This repository contains a number of tutorials to get started running code on the ML Cloud Slurm cluster.
The idea is to get a number of "Hello World"-type examples working in order to understand how the different components fit together: Slurm, SSH, file and data storage, Python environments, and containers. This is not intended as a fully functional toolkit for getting productive; consider it a starting point.
This example assumes that you are using Ubuntu Linux. Generally speaking, everything below is also possible with other Linux distributions, macOS and Windows (using PuTTY). The instructions may partially work on those other operating systems, but some commands or steps will likely require changes.
Useful things to familiarise yourself with before starting:
- SSH and SSH keys
- Containers in general, and Singularity containers in particular
- The Slurm job scheduling system and its user guide on the ML Cloud
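To give a feel for what Slurm expects, here is a minimal sketch in Python that renders a batch script for a "Hello World" job. Lines starting with `#SBATCH` are read by `sbatch` as job options, and everything after them runs as an ordinary shell script on the compute node. The partition name `cpu-short`, the time limit and the job name are hypothetical placeholders, not ML Cloud settings; the real partition names are listed in the ML Cloud Slurm user guide.

```python
def make_batch_script(job_name: str, partition: str, command: str,
                      time_limit: str = "00:05:00") -> str:
    """Return the text of a minimal Slurm batch script (a sketch, not ML Cloud defaults)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",      # hypothetical partition name
        f"#SBATCH --time={time_limit}",          # wall-clock limit, HH:MM:SS
        "#SBATCH --ntasks=1",                    # a single task is enough for Hello World
        f"#SBATCH --output={job_name}-%j.out",   # Slurm replaces %j with the job ID
        "",
        command,                                 # the actual work, run as a shell command
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; save it as e.g. hello.sbatch to submit it.
    print(make_batch_script("hello", "cpu-short", 'echo "Hello from $(hostname)"'))
```

Saved as `hello.sbatch` on a login node, the script would be submitted with `sbatch hello.sbatch` and its state checked with `squeue -u $USER`.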
Apply for access to the ML Cloud here.
Once access is granted, contact Benjamin Gläßle or one of the other ML Cloud team members to get access to Slurm.
Once Slurm access has been granted as well, switch to SSH-key-based authentication as described here.
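Key-based access is usually paired with an entry in `~/.ssh/config`, so that a short alias replaces the full login command. The sketch below renders such an entry in OpenSSH client syntax. The alias `mlcloud`, the address `login.example.org`, the user name and the key path are hypothetical placeholders; the real login node and user name come from the ML Cloud documentation.

```python
from pathlib import Path


def ssh_config_entry(alias: str, hostname: str, user: str, key_file: str) -> str:
    """Render one Host block for ~/.ssh/config (OpenSSH client syntax)."""
    return (
        f"Host {alias}\n"
        f"    HostName {hostname}\n"
        f"    User {user}\n"
        f"    IdentityFile {key_file}\n"
        "    IdentitiesOnly yes\n"  # offer only this key, not every key loaded in the agent
    )


if __name__ == "__main__":
    print(ssh_config_entry(
        alias="mlcloud",                # hypothetical alias
        hostname="login.example.org",   # placeholder, not the real login node
        user="your_username",           # placeholder
        key_file=str(Path.home() / ".ssh" / "id_ed25519"),
    ))
```

The key pair itself would be created beforehand with `ssh-keygen -t ed25519`. After the printed block is appended to `~/.ssh/config`, `ssh mlcloud` should log in using that key.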
- Hello World example
- SSH configuration
- Getting your code and data onto the Slurm cluster
- Running Python code on Slurm
- Running jobs using containers - coming soon
- Interactive sessions: Jupyter notebook example
- Singularity tutorial with GPU use and PyTorch
- A Python tool for deploying Slurm jobs with Singularity containers, developed by the Sinz lab
- A list of all available Docker images with CUDA support to build the container from (if you are not happy with Ubuntu 20.04)
- ML Cloud Slurm Wiki
- Singularity user guide
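As a rough illustration of how containers and Slurm fit together, the sketch below assembles the command a batch script might run to execute a Python script inside a Singularity image with GPU access. `--nv` is Singularity's flag for exposing the host NVIDIA driver inside the container, and `--bind` mounts a host directory into it. The image name `pytorch.sif`, the bind path and the script name are hypothetical.

```python
import shlex
from typing import Optional


def singularity_command(image: str, script: str, bind: Optional[str] = None,
                        gpu: bool = True) -> str:
    """Build a shell-quoted `singularity exec` command line (illustrative sketch)."""
    args = ["singularity", "exec"]
    if gpu:
        args.append("--nv")           # make the host NVIDIA GPU and driver visible
    if bind:
        args += ["--bind", bind]      # "host_path:container_path" mount
    args += [image, "python3", script]
    return shlex.join(args)           # quotes any argument containing spaces etc.


if __name__ == "__main__":
    # Hypothetical image, data path and script name.
    print(singularity_command("pytorch.sif", "train.py", bind="/path/to/data:/data"))
```

Placed as the last line of a batch script that requests a GPU (with standard Slurm syntax such as `#SBATCH --gres=gpu:1`), this would run the script on the allocated GPU. The exact GPU request and partition names used on the ML Cloud are described in the Slurm wiki.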