
Instructions and examples to deploy some PyTorch code on slurm using a Singularity Container


lmkoch/tue-slurm-helloworld

Instructions

This repository contains a number of tutorials to get started running code on the ML Cloud SLURM cluster.

The idea is to get a number of "Hello World" type examples working in order to understand how all the different components fit together: Slurm, SSH, file and data storage, Python environments, and containers. This is not intended as a fully functional toolset for getting productive; consider it a starting point.

Prerequisites

This example assumes that you are using Ubuntu Linux. In principle, everything below is also possible on other Linux distributions, macOS and Windows (using PuTTY), but the instructions may only partially work on those systems and some commands or steps will likely need changes.

Useful things to familiarise yourself with before starting:

Get access to Slurm

Apply for access to ML Cloud here.

Once access is granted, contact Benjamin Gläßle, or one of the other ML Cloud team members, to get access to Slurm.

Once Slurm access has also been granted, switch to SSH key-based authentication as described here.
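With key-based login in place, an entry in `~/.ssh/config` saves typing the full host name, user name and key path on every connection. The sketch below renders such an entry from Python. The alias `mlcloud`, the login host `login.example.org`, the user name and the key path `~/.ssh/id_ed25519` are all hypothetical placeholders; the real login node address comes from the ML Cloud documentation. `HostName`, `User`, `IdentityFile` and `IdentitiesOnly` are standard `ssh_config` keywords.

```python
def render_ssh_host(alias: str, hostname: str, user: str,
                    identity_file: str = "~/.ssh/id_ed25519") -> str:
    """Return an ssh_config Host block for key-based login.

    All argument values used below are hypothetical placeholders.
    """
    return (
        f"Host {alias}\n"
        f"    HostName {hostname}\n"
        f"    User {user}\n"
        f"    IdentityFile {identity_file}\n"
        # Offer only the key named above, not every key held by ssh-agent
        "    IdentitiesOnly yes\n"
    )


if __name__ == "__main__":
    # Placeholder values: substitute the real login node and your user name.
    print(render_ssh_host("mlcloud", "login.example.org", "your_username"))
```

After appending the printed block to `~/.ssh/config`, `ssh mlcloud` would connect using that key.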

Next steps

  1. Hello World example
  2. SSH configuration
  3. Getting your code and data on the slurm cluster
  4. Running python code on slurm
  5. Running jobs using containers - coming soon
  6. Interactive sessions: Jupyter notebook example
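To give a feel for the first step, here is a minimal sketch of a Slurm "Hello World" job driven from Python: it renders a batch script with a few `#SBATCH` directives and submits it with `sbatch` if that command is available. The job name, time limit and output pattern are illustrative choices, and `PARTITION_NAME` is a hypothetical placeholder that must be replaced by a partition that exists on the ML Cloud cluster (`sinfo` lists them). This is not the repository's own example script.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def render_hello_job(job_name: str = "hello-world",
                     partition: str = "PARTITION_NAME",  # hypothetical placeholder
                     time_limit: str = "00:05:00") -> str:
    """Return the text of a minimal sbatch script that reports where it ran."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --ntasks=1",
        f"#SBATCH --time={time_limit}",
        # %x expands to the job name and %j to the job ID
        "#SBATCH --output=%x-%j.out",
        "",
        'echo "Hello World from $(hostname), job $SLURM_JOB_ID"',
    ]
    return "\n".join(lines) + "\n"


def submit(script_text: str, path: str = "hello.sbatch") -> Optional[str]:
    """Write the script to disk and submit it with sbatch.

    Returns sbatch's output, or None when sbatch is not on the PATH
    (i.e. we are not on a Slurm login node).
    """
    Path(path).write_text(script_text)
    if shutil.which("sbatch") is None:
        return None
    result = subprocess.run(["sbatch", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(render_hello_job())
```

Once the job has finished, its greeting ends up in a file named `hello-world-<jobid>.out` in the submission directory.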

Useful links for context
