Triton agony aunt
Well, the first thing to do is set up Triton. Thankfully, there already exists a great resource from the Aalto Scientific Computing team.
One can either work solely through the command line, or integrate seamlessly with VS Code over SSH.
It is naturally best practice to use environments when dealing with as complex a beast as Triton. This goes for both Python and R. In both cases, one must specify the package installation locations. A comprehensive tutorial can be found here. If using R, packages can be listed as
# environment.yml
name: my_project
channels:
- conda-forge
dependencies:
- r-base
- r-brms
- r-tidyverse
- r-devtools
and then an environment created by running
module load miniconda
conda env create --file environment.yml
These environments are then activated in each individual shell script by adding
module load miniconda
source activate my_project
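As a quick sanity check (the exact paths will differ per user, and the ones in the comments below are purely illustrative), you can confirm from within the activated environment that R resolves packages from the conda environment rather than from your home directory:
# Run inside the activated environment.
# The first entry should point into the conda environment's library,
# e.g. somewhere under .conda_envs/my_project/lib/R/library (illustrative path)
.libPaths()
# Packages declared in environment.yml should be available
packageVersion("brms")
packageVersion("tidyverse")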
All of the main work should be executed from your /scratch/work/ directory, since it offers far more storage space than your home directory. A nice template structure for an individual project might look like the following (with .conda_envs and .conda_pkgs initialised automatically by following this tutorial):
/scratch/work/
├── .conda_envs
├── .conda_pkgs
└── my_project
├── data
│ └── experiment_data.csv
├── img
│ └── experiment.pdf
├── R
│ └── experiment.R
├── out
│ └── experiment.out
├── shell
│ └── experiment.sh
└── environment.yml
wherein our experiment.R
file runs an experiment, perhaps something of the form
set.seed(300416)

# define experiment function
experiment <- function(mu) {
  y <- rnorm(100, mu, 1)
  res <- data.frame(y = y, mu = mu)
  return(res)
}

# define different values
mus <- c(0, 1, 10)

# perform experiment
res <- parallel::mcMap(
  f = experiment,
  mu = mus,
  # use the number of CPUs requested in the shell script (--cpus-per-task)
  mc.cores = as.integer(Sys.getenv("SLURM_CPUS_PER_TASK"))
)

# concatenate experiment results
df <- do.call("rbind", res)

# write the table
setwd("/scratch/work/<my-user-name>/<project-root>")
csv_name <- paste0("./data/experiment_data.csv")
ff <- file(csv_name, open = "w")
write.table(df, file = ff, sep = ",", row.names = FALSE)
close(ff)
This script is then called by the shell file
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=3
#SBATCH --mem=1G
#SBATCH --output=./out/experiment.out
# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1
# Activate the conda environment containing the version of R you want to use
module load miniconda
source activate my_project
# Run R scripts
srun Rscript ./R/experiment.R
Note in the above example that for a parallelised process we first concatenate the data from all parallel runs before writing to CSV. This helps avoid multiple parallel workers trying to write to the same file at once.
To set up a research project utilising Triton, you can use this cookiecutter template to initialise a directory with (roughly) the structure proposed above. In a nutshell, cookiecutter creates projects from templates (a template is itself called a cookiecutter) and aims to reduce the amount of boilerplate code you need to write.
You can install cookiecutter with pip:
pip install cookiecutter
and then run
cookiecutter gh:LeeviLindgren/cookiecutter-R-triton
in the folder in which you wish to create the new project. Cookiecutter will ask you to fill in some details, such as your name and the project name, and will then create a new directory with the name you provided, containing a README.md file that should help get you started. For instance, it shows how you can run an example job on Triton (or any other cluster using Slurm).
Given the multitude of installation approaches and optimisations available for Stan and its interfaces, we've put together a Docker image with the latest versions of cmdstanr
and rstan
installed and maximally optimised.
Triton's interface to containers (e.g., Docker) is Singularity, which can run Docker images with no extra configuration. The image and its R packages are stored in the /scratch/cs/bayes_ave
folder, so you don't need to build or store the image in your own storage directory.
Using the Docker image requires minimal changes to an existing analysis script. As an example, take the following script which executes an R analysis without the image:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=3
#SBATCH --mem=1G
#SBATCH --output=./out/experiment.out
# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1
# Run R scripts
srun Rscript ./R/experiment.R
To run this analysis using the image, we simply replace the srun
command with a call to the Singularity container:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=3
#SBATCH --mem=1G
#SBATCH --output=./out/experiment.out
# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1
# Run R scripts
singularity run -B /scratch,/m,/l,/share /scratch/cs/bayes_ave/stan-triton.sif Rscript ./R/experiment.R
The command:
singularity run -B /scratch,/m,/l,/share /scratch/cs/bayes_ave/stan-triton.sif
will execute the Rscript command within the container. You can replace this Rscript ... command with whichever command you want to run in the container. The -B /scratch,/m,/l,/share
option mounts Triton's filesystems within the image so that you can access your work directory, and so that cmdstanr
can access the CmdStan installation in /scratch/cs/bayes_ave
.
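If you want to verify the container setup before launching a long job, a minimal smoke test along the following lines (the toy model here is purely illustrative) can be saved as an R script and run through the singularity run ... Rscript command shown above:
library(cmdstanr)

# Print the CmdStan installation the image is configured to use
print(cmdstan_path())

# Compile and fit a trivial model as an end-to-end check
stan_file <- write_stan_file("
parameters { real theta; }
model { theta ~ normal(0, 1); }
")
mod <- cmdstan_model(stan_file)
fit <- mod$sample(chains = 1, iter_warmup = 200, iter_sampling = 200, refresh = 0)
print(fit$summary())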
Because the compiler toolchain and system libraries will differ between the image and the Triton cluster itself, mixing R packages built under Triton with those built under the image can result in errors when calling them. Instead, make sure that you only install packages using the image, and do not mount your own package library. The image is configured to install packages within your /scratch
directory, but you first need to create the folder, otherwise R will not recognise it.
To do so, run:
mkdir -p /scratch/work/${USER}/stan-triton/R/library
Then you can install and use R packages as usual.
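For example, from an R session started inside the container (the package chosen below is arbitrary):
# The scratch library created above should now appear in the library search path,
# i.e. something like /scratch/work/${USER}/stan-triton/R/library
.libPaths()

# Installs into that library; any CRAN package works here
install.packages("posterior")
library(posterior)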
Pre-built packages for the latest versions of rstan and StanHeaders are provided for easy use with the image. To install them, run:
# Need to make sure dependencies are already available:
install.packages(c("RcppParallel","RcppEigen","inline","gridExtra","loo","V8","BH"))
# Install binaries
install.packages("/scratch/cs/bayes_ave/rstan/StanHeaders_2.32.1.9000_R_x86_64-pc-linux-gnu.tar.gz", repos = NULL)
install.packages("/scratch/cs/bayes_ave/rstan/rstan_2.32.1.9000_R_x86_64-pc-linux-gnu.tar.gz", repos = NULL)
To use OpenCL/GPU acceleration within the image, you need to both request a GPU partition in your analysis script and launch the image with GPU support. Using the same example script from above, this is done by adding #SBATCH --gres=gpu:1
to the configuration header, and adding --nv
to the singularity call:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=3
#SBATCH --mem=1G
#SBATCH --gres=gpu:1
#SBATCH --output=./out/experiment.out
# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1
# Run R scripts
singularity run --nv -B /scratch,/m,/l,/share /scratch/cs/bayes_ave/stan-triton.sif Rscript ./R/experiment.R
The --nv option is critical here; without it, Triton will not provide GPU access to the image.
For a quick tutorial on GPU acceleration in cmdstanr, see this article: https://mc-stan.org/cmdstanr/articles/opencl.html
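In practice, enabling OpenCL in cmdstanr amounts to compiling the model with the stan_opencl flag and telling the sampler which device to use. A minimal sketch follows; the toy model and data are illustrative only, and OpenCL only really pays off for much larger models:
library(cmdstanr)

# A tiny example model, just to show the mechanics
stan_file <- write_stan_file("
data { int<lower=0> N; array[N] int<lower=0, upper=1> y; }
parameters { real<lower=0, upper=1> theta; }
model { y ~ bernoulli(theta); }
")

# Compile with OpenCL support enabled
mod <- cmdstan_model(stan_file, cpp_options = list(stan_opencl = TRUE))

# opencl_ids = c(platform, device); c(0, 0) is typically the first GPU
fit <- mod$sample(
  data = list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)),
  opencl_ids = c(0, 0),
  chains = 4,
  parallel_chains = 4
)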
It is sometimes useful to access the job ID, array ID, or the number of cores requested by a Triton job from within your R script. Well, you're in luck: all three can be retrieved painlessly with
job_id <- Sys.getenv("SLURM_ARRAY_JOB_ID", 0)
array_id <- Sys.getenv("SLURM_ARRAY_TASK_ID", 0)
ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE", 2))
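A typical use is to let the array ID select the experimental setting and to keep each array task's output in its own file. A small sketch along those lines, reusing the experiment() function defined earlier (and assuming array indices start at 1):
# Let the array ID pick the value of mu for this task
mus <- c(0, 1, 10)
mu <- mus[as.integer(array_id)]

# Run the experiment for this setting only
res <- experiment(mu)

# One output file per array task, so parallel tasks never write to the same file
write.csv(res, sprintf("./data/experiment_%s_%s.csv", job_id, array_id),
          row.names = FALSE)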
Once we have run a Triton job from the root project directory, it will by default spit out all output into a slurm-%A_%a.out file (%A here being the placeholder for the job ID, and %a for the array task ID) and dump this file into the root directory. So as not to clog up your root directory with such slurm*.out files, create a specific directory into which you want these files written, say slurm_output, and add the following to the header of the bash script executing the job:
#SBATCH --output=slurm_output/%A_%a.out