
Slurm Configuration

Lukas Mueller edited this page Nov 3, 2019 · 7 revisions

Overview

Breedbase uses the Slurm workload manager to run analyses and other jobs that may need more computing power than the local virtual machine can provide.

Implementation and configuration

The Slurm system used in Breedbase is based on the Debian packages slurm-llnl and libslurm-perl. The latter provides the Slurm.pm module, which is used for querying the status of Slurm jobs.

By default, Slurm runs jobs on localhost. The config file is included in the breedbase_dockerfile repo. It assumes a host name of localhost, so if the host name is different, the new name has to be entered in several places in the /etc/slurm-llnl/slurm.conf file.
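Before editing the file for a differently named host, it can help to list every active line that still mentions localhost. The following is a minimal sketch, assuming the config is plain `Key=Value` text with `#` comments. The function name `find_hostname_lines` and the sample excerpt are hypothetical and are not taken from the file shipped in breedbase_dockerfile.

```python
import re


def find_hostname_lines(conf_text, hostname="localhost"):
    """Return (line_number, line) pairs for active lines that mention hostname."""
    pattern = re.compile(r"\b" + re.escape(hostname) + r"\b")
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        # Ignore anything after '#', so commented-out mentions are not reported.
        active = line.split("#", 1)[0]
        if pattern.search(active):
            hits.append((number, line.strip()))
    return hits


# Hypothetical excerpt for illustration only.
SAMPLE_CONF = """\
# slurm.conf excerpt
SlurmctldHost=localhost
SelectType=select/cons_res
NodeName=localhost CPUs=4 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES State=UP
"""

if __name__ == "__main__":
    for number, line in find_hostname_lines(SAMPLE_CONF):
        print(f"{number}: {line}")
```

In practice the text would be read from /etc/slurm-llnl/slurm.conf, and each reported line reviewed by hand before the host name is changed.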

Importantly, the parameter SelectType needs to be set to select/cons_res, and the parameter SelectTypeParameters needs to be set to CR_CORE. With the default settings, Slurm will not run more than one job per node at a time. The number of cores needs to be set at the end of the config file. Setting more cores than are available on the machine will render Slurm non-functional.
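These settings can be sanity-checked before Slurm is restarted. The sketch below is illustrative rather than part of Breedbase, and rests on several assumptions: the core count is given as `CPUs=` on a `NodeName` line, parameter names and values are compared case-insensitively, and `os.cpu_count()` is a reasonable stand-in for the cores available on the machine. All function names are hypothetical.

```python
import os
import re

# Settings named in the text above, stored lower-cased for comparison.
REQUIRED = {
    "selecttype": "select/cons_res",
    "selecttypeparameters": "cr_core",
}


def parse_settings(conf_text):
    """Collect the first Key=Value pair of each active line, lower-cased."""
    settings = {}
    for line in conf_text.splitlines():
        active = line.split("#", 1)[0].strip()
        match = re.match(r"(\w+)\s*=\s*(\S+)", active)
        if match:
            settings[match.group(1).lower()] = match.group(2).lower()
    return settings


def configured_cpus(conf_text):
    """Return the CPUs= values found on NodeName lines (assumed layout)."""
    counts = []
    for line in conf_text.splitlines():
        active = line.split("#", 1)[0].strip()
        if active.lower().startswith("nodename="):
            match = re.search(r"\bCPUs=(\d+)", active, re.IGNORECASE)
            if match:
                counts.append(int(match.group(1)))
    return counts


def check_conf(conf_text, available_cores=None):
    """Return a list of problems; an empty list means every check passed."""
    if available_cores is None:
        available_cores = os.cpu_count() or 1
    problems = []
    settings = parse_settings(conf_text)
    for key, expected in REQUIRED.items():
        actual = settings.get(key)
        if actual != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    for cpus in configured_cpus(conf_text):
        if cpus > available_cores:
            problems.append(
                f"CPUs={cpus} exceeds the {available_cores} cores available"
            )
    return problems
```

Calling `check_conf(open("/etc/slurm-llnl/slurm.conf").read())` on the host would return a list of messages to fix, or an empty list if nothing was flagged.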

To configure Breedbase to run jobs on another host, the sgn_local.conf file has to be modified in the following way:

backend RemoteSlurm
cluster_host [email protected]

The cluster host has to be specially configured to be able to run jobs:

  • It is important that the cluster host be accessed as the same user that the website runs as, which by default is www-data.

  • The cluster and the virtual machine need to mount the same cluster_shared_tmp_dir (in sgn_local.conf).

  • The user needs to be able to log in through ssh using key-based authentication. (For example, for www-data, the private key needs to be set up in /var/www/.ssh/id_rsa on the web server, and the matching public key added to /var/www/.ssh/authorized_keys on the cluster host.)

  • The environment should be set up in /var/www/.ssh/environment on the cluster host (for example, the $PATH and $PERL5LIB variables). Critically, check_slurm_job.pl (in the cxgn-corelibs/bin directory), the script that checks the status of Slurm jobs, needs to be in the $PATH so that the Slurm system can execute it.

  • The cluster host needs a copy of the full Breedbase system installed. In the future this may be achievable with a Docker container, but that is not yet possible.
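Two of the requirements above, key-based ssh login and check_slurm_job.pl being on the remote $PATH, can be probed in one step from the web server. This is a minimal sketch, assuming it is run as the website user (www-data by default), that an `ssh` client is installed, and that the remote login shell understands the POSIX `command -v` builtin. The function names and the `user@cluster` host in the usage note are hypothetical.

```python
import shlex
import subprocess


def build_preflight_command(cluster_host, script="check_slurm_job.pl"):
    """Build an ssh command that looks up `script` on the remote $PATH."""
    remote = "command -v " + shlex.quote(script)
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a missing or rejected key shows up as a non-zero exit status.
    return ["ssh", "-o", "BatchMode=yes", cluster_host, remote]


def run_preflight(cluster_host, timeout=15):
    """Return (ok, detail); ok is True if login worked and the script was found."""
    command = build_preflight_command(cluster_host)
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as error:
        # ssh is not installed, or the connection did not complete in time.
        return False, str(error)
    detail = (result.stdout or result.stderr).strip()
    return result.returncode == 0, detail
```

For example, `run_preflight("user@cluster")` returns `(True, <path to the script>)` when both conditions hold, and `(False, <error text>)` otherwise. It does not check the shared cluster_shared_tmp_dir mount or the Breedbase installation on the cluster host.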
