
Small Script to demonstrate usage of R on UGent HPC

NeuroStat/DemoHPC

HPC and R

A small script that demonstrates how to run R scripts in parallel on the UGent HPC.

The High Performance Computing (HPC) environment is an excellent tool to accelerate computations. Note that you only gain time by running calculations in parallel: the HPC infrastructure is not a kind of supercomputer that runs a single calculation faster than your regular PC!

The general workflow is as follows:

  • Set up an account: HPC
  • Copy your files to the HPC: Copy Data
  • Log in to the infrastructure: Login

Then it is time to go!

Commands to remember

Some useful commands in bash are:

# To see where you are:
pwd

# To list the files in your current directory:
ls

# To navigate to a folder:
cd folder

# To navigate to folder above:
cd ..

# To create a new folder named output
mkdir output

# To create a new folder named error
mkdir error

# To submit Bash.sh as an array job of K = 8 sub-jobs, each with its own ID (1 to 8).
# You can change the range 1-8 to any range you need!
qsub -t 1-8 Bash.sh

# To track progress of running jobs:
qstat -ta
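The folder and submission commands above can be combined into one small helper script. This is a minimal sketch, assuming it is run from the folder that contains Bash.sh; `N_JOBS` is a hypothetical variable name, and when `qsub` is not available (for instance on your own PC) the script only prints the command it would have run:

```shell
#!/bin/sh
# Sketch: prepare the folders used by the "#PBS -o" and "#PBS -e" lines,
# then submit Bash.sh as an array job.
# N_JOBS is a hypothetical name: set it to the number of sub-jobs you need.
N_JOBS=8

# -p: create both folders and do not complain if they already exist
mkdir -p output error

if command -v qsub >/dev/null 2>&1; then
  # On the HPC login node: submit sub-jobs with IDs 1..N_JOBS
  qsub -t "1-${N_JOBS}" Bash.sh
else
  # No scheduler available: only show what would be submitted
  echo "qsub not found; would run: qsub -t 1-${N_JOBS} Bash.sh"
fi
```

Using `mkdir -p` means you can rerun the script without first checking whether the folders already exist.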

Main Bash file

We first consider the bash file, Bash.sh.

The first lines in this file are always needed:

#!/bin/sh
#
#
#PBS -N Demo
#PBS -o output/output.file
#PBS -e error/error.file
#PBS -m ae
#PBS -l walltime=02:30:00
#

The arguments after #PBS are options for the job scheduler:

  • -N gives your job a name, which makes it easier to recognize
  • -o sets the file (here output/output.file) where everything the job prints to the console is saved
  • -e the same, but for error messages (if an error occurs)
  • -m send me an e-mail when:
    • a ==> the job is aborted
    • e ==> the job ends
  • -l requests resources, here the walltime: the maximum run time (hh:mm:ss) of your script
    • Choose a sufficient walltime: any job that exceeds it will be terminated!!!

Jobs with a walltime longer than 12:00:00 end up in a queue with longer waiting times than jobs with a shorter walltime!
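To check which side of the 12:00:00 limit a walltime falls on, it helps to convert it to seconds. This is a small sketch; the function name `walltime_to_seconds` is hypothetical, and it assumes the hh:mm:ss format used in the `#PBS -l walltime=` line:

```shell
#!/bin/sh
# Hypothetical helper: convert a walltime in hh:mm:ss format to seconds.
# awk splits the text on ":" and reads values such as "08" as plain decimals.
walltime_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

walltime="02:30:00"   # the value from the "#PBS -l walltime=" line
secs=$(walltime_to_seconds "$walltime")
limit=$(walltime_to_seconds "12:00:00")

if [ "$secs" -gt "$limit" ]; then
  echo "$walltime ($secs s) exceeds 12:00:00: expect a longer waiting queue"
else
  echo "$walltime ($secs s) stays within 12:00:00"
fi
```

With the walltime from this demo, the script prints `02:30:00 (9000 s) stays within 12:00:00`.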

Next comes a line that loads the modules (think of pieces of software) that are needed:

module load R

This demo also shows how to pass arguments to a command: we pass your VSC number, so that R can use it to write output to your own directory.

The main command is:

Rscript MainR.R ${PBS_ARRAYID} $vsc

This runs the R script MainR.R with two arguments. The first, ${PBS_ARRAYID}, is the ID each sub-job receives when you submit with qsub -t 1-8: MainR.R is started 8 times, each time with a different number from 1 to 8. The second argument is your VSC number.
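To see how the array IDs reach MainR.R without using the scheduler, you can imitate `qsub -t 1-8` with a plain loop on your own PC. This is only a sequential stand-in, assuming Rscript is installed locally when you actually run it; `simulate_array`, `DRY_RUN` and the VSC number `12345` are hypothetical placeholders. With `DRY_RUN=1` (the default here) the commands are printed instead of executed:

```shell
#!/bin/sh
# Sequential, local imitation of "qsub -t 1-N Bash.sh": every iteration
# plays the role of one sub-job and gets its own PBS_ARRAYID.
vsc=12345               # hypothetical placeholder: use your own VSC number
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them

simulate_array() {
  # $1 = number of sub-jobs, like the upper bound in "qsub -t 1-8"
  PBS_ARRAYID=1
  while [ "$PBS_ARRAYID" -le "$1" ]; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "Rscript MainR.R ${PBS_ARRAYID} $vsc"
    else
      Rscript MainR.R "${PBS_ARRAYID}" "$vsc"
    fi
    PBS_ARRAYID=$((PBS_ARRAYID + 1))
  done
}

simulate_array 8
```

On the cluster the sub-jobs can run at the same time; here they run one after the other, so this is only useful to check your arguments, not to save time.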

R script

Our R file is called MainR.R.

When writing your R script, it is important to use these IDs. Below, we read the ID passed by the bash file and assign it to K:

# Read the arguments given after the script name on the command line
input <- commandArgs(trailingOnly = TRUE)
# Take the first argument from your input:
# this corresponds to PBS_ARRAYID in your bash script
K <- as.numeric(input[1])

# You can pass more arguments, e.g. your VSC number (second argument)
vsc <- as.numeric(input[2])

If you run simulations or separate analyses, make sure to use this K, for instance to give every sub-job its own seed:

# Seed based on K !!!
seed <- 50*K
set.seed(seed)
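Because the seed depends on K, each sub-job starts the random number generator from a different value, so the 8 sub-jobs do not all repeat the same simulation. The same arithmetic in a short shell sketch (`list_seeds` is a hypothetical name) lists the seed that each sub-job K = 1 to 8 would use:

```shell
#!/bin/sh
# Hypothetical helper: print the seed 50*K used by each sub-job K = 1..n
list_seeds() {
  K=1
  while [ "$K" -le "$1" ]; do
    echo $((50 * K))
    K=$((K + 1))
  done
}

list_seeds 8   # prints 50, 100, ..., 400, one per line
```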
