Update cambridge.md #573
To use, run the pipeline with `-profile cambridge`. This will download and launch the pipeline with a setup suitable for the Cambridge HPC cluster. Using this profile, either a Docker image containing all of the required software will be downloaded and converted to a Singularity image, or a Singularity image will be downloaded directly before execution of the pipeline.

### Install Nextflow

The latest version of Nextflow is not installed by default on the Cambridge HPC cluster CSD3. You can install it with conda:

```
module load miniconda/3

# set up Bioconda according to the Bioconda documentation, notably setting up channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create the environment env_nf, and install the tool nextflow
conda create --name env_nf nextflow

# activate the environment containing nextflow
conda activate env_nf

# once done with the environment, deactivate
conda deactivate
```

Alternatively, you can install Nextflow into a directory you have write access to. Follow [these instructions](https://www.nextflow.io/docs/latest/getstarted.html#) from the Nextflow documentation. This alternative method also requires updating Java.

```
# move to desired directory on HPC
cd /home/<username>/path/to/dir

# get the newest version
wget -qO- https://get.nextflow.io | bash

# update java version to the latest
wget https://download.oracle.com/java/20/latest/jdk-20_linux-x64_bin.tar.gz
tar xvfz jdk-20_linux-x64_bin.tar.gz

# if all tools are compatible with the java version you chose, add these lines to .bashrc
export JAVA_HOME=/home/<username>/path/to/dir/jdk-20.0.1
export PATH=/home/<username>/path/to/dir/jdk-20.0.1/bin:$PATH

# once the above is done, `java --version` should return `java 20.0.1 2023-04-18`
java --version
```
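
Once the `export` lines are in `.bashrc`, the JDK's `bin` directory is searched before any system Java. The sketch below rehearses that PATH change with a stand-in directory (`demo-jdk` is an invented placeholder for this sketch, not a real CSD3 path):

```
# demo-jdk stands in for /home/<username>/path/to/dir/jdk-20.0.1 (illustrative)
DEMO_JDK="$HOME/demo-jdk"
mkdir -p "$DEMO_JDK/bin"

# the same two lines you would place in .bashrc
export JAVA_HOME="$DEMO_JDK"
export PATH="$JAVA_HOME/bin:$PATH"

# confirm that the JDK bin directory is now on the search path
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JDK bin directory is on PATH" ;;
esac
```

Because the new directory is prepended, any `java` placed there shadows the system copy.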

### Set up Singularity cache

Singularity allows the use of containers and will use a caching strategy. First, you might want to set the `NXF_SINGULARITY_CACHEDIR` environment variable to point at your hpc-work location; if you do not, it is automatically assigned to the current directory.

```
# do this once per login, or add these lines to .bashrc
export NXF_SINGULARITY_CACHEDIR=/home/<username>/rds/hpc-work/path/to/cache/dir
```
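
To avoid a typo silently sending images elsewhere, you can create the cache directory up front and confirm the variable points at it. A minimal sketch; the path below is illustrative, so substitute your real hpc-work location:

```
# illustrative cache location; replace with your real hpc-work path
export NXF_SINGULARITY_CACHEDIR="$HOME/rds/hpc-work/nxf-singularity-cache"

# create the directory if it does not exist yet, then confirm
mkdir -p "$NXF_SINGULARITY_CACHEDIR"
ls -ld "$NXF_SINGULARITY_CACHEDIR"
```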

Once done, and ready to use Nextflow, check whether the Singularity module is loaded by default when logging on to the cluster.

```
module list

# If singularity is not loaded:
module load singularity
```

### Run Nextflow | ||

Here is an example with the nf-core pipeline sarek ([read the documentation here](https://nf-co.re/sarek/3.3.2)). The user includes the project name and the partition to run on.

```
# Launch the nf-core pipeline for a test database
# with the Cambridge profile
nextflow run nf-core/sarek -profile test,cambridge --partition "cclake" --project "NAME-SL3-CPU" --outdir nf-sarek-test
```

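Note that `--partition` and `--project` come from the Cambridge profile rather than from sarek itself, so strict parameter validation in some pipeline versions may reject them. If that happens, one alternative is to carry the scheduler settings in a small custom config instead; the file name and directive values below are illustrative assumptions, not tested CSD3 settings:

```
// custom.config -- illustrative sketch; adjust queue and account to your allocation
process {
    executor       = 'slurm'
    queue          = 'cclake'
    clusterOptions = '--account=NAME-SL3-CPU'
}
```

You would then launch with `nextflow run nf-core/sarek -profile test,cambridge -c custom.config --outdir nf-sarek-test`.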
All of the intermediate files required to run the pipeline will be stored in the `work/` directory. It is recommended to delete this directory after the pipeline has finished successfully, because it can get quite large and all of the main output files will be saved in the `results/` directory anyway.
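
Before deleting `work/`, it can be reassuring to check how much space it actually occupies. The sketch below rehearses the check-then-delete step on a throwaway directory (`demo-work` is a stand-in invented for this sketch); on a real run you would point `du` and `rm` at `work/` itself:

```
# stand-in for a real pipeline work directory (illustration only)
mkdir -p demo-work/stage
echo "intermediate data" > demo-work/stage/tmp.txt

# check the size, then remove it -- for a real run, use work/ here
du -sh demo-work
rm -rf demo-work
```

Nextflow also provides a built-in `nextflow clean` subcommand for removing the work files of previous runs.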