RICOPILI

Authors: Stephan Ripke [email protected] | Alice Braun [email protected]

Last update: 2025-03-03

Note

This short README is designed to help you quickly set up and use the RICOPILI pipeline with centrally deployed dependencies and imputation references on the national Dutch supercomputer SURFsnellius.
For more comprehensive documentation on how to install RICOPILI and its dependencies, run the modules, and interpret their output, please visit: sites.google.com/a/broadinstitute.org/ricopili/

Download and dependencies

Download RICOPILI from GitHub via ssh from https://github.com/Ripkelab/ricopili

git clone git@github.com:Ripkelab/ricopili.git
mv ricopili/rp_bin/ ~

Trouble running Git?

If you are unable to download from GitHub, you first need to add your SSH key to GitHub. Generate a key pair:
ssh-keygen -t rsa -b 4096
When prompted, save the key in ~/.ssh/id_rsa.

Then start the SSH agent and add your key to it:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa
Now log into GitHub in your browser and navigate to Settings > SSH and GPG keys.
Click New SSH key and paste the contents of your public key file:
cat  ~/.ssh/id_rsa.pub
Finally, verify the SSH config file:
vim ~/.ssh/config
Add the following lines:
Host github.com
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa
Test the connection:
ssh -T git@github.com

Alternatively, download RICOPILI via wget:

wget https://personal.broadinstitute.org/braun/sharing/rp_bin.2025_Feb_20.001.tar.gz 
wget https://personal.broadinstitute.org/braun/sharing/rp_bin.2025_Feb_20.001.md5.cksum 

# verify checksum: the printed hash should match the one stored in rp_bin.2025_Feb_20.001.md5.cksum
md5sum rp_bin.2025_Feb_20.001.tar.gz
tar -xvzf rp_bin.2025_Feb_20.001.tar.gz
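
The manual comparison above could be automated along the following lines. This is a minimal sketch, not part of RICOPILI: verify_md5 is a hypothetical helper, and it assumes that the first whitespace-separated field of the .md5.cksum file is the expected MD5 hex digest (the actual layout of the checksum files may differ).

```shell
# Sketch: compare a file's MD5 digest with the digest stored in a checksum file.
# Assumption: the first field on the first line of the checksum file is the
# expected MD5 hex digest.
verify_md5() {
    local file="$1" cksum_file="$2"
    local expected actual
    expected=$(awk 'NR == 1 { print $1 }' "$cksum_file")
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected '$expected', got '$actual')" >&2
        return 1
    fi
}
```

With this helper, extraction can be made conditional on a successful check, e.g. verify_md5 rp_bin.2025_Feb_20.001.tar.gz rp_bin.2025_Feb_20.001.md5.cksum && tar -xvzf rp_bin.2025_Feb_20.001.tar.gz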

Note

We recommend installing an additional set of software (mostly R packages) via conda/mamba:

# download conda yaml file to build environment with all necessary R packages
wget https://personal.broadinstitute.org/braun/sharing/rp_env_0225b.yaml 
mamba env create -n rp_env -f rp_env_0225b.yaml

Note

If you'd like to install RICOPILI on a cluster other than the Broad UGER or the SURFsnellius supercomputer, we recommend downloading the dependency tarball:

wget https://personal.broadinstitute.org/braun/sharing/ricopili_dependencies_0225b.tar.gz 
wget https://personal.broadinstitute.org/braun/sharing/ricopili_dependencies_0225b.md5.cksum

# verify checksum
md5sum ricopili_dependencies_0225b.tar.gz
tar -xvzf ricopili_dependencies_0225b.tar.gz

Installation on SURFsnellius

On SURFsnellius you may use the centrally installed dependencies on PGC DAC:
/gpfs/work5/0/pgcdac/ricopili_download/dependencies/
To install RICOPILI quickly, create a file called ricopili.conf in your home directory.
Edit ricopili.conf with your preferred text editor and paste the following contents, replacing the values of init and email with your personal information.

ricopili.conf file

eloc /home/$USER/.conda/envs/rp_env/bin/
i2loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/impute_v2
i4loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/impute_v4
hmloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/hapmap_ref/
minimac3loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/Minimac3/
minimac4loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/Minimac4/minimac4-4.1.2-Linux-x86_64/bin/
gmloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/genetic_map_files 
sh5loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/shapeit5 
plink2loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/plink2/plink2 
rloc /home/$USER/.conda/envs/rp_env/bin/R  
ldsc_start /home/$USER/.conda/envs/ldsc/bin/ # env which runs python 2.7 - installed separately
sh3loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/shapeit3
tabixloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/tabix/
bcloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/bcftools/bcftools-1.18
bcloc_plugins /gpfs/work5/0/pgcdac/ricopili_download/dependencies/bcftools/bcftools-1.18/plugins/
ealoc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/eagle
bgziploc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/bgzip/
ldsc_ref /gpfs/work5/0/pgcdac/ricopili_download/dependencies/ldsc/
liloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/liftover
rpac /home/$USER/.conda/envs/rp_env/lib/R/library
p2loc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/plink
shloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/shapeit
meloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/metal/
bcrloc /gpfs/work5/0/pgcdac/ricopili_download/dependencies/bcftools/resources/
home /home/$USER/
sloc /scratch-local
init <YOUR_INITIALS>
email <YOUR_EMAIL>
loloc /home/$USER/
batch_jobcommand sbatch
batch_name -J_SPACE_XXX
batch_jobfile XXX
batch_memory_request NONE
batch_walltime --time=0-HH:MM:SS
batch_array --array=1-XXX%YYY
batch_stdout -o_SPACE_XXX/%x-%j.out
batch_stderr -e_SPACE_XXX/%x-%j.out
batch_job_dependency --dependency=afterany:XXX
batch_array_task_id $SLURM_ARRAY_TASK_ID
batch_max_parallel_jobs_per_one_array added_to_array
batch_other_job_flags --partition=genoa_SPACE_--cpus-per-task=32
batch_job_output_jid Submitted_SPACE_batch_SPACE_job_SPACE_XXX
batch_ncores_per_node 32
batch_mem_per_node 56
queue custom
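
Before running the configuration script, it may be worth checking that the paths listed in ricopili.conf actually exist on your system. The following is a minimal sketch, not part of RICOPILI: check_conf_paths is a hypothetical helper that reads the two-column file, looks only at values that are absolute paths, and substitutes a literal $USER with the current user name (an assumption about how the template above is meant to be read).

```shell
# Sketch: report path-like values in a two-column config file that do not exist.
# Returns non-zero if at least one path is missing.
check_conf_paths() {
    local conf="${1:-$HOME/ricopili.conf}"
    local key value _rest path missing=0
    while read -r key value _rest || [ -n "$key" ]; do
        case "$value" in
            /*) ;;          # only absolute paths are checked
            *) continue ;;  # skip initials, e-mail, batch settings, etc.
        esac
        # assumption: a literal $USER in the file stands for the current user
        path=$(printf '%s' "$value" | sed "s|\$USER|${USER:-}|g")
        if [ ! -e "$path" ]; then
            echo "missing: $key -> $path"
            missing=$((missing + 1))
        fi
    done < "$conf"
    [ "$missing" -eq 0 ]
}
```

Calling check_conf_paths ~/ricopili.conf prints one line per entry whose path cannot be found. Entries that only exist on compute nodes (such as a node-local scratch directory) may be reported as missing when run elsewhere.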

Start the script ./rp_config from within the rp_bin directory.
This is an interactive script that will take care of the installation in your computer cluster environment.
If RICOPILI is already installed on the system under your account, the script will ask whether you wish to unset the RICOPILI PATH settings first. For a first-time custom installation it is highly recommended to do so.
The configuration script will give you the two commands you have to issue. You just need to copy/paste them into the command line.

SURFsnellius rp_config.custom.txt file

If the configuration script cannot find a configuration file (by default it looks for a file named rp_config.custom.txt), it creates an empty one that needs to be filled in by you and/or a system administrator.
This file follows a two-column structure, with variable names in the first column and variable values in the second. Lines starting with "###" are comments.
The whitespace between the two columns can be as long as necessary, but spaces are not allowed within a value. Use the term _SPACE_ wherever a value needs a space.
To run the next step of the configuration on SURFsnellius, you can copy and paste the following into rp_config.custom.txt in the rp_bin directory, replacing the values of rp_user_initials, rp_user_email and rp_logfiles:

### for details please refer to https://docs.google.com/document/d/14aa-oeT5hF541I8hHsDAL_42oyvlHRC5FWR7gir4xco/edit?usp=sharing
###          and https://docs.google.com/spreadsheets/d/1LhNYIXhFi7yXBC17UkjI1KMzHhKYz0j2hwnJECBGZk4/edit?usp=sharing
variable_name                  variable_value
----------------------------------------------
rp_dependencies_dir /gpfs/work5/0/pgcdac/ricopili_download/dependencies
R_packages_dir      /home/$USER/R/x86_64-pc-linux-gnu-library/4.3/
starting_R          module_SPACE_load_SPACE_2023;module_SPACE_load_SPACE_R/4.3.2-gfbf-2023a;_SPACE_R
path_to_Perlmodules /gpfs/work5/0/pgcdac/ricopili_download/dependencies/perl_modules
path_to_scratchdir  /scratch-local
starting_ldsc       /home/$USER/.conda/envs/rp_env/bin/
ldsc_reference      /gpfs/work5/0/pgcdac/ricopili_download/dependencies/ldsc
rp_user_initials    <YOUR_INITIALS>
rp_user_email       <YOUR_MAIL>
rp_logfiles         /home/$USER/
----------------------------------------
----------------------------------------
---- jobarray and queueing parameters:
----------------------------------------
----------------------------------------
batch_jobcommand sbatch
batch_memory_request NONE
batch_walltime --time=0-HH:MM:SS
batch_array --array=1-XXX%YYY
batch_max_parallel_jobs_per_one_array added_to_array
batch_jobfile XXX
batch_name -J_SPACE_XXX
batch_stdout -o_SPACE_XXX/%x-%j.out
batch_stderr -e_SPACE_XXX/%x-%j.out
batch_job_dependency --dependency=afterany:XXX
batch_array_task_id $SLURM_ARRAY_TASK_ID
batch_other_job_flags  --partition=thin_SPACE_--cpus-per-task=32
batch_job_output_jid Submitted_SPACE_batch_SPACE_job_SPACE_XXX
batch_ncores_per_node 32
batch_mem_per_node 56
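
Writing values in the _SPACE_ notation by hand is error-prone for longer commands. The following minimal sketch shows the substitution in both directions. The two helper functions are hypothetical and not part of RICOPILI, and how RICOPILI itself decodes the values is not shown here.

```shell
# Sketch: convert between a normal command string and the _SPACE_ notation
# used for values in rp_config.custom.txt.
encode_space() { printf '%s\n' "$1" | sed 's/ /_SPACE_/g'; }
decode_space() { printf '%s\n' "$1" | sed 's/_SPACE_/ /g'; }

# Example:
#   encode_space '--partition=thin --cpus-per-task=32'
#   prints: --partition=thin_SPACE_--cpus-per-task=32
```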

After creating these files, run ./rp_config again. Follow the instructions, but do not replace the config file you have just copy-pasted.

Known issues and bug fixes on SURFsnellius

Warning

Currently, the libgsl.so.23 dependency for EIGENSOFT is not available on SURFsnellius. You can install EIGENSOFT through conda/mamba:
mamba install bioconda::eigensoft
Then add the following to your ricopili.conf (assuming your environment is called "rp_env"):
eloc /home/$USER/.conda/envs/rp_env/bin/

LDSC is not available as a module on SURFsnellius. As it uses Python 2.7, you can install LDSC into a new environment through conda/mamba:
mamba create -n ldsc python=2.7.15 -c conda-forge -c bioconda ldsc
Try to start LDSC manually to see if it runs, and then add the following to your ricopili.conf:
ldsc_start /home/$USER/.conda/envs/ldsc/bin/

Currently, you need to manually load texlive and GCC in order for several modules to run (e.g. pcaer). You can also add this to your .bashrc directly:
module load 2022
module load texlive/20230313-GCC-11.3.0
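
If you add the module commands to your .bashrc from a script, it helps to avoid writing the same line twice. A minimal sketch, assuming a hypothetical helper append_once that is not part of RICOPILI:

```shell
# Sketch: append a line to a file only if that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (modifies your ~/.bashrc):
#   append_once 'module load 2022' ~/.bashrc
#   append_once 'module load texlive/20230313-GCC-11.3.0' ~/.bashrc
```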

Quick tutorial

Quality control module (pre-imputation)

This module performs SNP and sample quality control (QC) of multiple datasets in parallel. It is highly recommended to go through the RICOPILI tutorial before using this module.
All modules have a --help flag to show all available functions and options.

Input Requirements

Binary PLINK files (bed/bim/fam), multiple datasets in working directory are allowed
Phenotypes coded as 1 (control) or 2 (case)
Allele names A, C, G, T
Genome builds hg16, hg17, hg18, hg19 or hg38 are supported
To start genomic quality control run the following command:

# start QC module
preimp_dir --dis scz --pop eur --out outname

# edit the *.names file to adjust QC cycle and naming
vim *.names 

# resubmit
preimp_dir --dis scz --pop eur --out outname
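
The input requirements listed above can be checked before submitting the QC module. The following is a minimal sketch, not part of RICOPILI: check_plink_input is a hypothetical helper that relies only on the standard PLINK file layout (phenotype in column 6 of the .fam file, allele codes in columns 5 and 6 of the .bim file). How strictly RICOPILI itself enforces these requirements is not covered here, so treat any reported problem as a prompt to inspect the data rather than a definitive failure.

```shell
# Sketch: basic sanity checks for one PLINK binary fileset (prefix.bed/.bim/.fam).
check_plink_input() {
    local prefix="$1" ext
    for ext in bed bim fam; do
        [ -f "$prefix.$ext" ] || { echo "missing file: $prefix.$ext"; return 1; }
    done
    # .fam column 6 is the phenotype; expected coding: 1 = control, 2 = case
    awk '$6 != 1 && $6 != 2 { bad++ }
         END { if (bad) { print bad " sample(s) with phenotype other than 1/2"; exit 1 } }' \
        "$prefix.fam" || return 1
    # .bim columns 5 and 6 hold the two allele codes; expected: A, C, G or T
    awk '$5 !~ /^[ACGT]$/ || $6 !~ /^[ACGT]$/ { bad++ }
         END { if (bad) { print bad " marker(s) with allele codes other than A/C/G/T"; exit 1 } }' \
        "$prefix.bim" || return 1
    echo "OK: $prefix"
}
```

To check every dataset in the working directory, the helper could be called once per .bim file with the extension removed, e.g. for f in *.bim; do check_plink_input "${f%.bim}"; done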

Principal component module

This module takes the PLINK binary output files from the Preimputation/QC step, calculates the principal components, determines overlapping samples, determines which covariates are associated with the genotype data, and generates PCA plots to check the ancestry of the cohorts and to exclude ancestry outliers. To conduct a principal component analysis run the following command:

# run PCA on the QC'd sample
pcaer --out output_name bfile-qc.bim

Imputation module

This module performs imputation on binary PLINK datasets generated by the Preimputation-QC step. The output is a set of dosage probabilities for all markers in a user-specified reference panel (there are a number of reference panels to choose from including MHC classical alleles and amino acids, HapMap, HRC, and 1000 Genomes).

To conduct genotype imputation based on your reference panel, run the following command:

impute_dirsub --refdir imputation_reference --out outname

GWAS and meta-analysis module (post-imputation)

This module performs association analyses for common variants from imputed dosage data for each dataset QC'd in the Preimputation step and then does a final meta-analysis using METAL. Population stratification is accounted for using principal components generated from the PCA step.

postimp_navi --out OUTNAME --mds prune.bfile.cobg.outname_pca.menv.mds_cov --coco 1,2,3,4,5,6 --addout run1
