How to install

Sander W. van der Laan edited this page Aug 16, 2024 · 2 revisions

For PGSToolKit to work, you'll need to create a dedicated Python environment (3.7+) and install PLINK2, bgenix, samtools, vcftools, and bcftools. PGSToolKit should work on both Rocky8 (Linux) and macOS Sequoia with brew.

Installation

macOS

On macOS, you can use brew to install missing packages.

Rocky8

You probably have no admin rights on an HPC, so nothing to report here.

Step 1: install R

On Rocky8

cd /path_to_software
mkdir -v R-4.4.1 # installation prefix (4.4.1 was the latest version at the time of writing)
mkdir -v tmp     # build directory
cd tmp
wget http://cran-mirror.cs.uu.nl/src/base/R-4/R-4.4.1.tar.gz
tar -zxvf R-4.4.1.tar.gz
cd R-4.4.1
./configure --prefix=/path_to_software/R-4.4.1 --enable-java=no
make
make install
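After the build, it can help to confirm that the R launchers ended up under the installation prefix before adding them to your PATH. The following is a minimal sketch, not part of PGSToolKit: `verify_r_install` is a hypothetical helper name, and `/path_to_software/R-4.4.1` is the placeholder prefix from the commands above.

```shell
# Hypothetical helper: confirm that `make install` placed the R launchers
# under <prefix>/bin. Returns 0 if both are executable, 1 otherwise.
verify_r_install() {
    r_prefix="$1"
    rc=0
    for exe in R Rscript; do
        if [ -x "${r_prefix}/bin/${exe}" ]; then
            echo "found: ${r_prefix}/bin/${exe}"
        else
            echo "missing: ${r_prefix}/bin/${exe}" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example call; /path_to_software is the placeholder prefix used above.
# The PATH is only extended when both launchers were found.
if verify_r_install /path_to_software/R-4.4.1; then
    export PATH="/path_to_software/R-4.4.1/bin:${PATH}"
fi
```

To make the PATH change permanent, you could add the `export` line to your shell startup file.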

On macOS Sequoia with brew

Make sure to get the binary version of R instead of building it from source; this prevents a lot of issues down the road.

brew install --cask r

Installing R packages

Next, install RapidoPGS and some other R packages that will come in handy when visualizing results.

install.packages("RapidoPGS")

We also require the GenomicRanges package for RapidoPGS.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("GenomicRanges")

For visualizations we prefer the ggpubr package.

install.packages("ggpubr")
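Once the packages are installed, a quick check from the shell can confirm that each one loads in a fresh R session. This is a sketch under assumptions: `check_r_packages` and the `RSCRIPT_BIN` override are hypothetical names introduced here, and the check assumes `Rscript` from the R installation above is on your PATH.

```shell
# RSCRIPT_BIN is a hypothetical override so a non-default Rscript can be used.
RSCRIPT_BIN="${RSCRIPT_BIN:-Rscript}"

# Try to load each named package in a fresh R session; report any that fail.
check_r_packages() {
    rc=0
    for pkg in "$@"; do
        if "$RSCRIPT_BIN" -e "suppressPackageStartupMessages(library(${pkg}))" >/dev/null 2>&1; then
            echo "ok: ${pkg}"
        else
            echo "not loadable: ${pkg}" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Check the three packages installed in this step.
check_r_packages RapidoPGS GenomicRanges ggpubr \
    || echo "Install the packages reported above, then rerun the check."
```

A package is also reported as not loadable when `Rscript` itself cannot be found, so check your PATH first if every package fails.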

Step 2: install PLINK2

We need PLINK2 for much of the data wrangling and for calculating p-value-threshold-based polygenic scores. PLINK2 can be downloaded here. Choose the right version for your system (Linux or macOS); it's probably best to work with the alpha version.

mkdir -v path/to/plink
cd path/to/plink
wget https://s3.amazonaws.com/plink2-assets/alpha5/plink2_linux_x86_64_20240625.zip
unzip plink2_linux_x86_64_20240625.zip
cd ../
chmod -Rv a+rwx plink
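The commands above download the Linux x86_64 build. If you are unsure which build matches your machine, a small sketch like the following reports the operating system and CPU architecture so you can compare them with the build names on the download page. `detect_platform` is a hypothetical helper name; it only reports and does not download anything.

```shell
# Hypothetical helper: report the operating system and CPU architecture in a
# form that is easy to compare against the build names on the download page.
detect_platform() {
    os="$(uname -s)"
    arch="$(uname -m)"
    case "$os" in
        Linux)  echo "linux ${arch}" ;;
        Darwin) echo "macos ${arch}" ;;
        *)      echo "unsupported ${arch}"; return 1 ;;
    esac
}

if platform="$(detect_platform)"; then
    echo "Pick the PLINK2 build for: ${platform}"
else
    echo "This guide only covers Linux and macOS (got: ${platform})" >&2
fi
```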

Step 3: create pgstoolkit environment

We have mambaforge3 installed on our cluster (and on macOS Sequoia), so we use mamba to create and manage the environment.

First, we create an environment.

mamba create --name pgstoolkit python=3.9

Next, we activate it.

mamba activate pgstoolkit 

Now, we're ready to install some packages.

mamba install conda-forge::bgenix

We also want samtools, vcftools, and bcftools.

mamba install bioconda::samtools bioconda::bcftools bioconda::vcftools

Lastly, we make sure to get tabix and bgzip, which are part of htslib.

mamba install bioconda::htslib
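With the environment activated, you can confirm that the command-line tools installed in this step are visible on your PATH. The sketch below is only an illustration; `check_tools` is a hypothetical helper name, and the check tests for presence only, not for a specific version.

```shell
# Hypothetical helper: report which of the named commands are on the PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: ${tool}"
        else
            echo "missing: ${tool}" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Tools installed into the pgstoolkit environment in this step.
check_tools bgenix samtools bcftools vcftools tabix bgzip \
    || echo "Activate the pgstoolkit environment or install the missing tools."
```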

Step 4: install PRS-CS

We also need PRS-CS, which can be found here. Follow the instructions provided there, and make sure to download the reference files too.

mkdir -v path/to/git_repositories
cd path/to/git_repositories
git clone https://github.com/getian107/PRScs.git
cd ../
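Because the reference files are a separate download, it is easy to end up with an incomplete panel. The sketch below checks a reference directory for the expected files. It assumes the file naming used by the reference panels linked from the PRScs repository (one `ldblk_*_chr*.hdf5` file per autosome plus a `snpinfo_*` file); adjust the patterns if your panel is named differently. `check_ld_reference` is a hypothetical helper name, `path/to/ldblk_1kg_eur` is a placeholder, and the script is written for sh/bash.

```shell
# Hypothetical helper: count the per-chromosome LD block files and look for
# a SNP info file in a reference directory. The file patterns are assumptions.
check_ld_reference() {
    ref_dir="$1"
    n_blocks=0
    has_snpinfo=no
    for f in "${ref_dir}"/ldblk_*_chr*.hdf5; do
        if [ -e "$f" ]; then n_blocks=$((n_blocks + 1)); fi
    done
    for f in "${ref_dir}"/snpinfo_*; do
        if [ -e "$f" ]; then has_snpinfo=yes; fi
    done
    echo "LD block files: ${n_blocks}, SNP info file: ${has_snpinfo}"
    # Succeed only with one block file per autosome (22) and a SNP info file.
    [ "$n_blocks" -eq 22 ] && [ "$has_snpinfo" = yes ]
}

# Placeholder path; point this at the directory you extracted.
check_ld_reference path/to/ldblk_1kg_eur \
    || echo "Reference panel looks incomplete; re-download or re-extract it."
```

The PRScs README also lists scipy and h5py as Python dependencies, so you may want to install those into the pgstoolkit environment as well.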

Step 5: install PRSice2

You can find PRSice2 here. Download and install the proper version as per their instructions.

mkdir -v path/to/prsice
cd path/to/prsice
wget https://github.com/choishingwan/PRSice/releases/download/2.3.5/PRSice_linux.zip
unzip PRSice_linux.zip
cd ../
chmod -Rv a+rwx prsice

Step 6: Alternative tools

LDpred2

While we haven't implemented LDpred2 in PGSToolKit yet, you can of course install it too.

First, start the R version we just installed.

R

Next, install LDpred2, which is part of the bigsnpr package.

install.packages("remotes")
remotes::install_github("privefl/bigsnpr")

PGS Calculator

The PGS Catalog contains many PGS that you could apply to your data. You can use PGS Calculator to calculate a given PGS in your dataset. As per their instructions, install the latest version:

  • Download pgs-calc-*.tar.gz from latest release
  • Extract the downloaded archive (e.g., tar -xf pgs-calc-*.tar.gz)
  • Validate installation with pgs-calc --version

mkdir -v path/to/pgs_calc
cd path/to/pgs_calc
wget https://github.com/lukfor/pgs-calc/releases/download/v1.6.1/pgs-calc-1.6.1.tar.gz
tar -xvf pgs-calc-1.6.1.tar.gz
cd ../
chmod -Rv a+rwx pgs_calc
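The standalone tools downloaded in the steps above (PLINK2, PRSice2, pgs-calc) are not on your PATH by default. A minimal sketch for adding their directories follows. `prepend_path` is a hypothetical helper name, and the `path/to/...` entries are the placeholders used above; replace them with the absolute paths on your system.

```shell
# Hypothetical helper: put a directory at the front of PATH, skipping
# directories that do not exist or are already listed.
prepend_path() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "skipping (not a directory): ${dir}" >&2
        return 0
    fi
    case ":${PATH}:" in
        *":${dir}:"*) ;;                 # already present, nothing to do
        *) PATH="${dir}:${PATH}" ;;
    esac
}

# Placeholders from the steps above; use absolute paths in practice.
for tool_dir in path/to/plink path/to/prsice path/to/pgs_calc; do
    prepend_path "$tool_dir"
done
export PATH
```

Once the directories are on your PATH, the validation step from the list above (pgs-calc --version) can be run from any location.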