How to install
For PGSToolKit to work, you'll need to create a dedicated Python environment (Python 3.7+) and install PLINK2, bgenix, samtools, vcftools, and bcftools. PGSToolKit should work both on Rocky8 (Linux) and on macOS Sequoia with brew.
On macOS, you can use brew to install missing packages. On an HPC you probably have no admin rights, so there is no system package manager to fall back on; the steps below install everything in directories you own.
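Once the tools in this guide are installed, a quick way to confirm that they are reachable is to look each one up on your PATH. The following is a minimal bash sketch; the function name check_tools is hypothetical and not part of PGSToolKit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PGSToolKit): report which of the given
# commands cannot be found on PATH. Returns 0 only if all of them are found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'MISSING: %s\n' "$tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

With the pgstoolkit environment active, you could run check_tools plink2 bgenix samtools vcftools bcftools tabix bgzip; any tool listed as MISSING is either not installed or not on your PATH.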
On Rocky8
cd /path_to_software
mkdir -v R-4.4.1 # the R version used in this guide
mkdir -v tmp
cd tmp
wget http://cran-mirror.cs.uu.nl/src/base/R-4/R-4.4.1.tar.gz
tar -zxvf R-4.4.1.tar.gz
cd R-4.4.1
./configure --prefix=/path_to_software/R-4.4.1 --enable-java=no
make
make install
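Because R is installed under a custom prefix, typing R will only start this build if its bin directory comes first on your PATH. A minimal bash sketch, assuming the prefix used above; the function name use_local_r is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put an R installed with ./configure --prefix=<prefix>
# first on PATH, so that `R` and `Rscript` start that build.
# `make install` places the R and Rscript launchers in <prefix>/bin.
use_local_r() {
  local prefix=$1
  if [ ! -x "$prefix/bin/R" ]; then
    echo "no executable R found in $prefix/bin" >&2
    return 1
  fi
  export PATH="$prefix/bin:$PATH"
}
```

For example, use_local_r /path_to_software/R-4.4.1 followed by R --version should report 4.4.1. Adding the equivalent export line to your ~/.bashrc makes this persist across logins.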
On macOS Sequoia with brew
Make sure to install the binary version of R instead of building it from source; this prevents a lot of issues down the road.
brew install --cask r
Installing R packages
Next, install RapidoPGS and some other R packages that will come in handy when visualizing results.
install.packages("RapidoPGS")
RapidoPGS also requires the GenomicRanges package, which is distributed through Bioconductor rather than CRAN.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("GenomicRanges")
For visualizations we prefer the ggpubr package.
install.packages("ggpubr")
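On a cluster you may prefer to install these packages from a batch job rather than an interactive R session. The bash sketch below wraps the same steps in a single Rscript call; the function name, the CRAN_MIRROR variable, and the default mirror URL are assumptions. It installs GenomicRanges before RapidoPGS, since RapidoPGS presumably cannot resolve a Bioconductor dependency from CRAN alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the R packages from this section without an
# interactive R session (e.g. in an HPC batch job).
# Assumptions: Rscript is on PATH and the CRAN mirror below is reachable.
install_pgstoolkit_r_packages() {
  local cran="${CRAN_MIRROR:-https://cloud.r-project.org}"
  Rscript -e "
    options(repos = c(CRAN = '$cran'))
    # GenomicRanges comes from Bioconductor, so install it via BiocManager first.
    if (!requireNamespace('BiocManager', quietly = TRUE))
      install.packages('BiocManager')
    BiocManager::install('GenomicRanges', update = FALSE, ask = FALSE)
    install.packages(c('RapidoPGS', 'ggpubr'))
  "
}
```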
We need PLINK2 for much of the data wrangling and for calculating p-value threshold-based polygenic scores. PLINK2 can be downloaded from the PLINK 2.0 website. Choose the right build for your system (Linux or macOS); it's probably best to work with the alpha version. The example below uses the 64-bit Linux alpha build dated 20240625; newer builds may be available.
mkdir -v path/to/plink
cd path/to/plink
wget https://s3.amazonaws.com/plink2-assets/alpha5/plink2_linux_x86_64_20240625.zip
unzip plink2_linux_x86_64_20240625.zip
cd ../
chmod -Rv a+rwx plink
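If you are unsure which build to pick, the output of uname tells you the operating system and CPU architecture. The bash sketch below maps these to a build family; the function name and the hint texts are hypothetical, and since the zip file names on the download page change with every build date, you still choose the exact file there by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print which PLINK2 build family matches a machine.
# Optional arguments override `uname -s` and `uname -m` (useful for testing).
plink2_build_hint() {
  local os=${1:-$(uname -s)}
  local arch=${2:-$(uname -m)}
  case "$os/$arch" in
    Linux/x86_64)  echo "Linux 64-bit (e.g. plink2_linux_x86_64_<date>.zip)" ;;
    Darwin/arm64)  echo "macOS, Apple Silicon" ;;
    Darwin/x86_64) echo "macOS, Intel" ;;
    *)             echo "unrecognised platform: $os/$arch" >&2; return 1 ;;
  esac
}
```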
We have mambaforge3 installed on our cluster (and on macOS Sequoia), so we use mamba to set up the Python environment and most of the command-line tools.
First, we create an environment.
mamba create --name pgstoolkit python=3.9
Next, we activate it.
mamba activate pgstoolkit
Now, we're ready to install some packages.
mamba install conda-forge::bgenix
We also want samtools, vcftools, and bcftools.
mamba install bioconda::samtools bioconda::bcftools bioconda::vcftools
Lastly, we make sure to get tabix and bgzip, which are part of htslib.
mamba install bioconda::htslib
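Scripts that call samtools, bcftools, or bgenix from this environment can fail in confusing ways when it is not active. A minimal bash guard, assuming that mamba activate sets CONDA_DEFAULT_ENV to the environment's name as conda activate does (for environments activated by path it holds the path instead); the function name require_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard for scripts: stop early unless the named conda/mamba
# environment is active, based on the CONDA_DEFAULT_ENV variable.
require_env() {
  local wanted=$1
  if [ "${CONDA_DEFAULT_ENV:-}" != "$wanted" ]; then
    echo "please run: mamba activate $wanted (current: ${CONDA_DEFAULT_ENV:-none})" >&2
    return 1
  fi
}
```

At the top of a pipeline script, require_env pgstoolkit || exit 1 makes the script stop with a clear message instead.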
We also need PRS-CS, which can be found on GitHub (getian107/PRScs). Follow the instructions provided there, and make sure to download the LD reference files too.
mkdir -v path/to/git_repositories
cd path/to/git_repositories
git clone https://github.com/getian107/PRScs.git
cd ../
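PRS-CS is a Python script that relies on the scipy and h5py packages. The bash sketch below checks whether the interpreter in your active environment can import them; the function name and the PYTHON override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the Python interpreter can import the given
# modules (PRS-CS needs scipy and h5py). Uses $PYTHON if set, else `python`.
check_python_modules() {
  local py="${PYTHON:-python}" mod missing=0
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "ok:      $mod"
    else
      echo "MISSING: $mod" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the pgstoolkit environment, check_python_modules scipy h5py || mamba install conda-forge::scipy conda-forge::h5py would install whichever is missing.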
You can find PRSice2 on GitHub (choishingwan/PRSice). Download and install the proper version for your system as per their instructions.
mkdir -v path/to/prsice
cd path/to/prsice
wget https://github.com/choishingwan/PRSice/releases/download/2.3.5/PRSice_linux.zip
unzip PRSice_linux.zip
cd ../
chmod -Rv a+rwx prsice
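The chmod -Rv a+rwx calls in this guide make every file world-writable. If the goal is only to let colleagues read and run the tools, a narrower alternative is sketched below in bash; the function name share_tool_dir is hypothetical, and the example binary name PRSice_linux assumes that is what the Linux archive unpacks to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a tool directory readable and runnable by everyone
# without making it world-writable (unlike `chmod -R a+rwx`).
# Usage: share_tool_dir <dir> [executable ...]  (executables relative to <dir>)
share_tool_dir() {
  local dir=$1 exe
  shift
  chmod -R go-w "$dir" || return 1
  # Capital X: add execute only to directories and to files that are
  # already executable for someone.
  chmod -R a+rX "$dir" || return 1
  for exe in "$@"; do
    chmod a+x "$dir/$exe" || return 1
  done
}
```

For example, share_tool_dir prsice PRSice_linux, run from path/to, would replace the last line of the block above.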
LDpred2
While we haven't implemented LDpred2 yet, you can of course install it too.
First, start the R version we just installed.
R
Next, install LDpred2, which is part of the bigsnpr package.
install.packages("remotes")
remotes::install_github("privefl/bigsnpr")
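To confirm from the shell that the installation worked, without opening an interactive session, you can ask R whether it can load the package. A minimal bash sketch; the function name r_has_package is hypothetical, and it uses whichever Rscript comes first on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if R can load the named package, 1 if not.
r_has_package() {
  Rscript -e 'pkg <- commandArgs(trailingOnly = TRUE)[1]
              ok <- requireNamespace(pkg, quietly = TRUE)
              quit(save = "no", status = if (ok) 0L else 1L)' "$1"
}
```

For example, r_has_package bigsnpr && echo "LDpred2 (bigsnpr) is available".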
PGS Calculator
The PGS Catalog contains many polygenic scores (PGS) that you could apply to your data. You can use PGS Calculator (pgs-calc) to calculate a given PGS in your dataset. As per their instructions, install the latest version:
- Download pgs-calc-*.tar.gz from the latest release
- Extract the downloaded archive (e.g., tar -xf pgs-calc-*.tar.gz)
- Validate the installation with pgs-calc --version
mkdir -v path/to/pgs_calc
cd path/to/pgs_calc
wget https://github.com/lukfor/pgs-calc/releases/download/v1.6.1/pgs-calc-1.6.1.tar.gz
tar -xvf pgs-calc-1.6.1.tar.gz
cd ../
chmod -Rv a+rwx pgs_calc
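The commands above pin version 1.6.1, while the instructions recommend the latest release. The bash sketch below takes the version as a parameter; the function names are hypothetical, and it assumes that later releases keep the same URL and file naming as v1.6.1 and that the archive unpacks a pgs-calc launcher into the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the pgs-calc release URL for a given version and
# install that version. The URL pattern is copied from the v1.6.1 example;
# it is an assumption that later releases follow the same naming.
pgs_calc_url() {
  local version=$1
  echo "https://github.com/lukfor/pgs-calc/releases/download/v${version}/pgs-calc-${version}.tar.gz"
}

install_pgs_calc() {
  local version=$1 dir=$2
  mkdir -p "$dir" || return 1
  (
    cd "$dir" || exit 1
    wget "$(pgs_calc_url "$version")" || exit 1
    tar -xf "pgs-calc-${version}.tar.gz" || exit 1
    ./pgs-calc --version   # validation step from the pgs-calc instructions
  )
}
```

For example, install_pgs_calc 1.6.1 path/to/pgs_calc reproduces the steps above; swap in the newest version number from the releases page.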