Step by step commands to benchmark OpenOmics framework on AWS
- Log in to your AWS account
- Launch a virtual machine with EC2
- Choose an Amazon Machine Image (AMI): select a 64-bit (x86) AMI from "Quick Start" (the configurations below use Ubuntu Server 22.04 LTS).
- Choose an Instance Type.
- Configure Instance.
- Add Storage: You can add storage based on the workload requirements
- Configure security group
- Review and launch the instance (ensure you have, or create, a key pair for the SSH login in the next step).
- Use SSH to login to the machine after the instance is up and running
- $ ssh -i <key.pem> username@Public-DNS
- The AWS instance is now ready to use; you can download the OpenOmics workloads and related datasets and run them on this instance.
AWS c5.12xlarge: 1 instance, 48 vCPUs (Cascade Lake), 96 GB total memory, ucode: 0x500320a, Ubuntu 22.04, kernel 5.15.0-1004-aws
AWS m5.12xlarge: 1 instance, 48 vCPUs (Cascade Lake), 192 GB total memory, ucode: 0x500320a, Ubuntu 22.04, kernel 5.15.0-1004-aws
AWS c6i.16xlarge: 1 instance, 64 vCPUs (Ice Lake), 128 GB total memory, ucode: 0xd000331, Ubuntu 22.04, kernel 5.15.0-1004-aws
AWS m6i.16xlarge: 1 instance, 64 vCPUs (Ice Lake), 256 GB total memory, ucode: 0xd000331, Ubuntu 22.04, kernel 5.15.0-1004-aws
Step by step instructions to benchmark baseline (bwa-mem) and OpenOmics BWA-MEM (bwa-mem2) on m5.12xlarge and m6i.16xlarge instances of AWS
Download reference genome: Homo_sapiens_assembly38.fasta (the commands below assume it is placed in a directory named hs_asm38/)
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta
Download dataset #1: ERR194147 (Paired-End) from s3://sra-pub-run-odp/sra/ERR194147/ERR194147
Download dataset #2: ERR1955529 (Paired-End) from s3://sra-pub-run-odp/sra/ERR1955529/ERR1955529
Download dataset #3: ERR3239276 (Paired-End) from s3://sra-pub-run-odp/sra/ERR3239276/ERR3239276
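The S3 objects above are SRA archives rather than FASTQ files. One way to fetch and convert them, sketched here under the assumption that the AWS CLI and the SRA Toolkit (fasterq-dump) are installed, is:
aws s3 cp s3://sra-pub-run-odp/sra/ERR194147/ERR194147 ./ERR194147.sra --no-sign-request # public Open Data bucket, no credentials needed
fasterq-dump --split-files ./ERR194147.sra # writes the paired ERR194147_1.fastq / ERR194147_2.fastq (exact names can vary by sra-tools version)
Repeat for ERR1955529 and ERR3239276.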
curl -L https://github.com/lh3/bwa/releases/download/v0.7.17/bwa-0.7.17.tar.bz2 | tar jxf -
cd bwa-0.7.17 && make
./bwa index hs_asm38/Homo_sapiens_assembly38.fasta
cd ..
curl -L https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.2.1/bwa-mem2-2.2.1_x64-linux.tar.bz2 | tar jxf -
cd bwa-mem2-2.2.1_x64-linux && ./bwa-mem2 index hs_asm38/Homo_sapiens_assembly38.fasta
cd ..
The m5.12xlarge instance has 24 physical cores, while the m6i.16xlarge instance has 32 physical cores. The memory available on these instances allows using both hardware threads on each core, i.e., all 48 or 64 vCPUs.
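You can confirm the vCPU count on the running instance before choosing a thread count; a quick check (not part of the original steps):
lscpu | grep -E '^CPU\(s\):' # prints 48 on m5.12xlarge, 64 on m6i.16xlarge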
cd bwa-0.7.17
Sample commands shown below:
For m5.12xlarge:
./bwa mem -t 48 hs_asm38/Homo_sapiens_assembly38.fasta ERR194147_1.fastq ERR194147_2.fastq > ERR194147.out.sam
For m6i.16xlarge:
./bwa mem -t 64 hs_asm38/Homo_sapiens_assembly38.fasta ERR194147_1.fastq ERR194147_2.fastq > ERR194147.out.sam
cd ../bwa-mem2-2.2.1_x64-linux
Sample commands shown below:
For m5.12xlarge:
./bwa-mem2 mem -t 48 hs_asm38/Homo_sapiens_assembly38.fasta ERR194147_1.fastq ERR194147_2.fastq > ERR194147.out.sam
For m6i.16xlarge:
./bwa-mem2 mem -t 64 hs_asm38/Homo_sapiens_assembly38.fasta ERR194147_1.fastq ERR194147_2.fastq > ERR194147.out.sam
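For benchmarking, it helps to record wall time and peak memory for each run; a minimal sketch wrapping the m6i.16xlarge command with GNU time:
/usr/bin/time -v ./bwa-mem2 mem -t 64 hs_asm38/Homo_sapiens_assembly38.fasta ERR194147_1.fastq ERR194147_2.fastq > ERR194147.out.sam 2> ERR194147.time.log
# ERR194147.time.log contains the timing/peak-RSS report, mixed with bwa-mem2's own stderr log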
Step by step instructions to benchmark baseline (minimap2) and OpenOmics minimap2 (mm2-fast) on c5.12xlarge and c6i.16xlarge instances of AWS
Download reference genome
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz
Link to download HG002 ONT Guppy 3.6.0 dataset:
https://precision.fda.gov/challenges/10/view
File name: HG002_GM24385_1_2_3_Guppy_3.6.0_prom.fastq.gz
Link to download HG002 HiFi 14kb-15kb dataset:
https://precision.fda.gov/challenges/10/view
File name: HG002_35x_PacBio_14kb-15kb.fastq.gz
Download HG002 CLR dataset from s3://giab/data_indexes/AshkenazimTrio/sequence.index.AJtrio_PacBio_MtSinai_NIST_subreads_fasta_10082018
Download the hap2 assembly dataset:
wget https://zenodo.org/record/4393631/files/NA24385.HiFi.hifiasm-0.12.hap2.fa.gz
git clone https://github.com/lh3/minimap2.git -b v2.22
cd minimap2 && make
./minimap2 -ax [preset] [ref-seq] [read-seq] -t [num_threads] > minimap2output
Example command for ONT HG002 dataset:
./minimap2 -ax map-ont GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz HG002_ONT.fastq -t 48 > minimap2output
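Analogous baseline commands for the other downloaded datasets, assuming the file names from the download steps above and the presets listed below, would be:
./minimap2 -ax map-hifi GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz HG002_35x_PacBio_14kb-15kb.fastq.gz -t 48 > minimap2output
./minimap2 -ax asm5 GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz NA24385.HiFi.hifiasm-0.12.hap2.fa.gz -t 48 > minimap2output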
git clone --recursive https://github.com/bwa-mem2/mm2-fast.git -b mm2-fast-v2.22 mm2-fast-contrib
cd mm2-fast-contrib && make multi
./build_rmi.sh <path-to-ref-seq> <preset flags>
<preset flags> are as follows:
ONT: map-ont
HiFi: map-hifi
CLR: map-pb
Assembly: asm5
Example: Create OpenOmics minimap2 index for ONT datasets for GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz
./build_rmi.sh GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz map-ont
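Indexes for the other presets can be built the same way, following the preset list above, e.g.:
./build_rmi.sh GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz map-hifi
./build_rmi.sh GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz asm5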
./mm2-fast -ax [preset] [ref-seq] [read-seq] -t [num_threads] > mm2-fastoutput
Example command to run HG002 ONT dataset on c5.12xlarge
./mm2-fast -ax map-ont GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz HG002_ONT.fastq -t 48 > mm2-fastoutput
Example command to run HG002 ONT dataset on c6i.16xlarge
./mm2-fast -ax map-ont GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz HG002_ONT.fastq -t 64 > mm2-fastoutput
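OpenOmics minimap2 is intended to produce the same alignments as the baseline, so a quick sanity check is to compare the two outputs while ignoring SAM header lines (which record different program invocations); a sketch:
diff <(grep -v '^@' minimap2output) <(grep -v '^@' mm2-fastoutput) && echo "outputs match"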
Step by step instructions to benchmark OpenOmics for ATAC-Seq data analysis on multiple c5.24xlarge and c6i.32xlarge instances of AWS
apt-get update
apt-get install -y git
apt-get install -y libcurl4-openssl-dev
apt-get install -y hdf5-tools
apt-get install -y rsync
apt-get install -y make
apt-get install -y gcc
apt-get install -y libblas-dev
apt-get install -y python3.7 python3-pip
ln -nsf /usr/bin/python3.7 /usr/bin/python
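The next steps use conda, which the apt-get commands above do not install. A minimal Miniconda setup sketch (the installer URL is the standard one published by Anaconda; adjust the install prefix as needed):
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source $HOME/miniconda3/bin/activate # puts conda on PATH for the current shell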
conda create --name Atac python=3.7
conda activate Atac
cd /home/
git clone https://github.com/libxsmm/libxsmm.git
cd /home/libxsmm
git checkout b3da2b1bed9d27f9d6bae91a683f8cf76fe299b5
make -j # Use AVX=2 for AVX2 and AVX=3 for AVX512
cd /home/
export LD_LIBRARY_PATH=/home/libxsmm/lib/:$LD_LIBRARY_PATH
git clone --branch v0.2.0 https://github.com/clara-parabricks/AtacWorks.git
git clone https://github.com/IntelLabs/Trans-Omics-Acceleration-Library.git
cd /home/AtacWorks/
git apply /home/Trans-Omics-Acceleration-Library/applications/ATAC-Seq/AtacWorks_cpu_optimization_patch.patch
python3.7 -m pip install -r requirements-base.txt
python3.7 -m pip install torch torchvision torchaudio
python3.7 -m pip install -r requirements-macs2.txt
# Install torch-ccl (optional; uncomment the following steps to use it)
# git clone --branch v1.1.0 https://github.com/intel/torch-ccl.git && cd torch-ccl
# git submodule sync
# git submodule update --init --recursive
# python3.7 setup.py install
cd /home/libxsmm/samples/deeplearning/conv1dopti_layer/Conv1dOpti-extension/
python setup.py install
cd /home/AtacWorks/
python3.7 -m pip install .
atacworks=/home/AtacWorks/
wget https://atacworks-paper.s3.us-east-2.amazonaws.com/dsc_atac_blood_cell_denoising_experiments/50_cells/train_data/noisy_data/dsc.1.Mono.50.cutsites.smoothed.200.bw
wget https://atacworks-paper.s3.us-east-2.amazonaws.com/dsc_atac_blood_cell_denoising_experiments/50_cells/train_data/clean_data/dsc.Mono.2400.cutsites.smoothed.200.bw
wget https://atacworks-paper.s3.us-east-2.amazonaws.com/dsc_atac_blood_cell_denoising_experiments/50_cells/train_data/clean_data/dsc.Mono.2400.cutsites.smoothed.200.3.narrowPeak
rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/bedGraphToBigWig /home/
rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/bigWigToBedGraph /home/
echo 'export PATH="$PATH:/home/"' >> ~/.bashrc && source ~/.bashrc # put the bedGraphToBigWig/bigWigToBedGraph binaries on PATH
python $atacworks/scripts/peak2bw.py \
--input dsc.Mono.2400.cutsites.smoothed.200.3.narrowPeak \
--sizes $atacworks/data/reference/hg19.chrom.sizes \
--out_dir ./ \
--skip 1
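This step writes dsc.Mono.2400.cutsites.smoothed.200.3.narrowPeak.bw, which is passed as --cleanpeakbw in the bw2h5.py commands below.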
python $atacworks/scripts/get_intervals.py \
--sizes $atacworks/data/reference/hg19.auto.sizes \
--intervalsize 50000 \
--out_dir ./ \
--val chr20 \
--holdout chr10
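This produces training_intervals.bed and val_intervals.bed, which are consumed by the two bw2h5.py invocations below.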
python $atacworks/scripts/bw2h5.py \
--noisybw dsc.1.Mono.50.cutsites.smoothed.200.bw \
--cleanbw dsc.Mono.2400.cutsites.smoothed.200.bw \
--cleanpeakbw dsc.Mono.2400.cutsites.smoothed.200.3.narrowPeak.bw \
--intervals training_intervals.bed \
--out_dir ./ \
--prefix Mono.50.2400.train \
--pad 5000 \
--nonzero
python $atacworks/scripts/bw2h5.py \
--noisybw dsc.1.Mono.50.cutsites.smoothed.200.bw \
--cleanbw dsc.Mono.2400.cutsites.smoothed.200.bw \
--cleanpeakbw dsc.Mono.2400.cutsites.smoothed.200.3.narrowPeak.bw \
--intervals val_intervals.bed \
--out_dir ./ \
--prefix Mono.50.2400.val \
--pad 5000
export KMP_AFFINITY=compact,1,0,granularity=fine
export LD_PRELOAD=/home/libtcmalloc.so # or /home/libjemalloc.so; preload one allocator, after copying the chosen library into /home (one way shown below)
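One way to obtain these allocator libraries on Ubuntu, sketched here with package names and .so paths that may differ by release:
apt-get install -y libgoogle-perftools-dev libjemalloc-dev
cp /usr/lib/x86_64-linux-gnu/libtcmalloc.so /home/libtcmalloc.so
cp /usr/lib/x86_64-linux-gnu/libjemalloc.so /home/libjemalloc.so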
export OMP_NUM_THREADS=31 # (Available cores (N) - 1)
# In numactl command, "-C 1-31" is for running on cores 1 to 31.
# General case for an N core machine is "-C 1-(N-1)".
# Keep the batch size in configs/train_config.yaml a multiple of (N-1) for optimum performance
numactl --membind 0 -C 1-31 python $atacworks/scripts/main.py train \
--config configs/train_config.yaml \
--config_mparams configs/model_structure.yaml \
--files_train $atacworks/Mono.50.2400.train.h5 \
--val_files $atacworks/Mono.50.2400.val.h5
# On machines without NUMA, use "taskset -c 1-31 python ..." instead of numactl.
For multi-instance distributed training:
export OMP_NUM_THREADS=30 # (Available cores (N) - 2)
# 1. In configs/train_config.yaml (lines 22-23), set the distributed backend to:
#        dist-backend: 'gloo'
# 2. Keep the batch size (bs) in configs/train_config.yaml a multiple of (N-2) for optimum performance.
#    The batch size is multiplied by the number of sockets: with bs=30 and 16 sockets, the effective batch size is 30*16 = 480.
# 3. Comment out the following lines (79-80) in AtacWorks/claragenomics/dl4atac/utils.py and reinstall AtacWorks with "pip install .":
#        if (os.path.islink(latest_symlink)):
#            os.remove(latest_symlink)
# 4. Run the following Slurm batch script, which uses MPI commands.
sbatch Batchfile_CPU.slurm
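Batchfile_CPU.slurm is not reproduced in this document. Purely as an illustration of its likely shape (the node count, conda activation path, and MPI launcher flags here are assumptions, not the original script):
#!/bin/bash
#SBATCH --job-name=atacworks-train
#SBATCH --nodes=16                # one c5.24xlarge or c6i.32xlarge instance per node
#SBATCH --ntasks-per-node=2       # one rank per socket
source $HOME/miniconda3/bin/activate Atac
export OMP_NUM_THREADS=30         # cores per socket minus 2, as above
mpirun -np $SLURM_NTASKS python $atacworks/scripts/main.py train \
    --config configs/train_config.yaml \
    --config_mparams configs/model_structure.yaml \
    --files_train $atacworks/Mono.50.2400.train.h5 \
    --val_files $atacworks/Mono.50.2400.val.h5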