Commit 9ec5513 (parent ddf6568), "Update README.md", authored by sanchit-misra on Feb 22, 2024; changes `benchmarking/AWS-Intel-blog-v2.1-2024/README.md` (54 additions, 29 deletions).

# Steps to reproduce the benchmarking results of Open Omics Acceleration Framework v2.1 on AWS EC2 instances, as published in the AWS Intel blog

## Machine configurations used for benchmarking

* AWS r7i.24xlarge: 1 instance, 96 vCPUs (Sapphire Rapids), 768 GB total memory, Ubuntu 22.04
* AWS c7i.24xlarge: 1 instance, 96 vCPUs (Sapphire Rapids), 192 GB total memory, Ubuntu 22.04
* AWS m7i.24xlarge: 1 instance, 96 vCPUs (Sapphire Rapids), 384 GB total memory, Ubuntu 22.04
* AWS m7i.48xlarge: 1 instance, 192 vCPUs (Sapphire Rapids), 768 GB total memory, Ubuntu 22.04

## Preparing an AWS EC2 instance for benchmarking

1. Log in to your AWS account.
2. Launch a virtual machine with EC2. The configuration details of the instances used for each pipeline are given in the respective sections below.
   * Choose an Amazon Machine Image (AMI): from “Quick Start”, select “Ubuntu Server 22.04 LTS (HVM), SSD Volume Type”.
   * Choose an instance type.
   * Configure the instance.
   * Add storage based on the workload requirements.
   * Configure the security group.
   * Review and launch the instance (make sure you have, or create, a key pair for the SSH login in the next step).
3. Use SSH to log in to the machine once the instance is up and running:
   * `$ ssh -i <key.pem> username@Public-DNS`
4. The AWS instance is now ready to use: you can download the Open Omics workloads and related datasets and run them on this instance.

## AlphaFold2-based Protein Folding Pipeline

### Configuration Details

**BASELINE on m7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 384GB (1 slot/ 384GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, PyTorch - v1.12.1, JAX - v0.4.14, OpenFold - v 1.0.1, Hmmer - v3.3.2, hh-suite - v3.3.0, Kalign2 – v2.04, model name & version: AlphaFold2

**Open Omics on m7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 384GB (1 slot/ 384GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Intel-python - 2022.1.0, JAX - v0.4.21, Open Omics Acceleration Framework v2.1, Open Omics AlphaFold2 v1.0, IntelLabs Hmmer v1.0, IntelLabs hh-suite v1.0, Kalign2 – v2.04, framework version: PyTorch – v2.0.1, model name & version: AlphaFold2

**Open Omics on m7i.48xlarge**: Test by Intel as of <11/30/23>. 1 instance, 2-sockets, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 384GB (1 slot/ 384GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Intel-python - 2022.1.0, JAX - v0.4.21, Open Omics Acceleration Framework v2.1, Open Omics AlphaFold2 v1.0, IntelLabs Hmmer v1.0, IntelLabs hh-suite v1.0, Kalign2 – v2.04, framework version: PyTorch – v2.0.1, model name & version: AlphaFold2


### Step-by-step instructions to benchmark the baseline and Open Omics Acceleration Framework

```sh
cd ~
git clone --recursive https://github.com/IntelLabs/Open-Omics-Acceleration-Framework.git
```

- The test dataset can be downloaded from https://www.uniprot.org/proteomes/UP000001940. Click 'Download' and select the options **Download only reviewed (Swiss-Prot:) canonical proteins (4,463)**, Format: **Fasta**, and Compressed: **No**.

- Save the file as `uniprotkb_proteome.fasta` inside the folder `~/Open-Omics-Acceleration-Framework/benchmarking/AWS-Intel-blog-v2.1-2024/`.
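Before running either pipeline, it can be worth confirming that the download is complete. A minimal sketch, assuming the file was saved under the name and folder given above: every FASTA record starts with a `>` header line, so counting those lines should give the number of proteins selected on the UniProt page (4,463 for the reviewed canonical set). The helper name `count_fasta_records` is hypothetical, not part of the framework.

```sh
# Hypothetical helper: count the records in a FASTA file.
# Every FASTA record begins with a header line that starts with '>'.
count_fasta_records() {
  grep -c '^>' "$1"
}
```

For example, `count_fasta_records ~/Open-Omics-Acceleration-Framework/benchmarking/AWS-Intel-blog-v2.1-2024/uniprotkb_proteome.fasta` should print 4463 if the full reviewed set was downloaded.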


#### Baseline ([OpenFold](https://github.com/aqlaboratory/openfold))

EC2Instance: m7i.24xlarge

…
Note: Set `cpus` to the number of available vCPUs, and use the `--long_sequence_inference` option if you are running long sequences.
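The vCPU count for `cpus` can be read on the instance itself. A minimal sketch using `nproc` from GNU coreutils, which prints the number of processing units available to the current process. The variable name `CPUS` is an assumption for illustration; how the value is passed to the elided OpenFold command above is not shown here.

```sh
# nproc prints the number of processing units available to this process
# (96 on an m7i.24xlarge, 192 on an m7i.48xlarge). It honours OMP_NUM_THREADS
# if that variable is set; `nproc --all` ignores it and reports all installed vCPUs.
CPUS=$(nproc)
echo "Available vCPUs: ${CPUS}"
```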


#### Open Omics Acceleration Framework alphafold2-based-protein-folding pipeline

EC2Instance: m7i.24xlarge, m7i.48xlarge

…

```sh
docker run -it --cap-add SYS_NICE -v $DATA_DIR:/data \
…
alphafold:latest
```

## DeepVariant-based germline variant calling pipeline (fq2vcf)

### Configuration Details

**BASELINE on c7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: bwa-mem v0.7.17, Samtools v. 1.16.1, DeepVariant v1.5, framework version: Intel-tensorflow 2.11.0, model name & version: Inception V3

**Open Omics on c7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: bwa-mem2 v2.2.1, Samtools v. 1.16.1, Open Omics DeepVariant v1.0, Open Omics Acceleration Framework v.2.1, framework version: Intel-tensorflow 2.11.0, model name & version: Inception V3

**Open Omics on c7i.48xlarge**: Test by Intel as of <11/30/23>. Up to 8 instances, 16-sockets; each socket has 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: bwa-mem2 v2.2.1, Samtools v. 1.16.1, Open Omics DeepVariant v1.0, Open Omics Acceleration Framework v.2.1, framework version: Intel-tensorflow 2.11.0, model name & version: Inception V3

### Step-by-step instructions to benchmark the baseline and Open Omics Acceleration Framework

#### Dataset
```sh
mkdir -p ~/HG001
wget https://genomics-benchmark-datasets.s3.amazonaws.com/google-brain/fastq/novaseq/wgs_pcr_free/30x/HG001.novaseq.pcr-free.30x.R1.fastq.gz -P ~/HG001/
# …

```
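Interrupted downloads of large files such as the 30x FASTQs can leave truncated `.gz` files behind. A minimal sketch, assuming the files land in `~/HG001/` as above: `gzip -t` decompresses each file without writing any output and exits with an error on a truncated or corrupt stream. The function name `check_gz` is hypothetical.

```sh
# Hypothetical helper: check that each argument is a complete, valid gzip file.
# gzip -t decompresses without writing output and fails on truncated/corrupt data.
check_gz() {
  local f
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "OK  $f"
    else
      echo "BAD $f"
      return 1
    fi
  done
}
```

For example, `check_gz ~/HG001/*.fastq.gz`; on 30x FASTQ files the check takes a while, because each file is fully decompressed.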

#### Baseline

EC2Instance: c7i.24xlarge

Prerequisite: docker/podman
…

```sh
# …
cd ../../pipelines/deepvariant-based-germline-variant-calling-fq2vcf/
# Run the pipeline
bash run_pipe_bwa.sh
```
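This README does not state how run times were recorded for the blog. One simple option, sketched here as a hypothetical helper (`time_run` and the `BENCH_LOG` variable are illustrative names, not part of the framework), is to wrap each pipeline run and append its wall-clock time in seconds to a log file, so that baseline and Open Omics runs can be compared afterwards.

```sh
# Hypothetical helper: run a command, then append "<label> <seconds>s exit=<status>"
# to $BENCH_LOG (default: benchmark_times.log). Returns the command's exit status.
time_run() {
  local label=$1
  shift
  local start end
  start=$(date +%s)
  "$@"
  local status=$?
  end=$(date +%s)
  echo "${label} $((end - start))s exit=${status}" >> "${BENCH_LOG:-benchmark_times.log}"
  return "$status"
}
```

For example, `time_run deepvariant_baseline bash run_pipe_bwa.sh`. Whole seconds from `date +%s` are coarse, but adequate for runs that take minutes to hours.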
#### Open Omics Acceleration Framework deepvariant-based-germline-variant-calling-fq2vcf pipeline

EC2Instance: c7i.24xlarge, c7i.48xlarge

…

To run on 2 x c7i.48xlarge, 4 x c7i.48xlarge, or 8 x c7i.48xlarge, follow [link](htt…



## Single-cell RNA-seq analysis pipeline

### Configuration Details

**BASELINE on r7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 768GB (1 slot/ 768GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: scanpy v 1.9.1

**Open Omics on r7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 768GB (1 slot/ 768GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Open Omics Acceleration Framework v.2.1

**Open Omics on c7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Open Omics Acceleration Framework v.2.1

### Step-by-step instructions to benchmark the baseline and Open Omics Acceleration Framework

#### Baseline ([rapids-single-cell-examples](https://github.com/NVIDIA-Genomics-Research/rapids-single-cell-example))

EC2Instance: r7i.24xlarge

…

```sh
# …
python -m ipykernel install --user --display-name "Python (rapidgenomics)"
```
Note: Open the Jupyter notebook, select the **1M_brain_cpu_analysis.ipynb** file, and run all cells.
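For benchmarking, the notebook can also be executed headlessly instead of through the Jupyter UI. A minimal sketch, assuming Jupyter's `nbconvert` is installed in the same environment: `jupyter nbconvert --to notebook --execute --inplace` runs all cells and saves the outputs back into the notebook, and `--ExecutePreprocessor.timeout=-1` disables the per-cell timeout. The wrapper name `run_notebook` and the `JUPYTER` override (handy for a dry run with `JUPYTER=echo`) are hypothetical.

```sh
# Hypothetical wrapper: execute all cells of a notebook without the UI and report wall time.
# JUPYTER defaults to "jupyter"; set JUPYTER=echo to print the command instead of running it.
run_notebook() {
  local nb=$1 start end
  start=$(date +%s)
  # --inplace overwrites the notebook with the executed version (outputs included);
  # timeout=-1 disables the default per-cell execution timeout.
  "${JUPYTER:-jupyter}" nbconvert --to notebook --execute --inplace \
    --ExecutePreprocessor.timeout=-1 "$nb" || return
  end=$(date +%s)
  echo "Executed ${nb} in $((end - start)) s"
}
```

For example, `run_notebook 1M_brain_cpu_analysis.ipynb`. The notebook runs with the kernel recorded in its metadata, so check that it points at the environment installed above.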

#### Open Omics Acceleration Framework single-cell-RNA-seq-analysis pipeline

EC2Instance: r7i.24xlarge and c7i.24xlarge

…
