Update README.md

IntelLabs · Feb 22, 2024 · commit 9ec5513 · 1 parent ddf6568
1 changed file with 54 additions and 29 deletions.
diff --git a/benchmarking/AWS-Intel-blog-v2.1-2024/README.md b/benchmarking/AWS-Intel-blog-v2.1-2024/README.md
@@ -1,40 +1,44 @@
-# Benchmarking of Open Omics Acceleration Framework on AWS
-Step by step commands to benchmark Open Omics Acceleration Framework on AWS
-1. Log in to your AWS account.
-2. Launch a virtual machine with EC2.
-   * Choose an Amazon Machine Image (AMI): Select any 64-bit (x86) AMI  (say, Ubuntu Server 22.04 LTS) from “Quick Start”.
-   * Choose an Instance Type.
-   * Configure the instance.
-   * Add Storage: You can add storage based on the workload requirements.
-   * Configure the security group.
-   * Review and launch the instance (ensure you have or create a key to SSH login in next step)
-3. Use SSH to login to the machine after the instance is up and running
-   * $ ssh -i <key.pem> username@Public-DNS
-4. The logged in AWS instance machine is now ready to use – you can download Open Omics Acceleration Framework and related datasets to be executed on this instance.
+# Steps to reproduce the benchmarking results of Open Omics Acceleration Framework v2.1 on AWS EC2 instances, as published in the AWS Intel blog
 
-## Machine configurations used for benchmarking
+## Preparing an AWS EC2 instance for benchmarking
 
-AWS r7i.24xlarge : 1-instance AWS r7i.24xlarge: 96 vCPUs (Sapphire Rapids), 768 GB total memory, Ubuntu 22.04
+1. Log in to your AWS account.
+2. Launch a virtual machine with EC2. The instance configurations used for each pipeline are given in the respective sections below.
+   * Choose an Amazon Machine Image (AMI): from "Quick Start", select "Ubuntu Server 22.04 LTS (HVM), SSD Volume Type".
+   * Choose an instance type.
+   * Configure the instance.
+   * Add storage based on the workload requirements.
+   * Configure the security group.
+   * Review and launch the instance (make sure you have, or create, a key pair for the SSH login in the next step).
+3. Once the instance is up and running, use SSH to log in to it:
+   * `ssh -i <key.pem> username@Public-DNS` (on Ubuntu AMIs the default user name is `ubuntu`)
+4. The AWS instance is now ready to use: you can download the Open Omics workloads and related datasets and run them on this instance.
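Before downloading anything, it may help to confirm that the instance matches the configuration disclosed for the pipeline you plan to run (CPU model, vCPU count, total memory). The sketch below reads these values on Linux from `/proc/cpuinfo`, `nproc` and `/proc/meminfo`; the function name `show_instance_config` is hypothetical and not part of the repository.

```sh
#!/usr/bin/env bash
# Hypothetical helper: print the hardware facts that the configuration
# details refer to (CPU model, vCPU count, total memory). Linux only.
show_instance_config() {
  local model vcpus mem_gib
  # First "model name" entry in /proc/cpuinfo, with the "model name : " prefix stripped
  model=$(awk '/^model name/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }' /proc/cpuinfo)
  # Number of vCPUs available to this process
  vcpus=$(nproc)
  # MemTotal is reported in kB; convert to whole GiB (truncated)
  mem_gib=$(awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo)
  echo "cpu_model=${model}"
  echo "vcpus=${vcpus}"
  echo "mem_total_gib=${mem_gib}"
}

show_instance_config
```

MemTotal is slightly below the nominal instance memory because the kernel reserves part of it, so expect a figure a little under the advertised size.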
 
-AWS c7i.24xlarge: 1-instance AWS c7i.24xlarge: 96 vCPUs (Sapphire Rapids), 192 GB total memory, Ubuntu 22.04
 
-AWS m7i.24xlarge: 1-instance AWS m7i.24xlarge: 96 vCPUs (Sapphire Rapids), 384 GB total memory, Ubuntu 22.04
+## AlphaFold2-based Protein Folding Pipeline
 
-AWS m7i.48xlarge: 1-instance AWS m7i.48xlarge: 192 vCPUs (Sapphire Rapids), 768 GB total memory, Ubuntu 22.04
+### Configuration Details
 
-# Step by step instructions to benchmark alphafold2-based-protein-folding baseline and Open Omics Acceleration Framework pipeline.
+**BASELINE on m7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 384GB (1 slot/ 384GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, PyTorch - v1.12.1, JAX - v0.4.14, OpenFold - v 1.0.1, Hmmer - v3.3.2, hh-suite - v3.3.0, Kalign2 – v2.04, model name & version: AlphaFold2
+
+**Open Omics on m7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 384GB (1 slot/ 384GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Intel-python - 2022.1.0, JAX - v0.4.21, Open Omics Acceleration Framework v2.1, Open Omics AlphaFold2 v1.0, IntelLabs Hmmer v1.0, IntelLabs hh-suite v1.0, Kalign2 – v2.04, framework version: PyTorch – v2.0.1, model name & version: AlphaFold2
+
+**Open Omics on m7i.48xlarge**: Test by Intel as of <11/30/23>. 1 instance, 2-sockets; Each socket has 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 384GB (1 slot/ 384GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Intel-python - 2022.1.0, JAX - v0.4.21, Open Omics Acceleration Framework v2.1, Open Omics AlphaFold2 v1.0, IntelLabs Hmmer v1.0, IntelLabs hh-suite v1.0, Kalign2 – v2.04, framework version: PyTorch – v2.0.1, model name & version: AlphaFold2
+
+
+### Step-by-step instructions to benchmark the baseline and the Open Omics Acceleration Framework
 
 ```sh
  cd ~
  git clone --recursive https://github.com/IntelLabs/Open-Omics-Acceleration-Framework.git
 ```
 
-- Test dataset can be donload from https://www.uniprot.org/proteomes/UP000001940. Click on 'Download' and select options **Download only reviewed (Swiss-Prot:) canonical proteins (4,463)**, Format: **Fasta** and Compressed: **No**.
+- The test dataset can be downloaded from https://www.uniprot.org/proteomes/UP000001940. Click 'Download' and select the options **Download only reviewed (Swiss-Prot:) canonical proteins (4,463)**, Format: **Fasta**, and Compressed: **No**.
 
 - Save the file as `uniprotkb_proteome.fasta` inside the folder `~/Open-Omics-Acceleration-Framework/benchmarking/AWS-Intel-blog-v2.1-2024/`.
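A quick sanity check that the saved file holds the expected number of sequences may be worthwhile, since the download dialog offers several similar options. The sketch below counts FASTA header lines (every record starts with `>`); `count_fasta_records` is a hypothetical helper name, and the expected count of 4,463 is taken from the option label above, which may change with future UniProt releases.

```sh
#!/usr/bin/env bash
# Hypothetical helper: count records in a FASTA file.
# Each FASTA record begins with exactly one header line starting with '>'.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Usage with the path from the step above (tilde expands in the assignment):
FASTA=~/Open-Omics-Acceleration-Framework/benchmarking/AWS-Intel-blog-v2.1-2024/uniprotkb_proteome.fasta
if [ -f "$FASTA" ]; then
  echo "records=$(count_fasta_records "$FASTA") (option label said 4,463)"
fi
```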
 
 
-## Baseline ([openfold](https://github.com/aqlaboratory/openfold))
+#### Baseline ([OpenFold](https://github.com/aqlaboratory/openfold))
 
 EC2Instance: m7i.24xlarge
 
@@ -77,7 +81,7 @@ python3 run_pretrained_openfold.py \
 Note: Set `cpus` to the number of available vCPUs, and use the `--long_sequence_inference` option if you are running long sequences.
 
 
-## Open Omics Acceleration Framework alphafold2-based-protein-folding pipeline
+#### Open Omics Acceleration Framework alphafold2-based-protein-folding pipeline
 
 EC2Instance: m7i.24xlarge, m7i.48xlarge
 
@@ -97,9 +101,20 @@ docker run -it --cap-add SYS_NICE -v $DATA_DIR:/data \
     alphafold:latest
 ```
 
-# Step by step instructions to benchmark deepvariant-based-germline-variant-calling-fq2vcf baseline and Open Omics Acceleration Framework pipeline.
+## DeepVariant-based germline variant calling pipeline (fq2vcf)
+
+### Configuration Details
+
+**BASELINE on c7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: bwa-mem v0.7.17, Samtools v. 1.16.1, DeepVariant v1.5, framework version: Intel-tensorflow 2.11.0, model name & version: Inception V3
+
+**Open Omics on c7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: bwa-mem2 v2.2.1, Samtools v. 1.16.1, Open Omics DeepVariant v1.0, Open Omics Acceleration Framework v.2.1, framework version: Intel-tensorflow 2.11.0, model name & version: Inception V3
+
+**Open Omics on c7i.48xlarge**: Test by Intel as of <11/30/23>. Up to 8 instances, 16-sockets; Each socket has 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: bwa-mem2 v2.2.1, Samtools v. 1.16.1, Open Omics DeepVariant v1.0, Open Omics Acceleration Framework v.2.1, framework version: Intel-tensorflow 2.11.0, model name & version: Inception V3
 
-## Dataset 
+
+### Step-by-step instructions to benchmark the baseline and the Open Omics Acceleration Framework
+
+#### Dataset 
 ```sh
 mkdir -p ~/HG001
 wget https://genomics-benchmark-datasets.s3.amazonaws.com/google-brain/fastq/novaseq/wgs_pcr_free/30x/HG001.novaseq.pcr-free.30x.R1.fastq.gz -P ~/HG001/
@@ -108,7 +123,7 @@ wget https://broad-references.s3.amazonaws.com/hg38/v0/Homo_sapiens_assembly38.f
 
 ```
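The 30x FASTQ files are large, and an interrupted `wget` can leave a truncated archive that only fails partway through a pipeline run. One option is to check each downloaded `.fastq.gz` with `gzip -t`, which decompresses in memory and reports corruption without writing output. A minimal sketch; the function name `check_gz_files` is hypothetical, and on files of this size the check itself takes a while.

```sh
#!/usr/bin/env bash
# Hypothetical helper: verify gzip integrity of every *.fastq.gz in a directory.
# Returns non-zero if no files are found or any file is corrupt.
check_gz_files() {
  local dir="$1" f status=0 found=0
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue          # glob matched nothing
    found=1
    if gzip -t "$f" 2>/dev/null; then
      echo "OK      $f"
    else
      echo "CORRUPT $f"
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no .fastq.gz files found in $dir"
    status=1
  fi
  return "$status"
}

# Usage:
# check_gz_files ~/HG001
```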
 
-## Baseline
+#### Baseline
 
 EC2Instance: c7i.24xlarge
 Prerequisite: Docker or Podman
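Since the baseline accepts either Docker or Podman, a small helper can pick whichever is installed (preferring Docker) so later commands do not hard-code one. A sketch under that assumption; `container_cli` and `CONTAINER_CLI` are hypothetical names, not part of the repository scripts.

```sh
#!/usr/bin/env bash
# Hypothetical helper: print the name of an available container CLI.
# Prefers docker, falls back to podman; fails if neither is on PATH.
container_cli() {
  local cli
  for cli in docker podman; do
    if command -v "$cli" >/dev/null 2>&1; then
      echo "$cli"
      return 0
    fi
  done
  echo "neither docker nor podman found on PATH" >&2
  return 1
}

# Usage:
# CONTAINER_CLI=$(container_cli) || exit 1
# "$CONTAINER_CLI" --version
```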
@@ -154,7 +169,7 @@ cd ../../pipelines/deepvariant-based-germline-variant-calling-fq2vcf/
 #run pipeline
 bash run_pipe_bwa.sh
 ```
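The instructions do not say how elapsed time is recorded. If you want a wall-clock figure per run to compare the baseline with Open Omics, one option is a small wrapper that times any command with `date +%s` and preserves its exit status. This is a sketch only: `time_run` is a hypothetical name, and the published numbers may have been measured differently.

```sh
#!/usr/bin/env bash
# Hypothetical helper: run a command, report wall-clock seconds on stderr,
# and return the command's own exit status.
time_run() {
  local start end status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed_seconds=$((end - start)) cmd=$*" >&2
  return "$status"
}

# Usage (from the pipeline directory above):
# time_run bash run_pipe_bwa.sh
```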
-## Open Omics Acceleration Framework deepvariant-based-germline-variant-calling-fq2vcf pipeline
+#### Open Omics Acceleration Framework deepvariant-based-germline-variant-calling-fq2vcf pipeline
 
 EC2Instance: c7i.24xlarge, c7i.48xlarge
 
@@ -166,9 +181,19 @@ To run on 2 x c7i.48xlarge, 4 x c7i.48xlarge, 8 x c7i.48xlarge follow [link](htt
 
 
 
-# Step by step instructions to benchmark single-cell-RNA-seq-analysis  baseline and Open Omics Acceleration Framework pipeline.
+## Single-cell RNA-seq analysis pipeline
+
+### Configuration Details
+
+**BASELINE on r7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 768GB (1 slot/ 768GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: scanpy v 1.9.1
+
+**Open Omics on r7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 768GB (1 slot/ 768GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Open Omics Acceleration Framework v.2.1
+
+**Open Omics on c7i.24xlarge**: Test by Intel as of <11/30/23>. 1 instance, 1-socket, 1x Intel® Xeon® Platinum 8488C, 48 cores, HT On, Turbo On, Total Memory 192GB (1 slot/ 192GB/ DDR5 4800 MT/s), bios: Amazon EC2 v 1.0, ucode version: 0x2b0004b1, OS Version: Ubuntu 22.04.3 LTS, kernel version: 6.2.0-1017-aws, compiler version: g++ 11.4.0, workload version: Open Omics Acceleration Framework v.2.1 
+
+### Step-by-step instructions to benchmark the baseline and the Open Omics Acceleration Framework
 
-## Baseline ([rapids-single-cell-examples](https://github.com/NVIDIA-Genomics-Research/rapids-single-cell-example))
+#### Baseline ([rapids-single-cell-examples](https://github.com/NVIDIA-Genomics-Research/rapids-single-cell-examples))
 
 EC2Instance: r7i.24xlarge
 
@@ -184,7 +209,7 @@ python -m ipykernel install --user --display-name "Python (rapidgenomics)"
 ```
 Note: Open Jupyter Notebook, select the **1M_brain_cpu_analysis.ipynb** file, and run all cells.
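On a remote EC2 instance without a browser, the notebook can also be executed non-interactively with `jupyter nbconvert --execute`. A sketch, assuming `nbconvert` is installed in the same environment as the kernel registered above; the wrapper name `run_notebook_headless` and the `executed_` output prefix are hypothetical. `--ExecutePreprocessor.timeout=-1` disables the per-cell timeout, since cells on the 1M-cell dataset may run for a long time.

```sh
#!/usr/bin/env bash
# Hypothetical wrapper: execute all cells of a notebook headlessly and save
# the executed copy (with cell outputs) under a new file name.
run_notebook_headless() {
  local nb="$1"
  local out="executed_$(basename "$nb")"
  jupyter nbconvert --to notebook --execute \
    --ExecutePreprocessor.timeout=-1 \
    --output "$out" "$nb"
}

# Usage:
# run_notebook_headless 1M_brain_cpu_analysis.ipynb
```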
 
-## Open Omics Acceleration Framework single-cell-RNA-seq-analysis pipeline
+#### Open Omics Acceleration Framework single-cell-RNA-seq-analysis pipeline
 
 EC2Instance: r7i.24xlarge and c7i.24xlarge