feat: NV NIM Drug discovery blue print support for L4 (#981)
sujituk authored Feb 19, 2025
1 parent 706cbd8 commit 5f256ee
Showing 2 changed files with 43 additions and 25 deletions.
@@ -8,16 +8,20 @@ This guide outlines the steps to deploy NVIDIA's NIM blueprint for [Generative V
* **Project:** A Google Cloud project with billing enabled.
* **NGC API Key:** An API key from NVIDIA NGC.
* **kubectl:** kubectl command-line tool installed and configured.
* **NVIDIA GPUs:** NVIDIA A100 80GB(3) GPU preferred in the same region / zone.
* **NVIDIA GPUs:** One of the following GPU configurations (GPU count in parentheses):
* [NVIDIA L4 GPU (4)](https://cloud.google.com/compute/docs/gpus#l4-gpus)
* [NVIDIA A100 80GB (1) GPU](https://cloud.google.com/compute/docs/gpus#a100-gpus)
* [NVIDIA H100 80GB (1) GPU or higher](https://cloud.google.com/compute/docs/gpus#a3-series)

Clone the repo before proceeding further:

```bash
```bash

git clone https://github.com/GoogleCloudPlatform/ai-on-gke

git clone https://github.com/GoogleCloudPlatform/ai-on-gke
cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery
cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery

```
```

## Deployment Steps

@@ -30,10 +34,10 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery
export CLUSTER_NAME="gke-nimbp-genscr"
export NODE_POOL_NAME="gke-nimbp-genscr-np"
export ZONE="<GCP zone>" #us-east5-b
export MACHINE_TYPE="<GCP machine type>" #"a2-ultragpu-1g"
export ACCELERATOR_TYPE="<GPU Type>" #"nvidia-a100-80gb"
export ACCELERATOR_COUNT="1"
export NODE_POOL_NODES=3
export MACHINE_TYPE="<GCP machine type>" # e.g., g2-standard-48 (L4) or a2-ultragpu-1g (A100 80GB)
export ACCELERATOR_TYPE="<GPU Type>" # e.g., nvidia-l4 (L4) or nvidia-a100-80gb (A100 80GB)
export ACCELERATOR_COUNT="1" # e.g., 4 (L4) or 1 (A100 80GB)
export NODE_POOL_NODES=3 # e.g., 1 (L4) or 3 (A100 80GB)
export NGC_API_KEY="<NGC API Key>"

```
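The example values in the comments above can be captured as presets. The following is a minimal sketch, not part of the repo: `use_gpu_preset` and `check_vars` are hypothetical helper names, and only the L4 and A100 80GB values quoted in this guide are covered (H100 values are not given here, so they are left out).

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the ai-on-gke repo.

# Fill the GPU-related variables from the example values in the guide.
use_gpu_preset() {
  case "$1" in
    l4)
      export MACHINE_TYPE="g2-standard-48" ACCELERATOR_TYPE="nvidia-l4"
      export ACCELERATOR_COUNT="4" NODE_POOL_NODES="1"
      ;;
    a100)
      export MACHINE_TYPE="a2-ultragpu-1g" ACCELERATOR_TYPE="nvidia-a100-80gb"
      export ACCELERATOR_COUNT="1" NODE_POOL_NODES="3"
      ;;
    *)
      echo "unknown preset: $1 (expected l4 or a100)" >&2
      return 1
      ;;
  esac
}

# Return non-zero if any named variable is empty or still an "<...>" placeholder.
check_vars() {
  local name value status=0
  for name in "$@"; do
    value="${!name:-}"
    case "${value}" in
      ''|'<'*'>')
        echo "not set: ${name}" >&2
        status=1
        ;;
    esac
  done
  return "${status}"
}
```

For example, `use_gpu_preset l4 && check_vars ZONE MACHINE_TYPE ACCELERATOR_TYPE NGC_API_KEY` would fail until `ZONE` and `NGC_API_KEY` are replaced with real values.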
@@ -50,7 +54,7 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery
```

3. **Create GPU Node Pool:** This creates a node pool with GPU machines optimized for BioNeMo workloads.
3. **Create GPU Node Pool:** This creates a node pool with GPU machines optimized for running the NIMs.

```bash
@@ -97,7 +101,7 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery
```

7. **Deploy BioNeMo Services:** Deploy AlphaFold2, MolMIM, and DiffDock.
7. **Deploy Blueprint:** Deploy AlphaFold2, MolMIM, and DiffDock.

```bash
@@ -109,12 +113,12 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery

```bash
k create -f nim-bionemo-generative-virtual-screening.yaml
k create -f nim-bp-generative-virtual-screening.yaml
```

> NOTE:
> The AlphaFold2 NIM requires downloading a large dataset and related files, and then loading the model for inference. This process can take typically between 1.5 hrs and 2 hours. Please be patient while the deployment completes.
> The AlphaFold2 NIM requires downloading supporting data from NGC, which typically takes 2-3 hours. Alternatively, copy the downloaded data to persistent storage, such as an NFS volume or a Google Cloud Storage bucket, and reuse it so that future deployments are ready for inference in a few minutes. See [Cache AlphaFold2 data](#cache-alphafold2-data) below.

To check that the pods are in `Running` status, run `k get pods`. It should list 3 pods: `alphafold2-`, `diffdock-` and `molmim-`.
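Because the AlphaFold2 pod can take hours to become ready, it may help to script this check. The following is a minimal sketch, assuming the default `kubectl get pods` table layout (`NAME READY STATUS RESTARTS AGE`); `pods_not_running` is a hypothetical helper, not part of the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the ai-on-gke repo.
# Reads `kubectl get pods` table output on stdin, skips the header row,
# and prints the name of every pod whose STATUS column is not "Running".
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example usage against a live cluster (assumes kubectl is configured):
#   kubectl get pods | pods_not_running
```

An empty result means every listed pod reports `Running`; it does not by itself confirm that the pods have finished loading their models.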

@@ -131,14 +135,14 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery

```bash
POD_BIONEMO_ALPHAFOLD=$(k get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep '^alphafold2')
k port-forward pod/$POD_BIONEMO_ALPHAFOLD 8010:8000 &
POD_NIM_ALPHAFOLD=$(k get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep '^alphafold2')
k port-forward pod/$POD_NIM_ALPHAFOLD 8010:8000 &
POD_BIONEMO_MOLMIM=$(k get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep '^molmim')
k port-forward pod/$POD_BIONEMO_MOLMIM 8011:8000 &
POD_NIM_MOLMIM=$(k get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep '^molmim')
k port-forward pod/$POD_NIM_MOLMIM 8011:8000 &
POD_BIONEMO_DIFFDOCK=$(k get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep '^diffdock')
k port-forward pod/$POD_BIONEMO_DIFFDOCK 8012:8000 &
POD_NIM_DIFFDOCK=$(k get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep '^diffdock')
k port-forward pod/$POD_NIM_DIFFDOCK 8012:8000 &
```
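The three lookups above repeat the same pipeline with a different name prefix. As a minimal sketch, the name match could be factored into a small function; `pod_with_prefix` is a hypothetical helper, not part of the repo, and the usage lines call `kubectl` directly because shell aliases such as `k` are not expanded in scripts by default.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the ai-on-gke repo.
# Reads pod names (one per line) on stdin and prints the first one that
# starts with the given prefix. Returns non-zero if nothing matches.
# The prefix is used as a regular expression, so keep it to plain names.
pod_with_prefix() {
  local prefix="$1"
  grep -m1 -- "^${prefix}"
}

# Example usage against a live cluster (assumes kubectl is configured):
#   POD=$(kubectl get pods -o name | sed 's|^pod/||' | pod_with_prefix alphafold2)
#   kubectl port-forward "pod/${POD}" 8010:8000 &
```

Returning a non-zero status when no pod matches lets a script stop before calling `port-forward` with an empty pod name.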

@@ -180,12 +184,12 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery
```

10. **Test end to end:** The test for protein folding, molecule generation and protein docking might take about 5-8 mins to run
10. **Test end to end:** The test for protein folding, molecule generation and protein docking might take about 5-8 mins to run.

> NOTE:
> If the port numbers were changed earlier, then update the `AF2_HOST`, `MOLMIM_HOST`, `DIFFDOCK_HOST` variables with port numbers in `test-generative-virtual-screening.py` file.
> NOTE:
> If the port numbers were changed earlier, then update the `AF2_HOST`, `MOLMIM_HOST`, `DIFFDOCK_HOST` variables with port numbers in `test-generative-virtual-screening.py` file.

```bash
```bash
python3 -m venv venv
@@ -196,7 +200,7 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery
deactivate
```
```

## Cleanup

@@ -206,8 +210,22 @@ cd ai-on-gke/tutorials-and-examples/nvidia-nim/blueprints/drugdiscovery
k delete secret secret-nvcr
k delete secret ngc-api-key
k delete -f nim-bionemo-generative-virtual-screening.yaml
k delete -f nim-bp-generative-virtual-screening.yaml
k delete -f nim-storage-filestore.yaml
gcloud container clusters delete "${CLUSTER_NAME}" --location="${ZONE}"
```

## Cache AlphaFold2 data

1. Create a GCS bucket `gs://nim-bp-alphafold2-cache-{project-number}`.

2. Copy the contents of `/data/ngc/hub/models--nim--deepmind--alphafold2-data` to the bucket at `gs://nim-bp-alphafold2-cache-{project-number}/ngc/hub/models--nim--deepmind--alphafold2-data/`.

3. For future deployments, copy the contents into the mounted NFS volume before deploying the blueprint (step 7). If you change the destination path, also update the volume mount path for the `alphafold2` container in `nim-bp-generative-virtual-screening.yaml`.

```bash
gcloud storage cp -r gs://nim-bp-alphafold2-cache-{project-number}/ngc/hub/models--nim--deepmind--alphafold2-data/ /data/ngc/hub/
```
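Step 1 can be scripted along these lines. This is a sketch under stated assumptions rather than the repo's own procedure: the function names are hypothetical, the bucket name follows the `-{project-number}` pattern from step 1, and the bucket region is derived from `ZONE` by dropping the zone suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the ai-on-gke repo.

# Derive a region from a zone name, e.g. us-east5-b -> us-east5.
region_from_zone() {
  printf '%s\n' "${1%-*}"
}

# Build the cache bucket URL from a project number (naming from step 1).
cache_bucket_name() {
  printf 'gs://nim-bp-alphafold2-cache-%s\n' "$1"
}

# Create the bucket for a project ID ($1) in the region of a zone ($2).
# Defined but not invoked here; it needs an authenticated gcloud CLI.
create_cache_bucket() {
  local project_number bucket
  project_number="$(gcloud projects describe "$1" --format='value(projectNumber)')" || return 1
  bucket="$(cache_bucket_name "${project_number}")"
  gcloud storage buckets create "${bucket}" --location="$(region_from_zone "$2")"
}

# Example usage (assumes PROJECT_ID is set; it is not defined in this guide):
#   create_cache_bucket "${PROJECT_ID}" "${ZONE}"
```

Keeping the bucket in the same region as the cluster is a design choice made in this sketch to shorten copy times; the guide itself does not specify a bucket location.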
