Switch to GPU Instances for GSLC Jobs #2332

Draft: wants to merge 15 commits into develop
.github/workflows/deploy-enterprise-test.yml (2 changes: 1 addition, 1 deletion)
@@ -87,7 +87,7 @@ jobs:
job_files: >-
job_spec/INSAR_ISCE_BURST.yml
job_spec/SRG_GSLC_CPU.yml
-instance_types: r6id.xlarge,r6id.2xlarge,r6id.4xlarge,r6id.8xlarge,r6idn.xlarge,r6idn.2xlarge,r6idn.4xlarge,r6idn.8xlarge
+instance_types: g6.2xlarge,g6.4xlarge,g4dn.2xlarge,g4dn.4xlarge
jtherrmann marked this conversation as resolved.
default_max_vcpus: 640
expanded_max_vcpus: 640
required_surplus: 0
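For a rough sense of capacity after this swap, the sketch below assumes the published EC2 sizes for the new types (g6.2xlarge and g4dn.2xlarge: 8 vCPUs and 1 GPU; g6.4xlarge and g4dn.4xlarge: 16 vCPUs and 1 GPU) and assumes that every job reserves one whole GPU. With one GPU per instance, at most one such job runs per instance. The 640-vCPU cap therefore allows roughly 80 concurrent jobs on 2xlarge instances or 40 on 4xlarge instances. `INSTANCE_SPECS` and `max_concurrent_gpu_jobs` are illustrative names and are not part of the HyP3 code.

```python
# Comma-separated list copied from the new `instance_types` value.
INSTANCE_TYPES = "g6.2xlarge,g6.4xlarge,g4dn.2xlarge,g4dn.4xlarge"

# Assumed EC2 sizes per instance type: (vCPUs, NVIDIA GPUs).
INSTANCE_SPECS = {
    "g6.2xlarge": (8, 1),
    "g6.4xlarge": (16, 1),
    "g4dn.2xlarge": (8, 1),
    "g4dn.4xlarge": (16, 1),
}


def max_concurrent_gpu_jobs(instance_type: str, max_vcpus: int, gpus_per_job: int = 1) -> int:
    """Upper bound on simultaneous jobs if the compute environment fills
    its vCPU cap with a single instance type.

    Assumes each job needs `gpus_per_job` whole GPUs and that its vCPU and
    memory requests fit on one instance; those limits are not modeled here.
    """
    vcpus, gpus = INSTANCE_SPECS[instance_type]
    instances = max_vcpus // vcpus
    return instances * (gpus // gpus_per_job)


if __name__ == "__main__":
    for name in INSTANCE_TYPES.split(","):
        print(f"{name}: {max_concurrent_gpu_jobs(name, max_vcpus=640)}")
```

In a mixed fleet the real bound falls between these two figures, depending on which sizes the Batch allocation strategy picks.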
apps/compute-cf.yml.j2 (22 changes: 22 additions, 0 deletions)
@@ -61,6 +61,28 @@ Resources:
cloud-init-per instance mkfs_ssd mkfs.ext4 /dev/nvme1n1
mount /dev/nvme1n1 /var/lib/docker

+DRIVER_VERSION=550.54.14
+dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) kernel-modules-extra
+curl -fSsl -O https://us.download.nvidia.com/tesla/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
+chmod +x NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
+./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run --tmpdir . --silent
+rm ./NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
+
+dnf install -y docker git
+systemctl start docker
+systemctl enable docker
+usermod -aG docker ec2-user
+
+dnf config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
+dnf install -y nvidia-container-toolkit
+nvidia-ctk runtime configure --runtime=docker
+systemctl restart docker
+
+dnf install -y git
+
+dnf clean all && rm -rf /var/cache/dnf/*
jtherrmann marked this conversation as resolved.
+
+reboot
jtherrmann marked this conversation as resolved.
--==BOUNDARY==--

ComputeEnvironment:
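The user data now installs a pinned NVIDIA driver (`DRIVER_VERSION=550.54.14`) and then reboots. One way to confirm that an instance came back with that driver is to compare the pinned version with the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The Python sketch below shows one possible form of that check. This PR does not add it. The names `query_driver_versions` and `driver_matches` are hypothetical, and the query is passed in as a parameter so the comparison can be exercised on a machine without a GPU.

```python
import subprocess
from typing import Callable

# Mirrors DRIVER_VERSION pinned in the compute environment user data.
EXPECTED_DRIVER_VERSION = "550.54.14"


def query_driver_versions() -> str:
    """Ask nvidia-smi for the driver version reported for each visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def driver_matches(expected: str, query: Callable[[], str] = query_driver_versions) -> bool:
    """True if at least one GPU is visible and every GPU reports `expected`."""
    versions = [line.strip() for line in query().splitlines() if line.strip()]
    return bool(versions) and all(v == expected for v in versions)


if __name__ == "__main__":
    try:
        ok = driver_matches(EXPECTED_DRIVER_VERSION)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # nvidia-smi is missing or failed, so the driver is not usable.
        ok = False
    print("driver OK" if ok else "driver mismatch or no GPU visible")
```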
apps/workflow-cf.yml.j2 (2 changes: 2 additions, 0 deletions)
@@ -58,6 +58,8 @@ Resources:
ResourceRequirements:
- Type: VCPU
Value: "{{ task['vcpu'] }}"
+- Type: GPU
+Value: 1
- Type: MEMORY
Value: "{{ task['memory'] }}"
Command:
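With this change, every rendered job definition requests one GPU in addition to its vCPU and memory. The AWS Batch API types `ResourceRequirement.Value` as a string, so a Python sketch that builds the same list converts each value with `str()`. The `task` dict stands in for the job-spec task that the Jinja template reads. The function name and example values are hypothetical.

```python
from typing import Dict, List


def resource_requirements(task: Dict[str, int], gpus: int = 1) -> List[Dict[str, str]]:
    """Build the VCPU/GPU/MEMORY requirements rendered for one task.

    Follows the order used in workflow-cf.yml.j2. Every Value is a string,
    matching the AWS Batch ResourceRequirement type (MEMORY is in MiB).
    """
    return [
        {"Type": "VCPU", "Value": str(task["vcpu"])},
        {"Type": "GPU", "Value": str(gpus)},
        {"Type": "MEMORY", "Value": str(task["memory"])},
    ]


if __name__ == "__main__":
    # Hypothetical task values, for illustration only.
    example_task = {"vcpu": 4, "memory": 15000}
    for requirement in resource_requirements(example_task):
        print(requirement)
```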
job_spec/SRG_GSLC_CPU.yml (3 changes: 2 additions, 1 deletion)
@@ -26,10 +26,11 @@ SRG_GSLC_CPU:
cost: 1.0
tasks:
- name: ''
-image: ghcr.io/asfhyp3/hyp3-back-projection
+image: ghcr.io/asfhyp3/hyp3-back-projection:0.5.2.gpu
jtherrmann marked this conversation as resolved.
command:
- ++process
- back_projection
+- --gpu
- --bucket
- '!Ref Bucket'
- --bucket-prefix
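This hunk replaces the untagged image reference with the pinned `0.5.2.gpu` tag and passes `--gpu` to the container. Docker resolves an untagged reference as `:latest`, so the job also stops tracking whatever `latest` points to. The helper below is a simplified, hypothetical parser that illustrates that rule. It treats a colon as a tag separator only when the colon follows the last `/`, so registry ports are not mistaken for tags. It does not handle digest references (`@sha256:...`).

```python
from typing import Tuple


def split_image_ref(ref: str) -> Tuple[str, str]:
    """Split a container image reference into (repository, tag).

    A colon introduces a tag only if it appears after the last '/', so
    'localhost:5000/img' keeps its registry port. Untagged references
    default to 'latest', as Docker does. Digests are not handled.
    """
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    return ref, "latest"


if __name__ == "__main__":
    for ref in (
        "ghcr.io/asfhyp3/hyp3-back-projection",
        "ghcr.io/asfhyp3/hyp3-back-projection:0.5.2.gpu",
    ):
        print(split_image_ref(ref))
```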