Container for (spack-manager) CUDA GPU Build of Exawind for NERSC Science Platform #575

Open
wants to merge 27 commits into
base: main
Changes from 26 commits
ceb45ee
Set of files to be modified for containergpucuda
ajpowelsnl Oct 11, 2023
9b8a663
Dockerfile-containergpucuda: basic instance
ajpowelsnl Oct 17, 2023
163c584
First draft of GPU container (Perlmutter)
ajpowelsnl Oct 18, 2023
72d7330
create-exawind-snapshot.sh: clean up, add `-d` to gpucontainer
ajpowelsnl Oct 19, 2023
51e356d
exawind_containergpucuda.yaml: cuda container build
ajpowelsnl Oct 19, 2023
b26cfdb
GPU configs udpates
ajpowelsnl Oct 19, 2023
e87dd67
create-exawind-snapshot.sh: rm NUM_CORES
ajpowelsnl Oct 19, 2023
6dea598
Restore num jobs, NUM_CORES=8
ajpowelsnl Oct 19, 2023
e3f07ad
Apply yaksa-cuda.patch in spack repo
ajpowelsnl Oct 20, 2023
5ba6683
cuda_arch=70 for lassen
Oct 20, 2023
ef9592a
update dependencies' build
ajpowelsnl Oct 30, 2023
ab0e26c
Adjust base image to cuda-11.8.0
ajpowelsnl Oct 30, 2023
b9ddae4
Dockerfile-containergpucuda: rm nccl
ajpowelsnl Oct 30, 2023
7bfbbcd
Spack patch for yaksa
ajpowelsnl Oct 31, 2023
77bf1fd
Spack yaska patch: from wyphan
ajpowelsnl Oct 31, 2023
6b1d70e
rm incorrect yaksa patch
ajpowelsnl Oct 31, 2023
bfcfc49
spack: yaksa patch do-over w/ wyphan
ajpowelsnl Oct 31, 2023
6f067a0
Rescue plan for yaska patch
ajpowelsnl Oct 31, 2023
f9b6f5d
More rescue: reset spack to ee68baf254ce8f401704ef1a62b77057487d4a12
ajpowelsnl Oct 31, 2023
5c46ea2
Add sha (6c1868f8ae) with yaska-0.3
ajpowelsnl Oct 31, 2023
c001464
Patched Spack on fork/branch
ajpowelsnl Nov 1, 2023
981a5d4
spack/patch_yaksa: branch spack from develop
ajpowelsnl Nov 1, 2023
5f079fd
Building GPU container, but failing (CUDA) runtime
ajpowelsnl Nov 2, 2023
ecd75fd
add ubuntu mpich; Spack-pinned version problematic
ajpowelsnl Nov 3, 2023
c4f2029
fix up external mpich bloc
ajpowelsnl Nov 3, 2023
76c7ee0
Dockerfile-containergpucuda: container file for PR
ajpowelsnl Nov 3, 2023
6d4cfd0
Merge branch 'main' into gpucontainer
psakievich Jan 1, 2024
15 changes: 15 additions & 0 deletions configs/containergpucuda/compilers.yaml
@@ -0,0 +1,15 @@
compilers:
- compiler:
    spec: [email protected]
    paths:
      cc: /usr/bin/gcc
      cxx: /usr/bin/g++
      f77: /usr/bin/gfortran
      fc: /usr/bin/gfortran
    flags: {}
    operating_system: ubuntu22.04
    target: any
    modules: []
    extra_rpaths: []

# See example: https://spack.readthedocs.io/en/latest/gpu_configuration.html
2 changes: 2 additions & 0 deletions configs/containergpucuda/config.yaml
@@ -0,0 +1,2 @@
config:
  build_jobs: 20
79 changes: 79 additions & 0 deletions configs/containergpucuda/packages.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
packages:
  # Global settings
  all:
    compiler:
    - [email protected]
    providers:
      mpi: [mpich]
      blas: [netlib-lapack]
      lapack: [netlib-lapack]
    variants: build_type=Release cuda_arch=80
  mpi:
    require: mpich
  # GPU-aware MPICH; see https://spack.readthedocs.io/en/latest/build_settings.html#package-settings-packages-yaml
  # mpich:
  #   require: "+cuda"
  # Use Ubuntu libncurses-dev, etc., because the Spack-built versions fail
  # The Spack-pinned version of mpich also fails to build
  mpich:
    externals:
    - spec: [email protected]
      require: "+cuda"
      prefix: /usr
  ncurses:
    externals:
    - spec: [email protected]
      prefix: /usr
  gdbm:
    externals:
    - spec: [email protected]
      prefix: /usr
  gdbm6:
    externals:
    - spec: [email protected]
      prefix: /usr
  # Package preferences to be built by Spack for correct Exawind
  # Nota bene: use libtool from Spack for correct linking
  ascent:
    variants: ~fortran~openmp
  amr-wind:
    variants: +tiny_profile
  conduit:
    variants: ~fortran~hdf5_compat
  boost:
    version: [1.78.0]
    variants: cxxstd=17
  cmake:
    version: [3.26.3]
    variants: build_type=Release
  trilinos:
    require:
    - any_of: ["@13.4.0", "@develop"]
  hdf5:
    version: [1.10.7]
    variants: +cxx+hl
  libtool:
    version: [2.4.7]
  masa:
    variants: ~fortran~python
  netcdf-c:
    require: '@4.7.4'
    variants: +parallel-netcdf maxdims=65536 maxvars=524288
  openfast:
    version: [master]
    variants: +cxx
  parallel-netcdf:
    version: [1.12.2]
    variants: ~fortran
  perl:
    require: '@5.34.1'
  tioga:
    version: [develop]
  hypre:
    require: '@develop'
    variants: ~fortran
  hypre2:
    require: '@develop'
    variants: ~fortran
  yaml-cpp:
    version: [0.6.3]
9 changes: 9 additions & 0 deletions env-templates/exawind_containergpucuda.yaml
@@ -0,0 +1,9 @@
spack:
  include:
  - include.yaml
  concretizer:
    unify: false
    reuse: false
  view: false
  specs:
  - 'exawind+hypre+amr_wind_gpu+nalu_wind_gpu+cuda'
132 changes: 132 additions & 0 deletions hpc_containers/exawind_container_gpucuda/Dockerfile-containergpucuda
@@ -0,0 +1,132 @@
# NVIDIA base images: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags

ARG REGISTRY=nvcr.io/nvidia
ARG IMAGE=cuda
ARG TAG=11.8.0-devel-ubuntu22.04
#ARG TAG=12.2.0-devel-ubuntu22.04

FROM ${REGISTRY}/${IMAGE}:${TAG}

LABEL maintainer="Philip Sakievich, Sandia National Laboratories <[email protected]>"

# Make bash the default $SHELL
SHELL ["/bin/bash", "-c"]

# Install Spack Prereqs: https://spack.readthedocs.io/en/latest/getting_started.html#system-prerequisites

RUN apt-get update -yqq && \
    apt-get upgrade -yqq

RUN apt-get install -yqq \
    autoconf \
    automake \
    bzip2 \
    ca-certificates \
    clangd \
    coreutils \
    curl \
    emacs-nox \
    file \
    flex \
    gcc \
    gcc-multilib \
    gcc-doc \
    g++ \
    gfortran \
    gfortran-multilib \
    gfortran-doc \
    git \
    git-doc \
    git-man \
    gnupg2 \
    hwloc-nox \
    libbz2-dev \
    libffi-dev \
    libfmt-dev \
    libgdbm-dev \
    libgdbm6 \
    libgmp-dev \
    libhwloc-common \
    libhwloc-dev \
    libhwloc15 \
    libjpeg-dev \
    libmpc-dev \
    libncurses-dev \
    libtool \
    libtool-bin \
    libtool-doc \
    libx11-dev \
    lsb-release \
    m4 \
    make \
    mpich \
    mpich-doc \
    nano \
    python3 \
    python3-distutils \
    python3-venv \
    unzip \
    vim \
    wget \
    wget2 \
    xz-utils \
    zip \
    zlib1g-dev

RUN apt clean -y

# Exawind GPU snapshot
WORKDIR /exawind-entry
#
#RUN git clone --recursive https://github.com/sandialabs/spack-manager
# Pre-merge fork
RUN git clone --recursive https://github.com/ajpowelsnl/spack-manager
# Needed by "create-exawind-snapshot.sh"
ENV SPACK_MANAGER_MACHINE=containergpucuda
ENV CONTAINER_BUILD=gpucuda
ENV SPACK_MANAGER=/exawind-entry/spack-manager

WORKDIR /exawind-entry/spack-manager

# Nota bene: commented code is needed, but does not work in container env
# Pre-merge branch from ajpowelsnl/spack-manager fork
# RUN git checkout gpucontainer

# Temp. code: Use branch of Spack w/ patch
# DOESN'T BUILD CORRECTLY
#RUN cd spack
#RUN git remote add amy_fork https://github.com/ajpowelsnl/spack.git
#RUN git fetch amy_fork
#RUN git checkout amy_fork/spack/patch_yaksa
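
One likely contributor to the commented-out steps not behaving as intended: each `RUN` instruction starts a fresh shell in the image's current `WORKDIR`, so a standalone `RUN cd spack` has no effect on the instructions that follow it. The sketch below reproduces that effect outside Docker, with child shells standing in for `RUN` steps. The temporary directory and variable names are illustrative and do not come from the PR.

```shell
# Each `sh -c` stands in for one Dockerfile RUN step: a separate shell whose
# working-directory change is discarded when that shell exits.
workdir=$(mktemp -d)      # hypothetical stand-in for /exawind-entry/spack-manager
mkdir "$workdir/spack"
cd "$workdir"

# Like `RUN cd spack` followed by a second RUN: the second step is not in spack/.
sh -c 'cd spack'
separate_steps=$(sh -c 'basename "$(pwd)"')

# Like `RUN cd spack && <command>`: the command runs inside spack/.
chained_steps=$(sh -c 'cd spack && basename "$(pwd)"')

echo "separate steps end up in: $separate_steps"
echo "chained steps end up in:  $chained_steps"
```

In Dockerfile terms, the chained form corresponds to putting `cd spack && git remote add ... && git fetch ... && git checkout ...` in a single `RUN`, or to setting `WORKDIR` before the git commands.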


# Snapshot will be generated upon running container.
# \$(date ...) is escaped so the date is evaluated at container start, when the
# snapshot is created, rather than at image build time.
RUN echo "pwd" >> /etc/bash.bashrc && \
    echo "cd spack" >> /etc/bash.bashrc && \
    echo "git remote add amy_fork https://github.com/ajpowelsnl/spack.git" >> /etc/bash.bashrc && \
    echo "git fetch amy_fork" >> /etc/bash.bashrc && \
    echo "git checkout amy_fork/spack/patch_yaksa" >> /etc/bash.bashrc && \
    echo "cd .." >> /etc/bash.bashrc && \
    echo "pwd" >> /etc/bash.bashrc && \
    echo "git checkout gpucontainer" >> /etc/bash.bashrc && \
    echo "export SPACK_MANAGER=$SPACK_MANAGER" >> /etc/bash.bashrc && \
    echo "source $SPACK_MANAGER/start.sh && spack-start" >> /etc/bash.bashrc && \
    echo "spack external find --all" >> /etc/bash.bashrc && \
    echo "$SPACK_MANAGER/scripts/create-exawind-snapshot.sh" >> /etc/bash.bashrc && \
    echo "spack clean --all" >> /etc/bash.bashrc && \
    echo "spack env activate -d snapshots/exawind/containergpucuda/\$(date +%Y-%m-%d)" >> /etc/bash.bashrc && \
    echo "spack load exawind" >> /etc/bash.bashrc

# Verify .bashrc
# RUN ["/bin/bash", "-c", "tail -n 6 /etc/bash.bashrc"]

# Verify executable:
#   spack env activate -d snapshots/exawind/containergpucuda/2023-11-01/
#   spack load exawind
#   which exawind
#   exawind --help



#WORKDIR /exawind-entry
CMD [ "/bin/bash" ]
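
A note on quoting when commands are appended to `/etc/bash.bashrc` with `echo "..."`: inside double quotes, an unescaped `$(date +%Y-%m-%d)` or `$SPACK_MANAGER` is expanded by the shell that executes the `RUN` step, so a literal value is written at image build time. That is harmless for `$SPACK_MANAGER`, which is fixed by `ENV`, but it matters for the date-stamped snapshot path, because the snapshot itself is created when the container starts. A minimal sketch of the difference, writing to a temporary file rather than `/etc/bash.bashrc` (the file and the `build-time`/`run-time` labels are illustrative):

```shell
rc_file=$(mktemp)   # stand-in for /etc/bash.bashrc

# Unescaped: $(date ...) runs now, and a literal date string is stored.
echo "echo build-time $(date +%Y-%m-%d)" >> "$rc_file"

# Escaped with \$: the command substitution itself is stored and will only
# run later, when the file is sourced.
echo "echo run-time \$(date +%Y-%m-%d)" >> "$rc_file"

cat "$rc_file"
```

The first stored line carries a fixed date; the second still contains the text `$(date +%Y-%m-%d)` and resolves to the current date each time a shell sources the file.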
5 changes: 4 additions & 1 deletion scripts/create-exawind-snapshot.sh
@@ -1,4 +1,4 @@
-#!/bin/bash -l
+#!/bin/bash
#
# Copyright (c) 2022, National Technology & Engineering Solutions of Sandia,
# LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
@@ -52,6 +52,9 @@ elif [[ "${SPACK_MANAGER_MACHINE}" == "summit" ]]; then
elif [[ "${SPACK_MANAGER_MACHINE}" == "perlmutter" ]]; then
  NUM_CORES=8
  cmd "nice -n19 spack manager snapshot -m -s exawind%gcc+hypre+cuda+amr_wind_gpu+nalu_wind_gpu"
elif [[ "${SPACK_MANAGER_MACHINE}" == "containergpucuda" ]]; then
  NUM_CORES=8
  cmd "nice -n19 spack -d manager snapshot -m -s exawind%gcc+hypre+cuda+amr_wind_gpu+nalu_wind_gpu"
elif [[ "${SPACK_MANAGER_MACHINE}" == "snl-hpc" ]]; then
  # TODO we should probably launch the install through slurm and exit on this one
  cmd "nice -n19 spack manager snapshot -s exawind+hypre+openfast amr-wind+hypre+openfast"
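
The new branch is selected by `SPACK_MANAGER_MACHINE`, which the Dockerfile sets with `ENV SPACK_MANAGER_MACHINE=containergpucuda`. As a rough sketch of that dispatch in isolation, the function below prints the snapshot command for a machine name instead of running it. `snapshot_cmd_for` is a hypothetical helper, not part of `create-exawind-snapshot.sh`; only the two command strings are taken from the hunk.

```shell
# Hypothetical helper: print (rather than run) the snapshot command for a machine.
snapshot_cmd_for() {
    machine=$1
    spec="exawind%gcc+hypre+cuda+amr_wind_gpu+nalu_wind_gpu"
    case "$machine" in
        perlmutter)
            echo "nice -n19 spack manager snapshot -m -s $spec" ;;
        containergpucuda)
            # Same command with Spack's global -d (debug) flag added.
            echo "nice -n19 spack -d manager snapshot -m -s $spec" ;;
        *)
            echo "unknown machine: $machine" >&2
            return 1 ;;
    esac
}

snapshot_cmd_for containergpucuda
```

Printing the command first makes it easy to confirm which branch a given environment would take before starting a long build.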
2 changes: 1 addition & 1 deletion spack
Collaborator

This looks like it was an update to the submodule file, and not the spack commit? Is that right? We have a mirror only policy on spack changes so these changes would need to go into mainline spack.

3 changes: 3 additions & 0 deletions spack-scripting/scripting/cmd/manager_cmds/find_machine.py
@@ -97,6 +97,9 @@ def is_e4s():
    "perlmutter": MachineData(
        lambda: os.environ["NERSC_HOST"] == "perlmutter", "perlmutter-p1.nersc.gov"
    ),
    "containergpucuda": MachineData(
Collaborator

I am not sure I like this name. Would we expect this to build any container using cuda, or specifically containers on perlmutter? I would prefer to start with a more precise name and relax it rather than vice-versa.

        lambda: os.environ["CONTAINER_BUILD"] == "gpucuda", "containergpucuda.nodomain.gov"
    ),
    # General
    "darwin": MachineData(lambda: sys.platform == "darwin", "darwin.nodomain.gov"),
}
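
The new entry matches on the `CONTAINER_BUILD` variable that the Dockerfile exports with `ENV CONTAINER_BUILD=gpucuda`. For scripts that need the same test outside Python, a shell-side equivalent could look like the sketch below. `is_containergpucuda` is a hypothetical name, and treating an unset variable as "no match" is an assumption about the desired behavior, not something the PR states.

```shell
# Hypothetical helper mirroring the lambda's comparison. Succeeds only when
# CONTAINER_BUILD is set to exactly "gpucuda"; an unset variable does not match.
is_containergpucuda() {
    [ "${CONTAINER_BUILD:-}" = "gpucuda" ]
}

if is_containergpucuda; then
    echo "containergpucuda build detected"
else
    echo "not a containergpucuda build"
fi
```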