Container for (spack-manager) CUDA GPU Build of Exawind for NERSC Science Platform #575
Open: ajpowelsnl wants to merge 27 commits into sandialabs:main from ajpowelsnl:gpucontainer
Changes from 26 commits
Commits
ceb45ee Set of files to be modified for containergpucuda (ajpowelsnl)
9b8a663 Dockerfile-containergpucuda: basic instance (ajpowelsnl)
163c584 First draft of GPU container (Perlmutter) (ajpowelsnl)
72d7330 create-exawind-snapshot.sh: clean up, add `-d` to gpucontainer (ajpowelsnl)
51e356d exawind_containergpucuda.yaml: cuda container build (ajpowelsnl)
b26cfdb GPU configs udpates (ajpowelsnl)
e87dd67 create-exawind-snapshot.sh: rm NUM_CORES (ajpowelsnl)
6dea598 Restore num jobs, NUM_CORES=8 (ajpowelsnl)
e3f07ad Apply yaksa-cuda.patch in spack repo (ajpowelsnl)
5ba6683 cuda_arch=70 for lassen
ef9592a update dependencies' build (ajpowelsnl)
ab0e26c Adjust base image to cuda-11.8.0 (ajpowelsnl)
b9ddae4 Dockerfile-containergpucuda: rm nccl (ajpowelsnl)
7bfbbcd Spack patch for yaksa (ajpowelsnl)
77bf1fd Spack yaska patch: from wyphan (ajpowelsnl)
6b1d70e rm incorrect yaksa patch (ajpowelsnl)
bfcfc49 spack: yaksa patch do-over w/ wyphan (ajpowelsnl)
6f067a0 Rescue plan for yaska patch (ajpowelsnl)
f9b6f5d More rescue: reset spack to ee68baf254ce8f401704ef1a62b77057487d4a12 (ajpowelsnl)
5c46ea2 Add sha (6c1868f8ae) with yaska-0.3 (ajpowelsnl)
c001464 Patched Spack on fork/branch (ajpowelsnl)
981a5d4 spack/patch_yaksa: branch spack from develop (ajpowelsnl)
5f079fd Building GPU container, but failing (CUDA) runtime (ajpowelsnl)
ecd75fd add ubuntu mpich; Spack-pinned version problematic (ajpowelsnl)
c4f2029 fix up external mpich bloc (ajpowelsnl)
76c7ee0 Dockerfile-containergpucuda: container file for PR (ajpowelsnl)
6d4cfd0 Merge branch 'main' into gpucontainer (psakievich)

@@ -0,0 +1,15 @@
compilers:
- compiler:
    spec: [email protected]
    paths:
      cc: /usr/bin/gcc
      cxx: /usr/bin/g++
      f77: /usr/bin/gfortran
      fc: /usr/bin/gfortran
    flags: {}
    operating_system: ubuntu22.04
    target: any
    modules: []
    extra_rpaths: []

# See example: https://spack.readthedocs.io/en/latest/gpu_configuration.html

@@ -0,0 +1,2 @@
config:
  build_jobs: 20

@@ -0,0 +1,79 @@
packages:
  # Global settings
  all:
    compiler:
    - [email protected]
    providers:
      mpi: [mpich]
      blas: [netlib-lapack]
      lapack: [netlib-lapack]
    variants: build_type=Release cuda_arch=80
  mpi:
    require: mpich
  # GPU-aware MPICH; See - https://spack.readthedocs.io/en/latest/build_settings.html#package-settings-packages-yaml
  # mpich:
  # require: "+cuda"
  # Use Ubuntu libncurses-dev, etc., b/c Spack version fails
  # Spack-pinned version of mpich builds fail
  mpich:
    externals:
    - spec: [email protected]
      require: "+cuda"
      prefix: /usr
  ncurses:
    externals:
    - spec: [email protected]
      prefix: /usr
  gdbm:
    externals:
    - spec: [email protected]
      prefix: /usr
  gdbm6:
    externals:
    - spec: [email protected]
      prefix: /usr
  # Package preferences to be built by Spack for correct Exawind
  # Nota bene: use libtool from Spack for correct linking
  ascent:
    variants: ~fortran~openmp
  amr-wind:
    variants: +tiny_profile
  conduit:
    variants: ~fortran~hdf5_compat
  boost:
    version: [1.78.0]
    variants: cxxstd=17
  cmake:
    version: [3.26.3]
    variants: build_type=Release
  trilinos:
    require:
    - any_of: ["@13.4.0", "@develop"]
  hdf5:
    version: [1.10.7]
    variants: +cxx+hl
  libtool:
    version: [2.4.7]
  masa:
    variants: ~fortran~python
  netcdf-c:
    require: '@4.7.4'
    variants: +parallel-netcdf maxdims=65536 maxvars=524288
  openfast:
    version: [master]
    variants: +cxx
  parallel-netcdf:
    version: [1.12.2]
    variants: ~fortran
  perl:
    require: '@5.34.1'
  tioga:
    version: [develop]
  hypre:
    require: '@develop'
    variants: ~fortran
  hypre2:
    require: '@develop'
    variants: ~fortran
  yaml-cpp:
    version: [0.6.3]
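
Note on the global `cuda_arch=80` setting: that value corresponds to NVIDIA compute capability 8.0 (A100, the GPU on Perlmutter's GPU nodes), whereas the commit list mentions `cuda_arch=70` for Lassen (V100, compute capability 7.0). A minimal Python sketch of deriving the variants string per target follows. The table `CUDA_ARCH_BY_GPU` and the helper `global_variants` are hypothetical names introduced for illustration; they are not part of spack-manager.

```python
# Hypothetical lookup table: GPU model -> value for Spack's cuda_arch variant.
CUDA_ARCH_BY_GPU = {
    "V100": "70",  # compute capability 7.0 (e.g. Lassen)
    "A100": "80",  # compute capability 8.0 (e.g. Perlmutter)
}


def global_variants(gpu, build_type="Release"):
    """Build the string used under packages:all:variants for one GPU model."""
    return f"build_type={build_type} cuda_arch={CUDA_ARCH_BY_GPU[gpu]}"


print(global_variants("A100"))
```

Keeping the architecture in one lookup makes it harder for a machine-specific value to leak into a config meant for a different system.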

@@ -0,0 +1,9 @@
spack:
  include:
  - include.yaml
  concretizer:
    unify: false
    reuse: false
  view: false
  specs:
  - 'exawind+hypre+amr_wind_gpu+nalu_wind_gpu+cuda'
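
To make the single spec above easier to read, here is a small Python sketch that splits a spec string into its package name and boolean variants (`+name` enabled, `~name` disabled). This is a deliberately simplified reader written for illustration, not Spack's own parser: `split_spec` is a hypothetical helper and it ignores versions, compilers, key=value variants, and dependencies.

```python
import re


def split_spec(spec):
    """Return (package_name, {variant: enabled}) for a simple Spack-style spec."""
    # The package name is the leading run of name characters.
    name = re.match(r"[A-Za-z0-9_-]+", spec).group(0)
    # Each boolean variant is a '+' or '~' followed by the variant name.
    flags = re.findall(r"([+~])([A-Za-z0-9_-]+)", spec[len(name):])
    return name, {variant: sign == "+" for sign, variant in flags}


name, variants = split_spec("exawind+hypre+amr_wind_gpu+nalu_wind_gpu+cuda")
print(name)
print(variants)
```

Read this way, the environment requests one package, `exawind`, with the `hypre`, `amr_wind_gpu`, `nalu_wind_gpu`, and `cuda` variants all enabled.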

hpc_containers/exawind_container_gpucuda/Dockerfile-containergpucuda (132 additions, 0 deletions)
@@ -0,0 +1,132 @@
LABEL maintainer="Philip Sakievich, Sandia National Laboratories <[email protected]>"

# NVIDIA base images: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags

ARG REGISTRY=nvcr.io/nvidia
ARG IMAGE=cuda
ARG TAG=11.8.0-devel-ubuntu22.04
#ARG TAG=12.2.0-devel-ubuntu22.04

FROM ${REGISTRY}/${IMAGE}:${TAG}

# Make bash the default $SHELL
SHELL ["/bin/bash", "-c"]

# Install Spack Prereqs: https://spack.readthedocs.io/en/latest/getting_started.html#system-prerequisites

RUN apt-get update -yqq && \
    apt-get upgrade -yqq

RUN apt-get install -yqq \
    autoconf \
    automake \
    bzip2 \
    ca-certificates \
    clangd \
    coreutils \
    curl \
    emacs-nox \
    file \
    flex \
    gcc \
    gcc-multilib \
    gcc-doc \
    g++ \
    gfortran \
    gfortran-multilib \
    gfortran-doc \
    git \
    git-doc \
    git-man \
    gnupg2 \
    hwloc-nox \
    libbz2-dev \
    libffi-dev \
    libfmt-dev \
    libgdbm-dev \
    libgdbm6 \
    libgmp-dev \
    libhwloc-common \
    libhwloc-dev \
    libhwloc15 \
    libjpeg-dev \
    libmpc-dev \
    libncurses-dev \
    libtool \
    libtool-bin \
    libtool-doc \
    libx11-dev \
    lsb-release \
    m4 \
    make \
    mpich \
    mpich-doc \
    nano \
    python3 \
    python3-distutils \
    python3-venv \
    unzip \
    vim \
    wget \
    wget2 \
    xz-utils \
    zip \
    zlib1g-dev

RUN apt clean -y

# Exawind GPU snapshot
WORKDIR /exawind-entry
#
#RUN git clone --recursive https://github.com/sandialabs/spack-manager
# Pre-merge fork
RUN git clone --recursive https://github.com/ajpowelsnl/spack-manager
# Needed by "create-exawind-snapshot.sh"
ENV SPACK_MANAGER_MACHINE=containergpucuda
ENV CONTAINER_BUILD=gpucuda
ENV SPACK_MANAGER=/exawind-entry/spack-manager

WORKDIR /exawind-entry/spack-manager

# Nota bene: commented code is needed, but does not work in container env
# Pre-merge branch from ajpowelsnl/spack-manager fork
# RUN git checkout gpucontainer

# Temp. code: Use branch of Spack w/ patch
# DOESN'T BUILD CORRECTLY
#RUN cd spack
#RUN git remote add amy_fork https://github.com/ajpowelsnl/spack.git
#RUN git fetch amy_fork
#RUN git checkout amy_fork/spack/patch_yaksa


# Snapshot will be generated upon running container
RUN echo "pwd" >> /etc/bash.bashrc && \
    echo "cd spack" >> /etc/bash.bashrc && \
    echo "git remote add amy_fork https://github.com/ajpowelsnl/spack.git" >> /etc/bash.bashrc && \
    echo "git fetch amy_fork" >> /etc/bash.bashrc && \
    echo "git checkout amy_fork/spack/patch_yaksa" >> /etc/bash.bashrc && \
    echo "cd .." >> /etc/bash.bashrc && \
    echo "pwd" >> /etc/bash.bashrc && \
    echo "git checkout gpucontainer" >> /etc/bash.bashrc && \
    echo "export SPACK_MANAGER=$SPACK_MANAGER" >> /etc/bash.bashrc && \
    echo "source $SPACK_MANAGER/start.sh && spack-start" >> /etc/bash.bashrc && \
    echo "spack external find --all" >> /etc/bash.bashrc && \
    echo "$SPACK_MANAGER/scripts/create-exawind-snapshot.sh" >> /etc/bash.bashrc && \
    echo "spack clean --all" >> /etc/bash.bashrc && \
    echo "spack env activate -d snapshots/exawind/containergpucuda/$(date +%Y-%m-%d)" >> /etc/bash.bashrc && \
    echo "spack load exawind" >> /etc/bash.bashrc

# Verify .bashrc
# RUN ["/bin/bash", "-c", "tail -n 6 /etc/bash.bashrc"]

# Verify executable:
# 66 spack env activate -d snapshots/exawind/containergpucuda/2023-11-01/
# 67 spack load exawind
# 68 which exawind
# 69 exawind --help



#WORKDIR /exawind-entry
CMD [ "/bin/bash" ]
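
One detail of the `RUN echo ... >> /etc/bash.bashrc` block worth checking: inside double quotes, bash performs `$(date +%Y-%m-%d)` at the moment the `RUN` step executes, so the image build date is what gets written into `/etc/bash.bashrc`. If the snapshot directory is named for the day the container is started (an assumption, suggested by the dated path in the "Verify executable" comments), the baked-in path would only match when build and run happen on the same day. The Python sketch below calls bash to show the difference between the two quoting styles; `run_bash` is a hypothetical helper, and the example assumes `bash` and `date` are available on the host.

```python
import subprocess


def run_bash(command):
    """Run one command line with `bash -c` and return its trimmed stdout."""
    result = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Double quotes: the command substitution runs right away (as in the RUN step).
expanded_now = run_bash('echo "snapshots/$(date +%Y-%m-%d)"')

# Single quotes: the text is written literally and evaluated later,
# when the shell that sources the file runs it.
kept_literal = run_bash("echo 'snapshots/$(date +%Y-%m-%d)'")

print(expanded_now)
print(kept_literal)
```

Under that assumption, writing the `spack env activate` line with single quotes (or an escaped `\$(...)`) would defer the date lookup to container start.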

Submodule spack updated (2 files)
+2 −0 var/spack/repos/builtin/packages/yaksa/package.py
+27 −0 var/spack/repos/builtin/packages/yaksa/yaksa-cuda-libtool.patch

@@ -97,6 +97,9 @@ def is_e4s():
    "perlmutter": MachineData(
        lambda: os.environ["NERSC_HOST"] == "perlmutter", "perlmutter-p1.nersc.gov"
    ),
    "containergpucuda": MachineData(
        lambda: os.environ["CONTAINER_BUILD"] == "gpucuda", "containgpucuda.nodomain.gov"
    ),
    # General
    "darwin": MachineData(lambda: sys.platform == "darwin", "darwin.nodomain.gov"),
}

Review comment on the `"containergpucuda": MachineData(` line:
I am not sure I like this name. Would we expect this to build any container using cuda, or specifically containers on perlmutter? I would prefer to start with a more precise name and relax it rather than vice-versa.
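
The `containergpucuda` entry in this hunk detects the machine from the `CONTAINER_BUILD` environment variable, which the Dockerfile sets to `gpucuda`. Note that `os.environ["CONTAINER_BUILD"]` raises `KeyError` when the variable is unset, unless the caller catches it. The sketch below shows the same detection pattern with `os.environ.get`, which returns `None` instead. `MachineData` here is a hypothetical stand-in `namedtuple`, and `env_equals` and `find_machine` are illustrative helpers; the real spack-manager code may be structured differently.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for spack-manager's MachineData (assumption).
MachineData = namedtuple("MachineData", ["is_machine", "fqdn"])


def env_equals(name, expected):
    """Return a predicate that is True when the environment variable matches.

    Uses .get so that an unset variable yields False rather than KeyError.
    """
    return lambda: os.environ.get(name) == expected


MACHINES = {
    "perlmutter": MachineData(
        env_equals("NERSC_HOST", "perlmutter"), "perlmutter-p1.nersc.gov"
    ),
    "containergpucuda": MachineData(
        env_equals("CONTAINER_BUILD", "gpucuda"), "containgpucuda.nodomain.gov"
    ),
}


def find_machine(machines):
    """Return the first machine name whose predicate matches, else NOT-FOUND."""
    for name, data in machines.items():
        if data.is_machine():
            return name
    return "NOT-FOUND"


print(find_machine(MACHINES))
```

Because the predicates are evaluated in dictionary order, a more specific name (as the review comment suggests) would also make the precedence between entries easier to reason about.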
This looks like an update to the submodule file, not to the spack commit. Is that right? We have a mirror-only policy on spack changes, so these changes would need to go into mainline spack.