Merge remote-tracking branch 'origin' into 16-setup-benchmarking-infrastructure
lorenzovarese committed Sep 13, 2024
2 parents b1f539e + ac2f19b commit ac46bff
Showing 12 changed files with 377 additions and 22 deletions.
99 changes: 77 additions & 22 deletions README.md
@@ -4,30 +4,85 @@

## Installation

### Setup python virtual environment


### Development installation

```bash
export GRIDTOOLS_JL_PATH="..."
export GT4PY_PATH="..."
# create python virtual environemnt
# make sure to use a python version that is compatible with GT4Py
python -m venv .venv
# activate virtual env
# this command has be run everytime GridTools.jl is used
source .venv/bin/activate
# clone gt4py
git clone --branch fix_python_interp_path_in_cmake [email protected]:tehrengruber/gt4py.git
#git clone [email protected]:GridTools/gt4py.git $GT4PY_PATH
pip install -r $GT4PY_PATH/requirements-dev.txt
pip install -e $GT4PY_PATH
#
```
### Development Installation

As of August 2024, the recommended Python version for development is **3.10.14**.

**Important Note:** The Python virtual environment must be created in the directory specified by `GRIDTOOLS_JL_PATH/.venv`. Creating the environment in any other location will result in errors.

#### Steps to Set Up the Development Environment

1. **Set Environment Variables:**
Set the environment variables for `GRIDTOOLS_JL_PATH` and `GT4PY_PATH`. Replace `...` with the appropriate paths on your system.

```bash
export GRIDTOOLS_JL_PATH="..."
export GT4PY_PATH="..."
```

2. **Create a Python Virtual Environment:**
Navigate to the `GRIDTOOLS_JL_PATH` directory and create a Python virtual environment named `.venv`. Ensure you are using a compatible Python version (e.g., 3.10.14).

```bash
cd $GRIDTOOLS_JL_PATH
python3.10 -m venv .venv
```

3. **Activate the Virtual Environment:**
Activate the virtual environment. You need to run this command every time you work with GridTools.jl.

```bash
source .venv/bin/activate
```

4. **Clone the GT4Py Repository:**
Clone the GT4Py repository into `$GT4PY_PATH`. You can use the specific branch mentioned below or the main repository, as needed.

```bash
git clone --branch fix_python_interp_path_in_cmake [email protected]:tehrengruber/gt4py.git $GT4PY_PATH
# Alternatively, you can clone the main repository:
# git clone [email protected]:GridTools/gt4py.git $GT4PY_PATH
```

5. **Install Required Packages:**
Install the development requirements and the GT4Py package in editable mode.

```bash
pip install -r $GT4PY_PATH/requirements-dev.txt
pip install -e $GT4PY_PATH
```

6. **Build PyCall:**
With the virtual environment activated, run Julia from the `GridTools.jl` folder with the command `julia --project=.`, then build the project using the following commands:

```julia
using Pkg
Pkg.build()
```
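
After building, you can verify that PyCall picked up the interpreter from the virtual environment (a minimal check; `PyCall.python` holds the path of the Python executable PyCall was built against):

```julia
using PyCall
# Should print the interpreter inside GRIDTOOLS_JL_PATH/.venv,
# e.g. "/path/to/GridTools.jl/.venv/bin/python"
println(PyCall.python)
```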

## Troubleshooting

### Common Build Errors

__undefined symbol: PyObject_Vectorcall__
- Make sure to run everything in the same environment that you built `PyCall` with. A common cause of this error is that PyCall was built inside a virtual environment that was then not activated when executing stencils. A rebuild sketch follows below.
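
If PyCall was built against the wrong interpreter, one way to rebuild it against the virtual environment is sketched below (assuming `GRIDTOOLS_JL_PATH` is still exported in the shell that launched Julia):

```julia
using Pkg
# Point PyCall at the virtual environment's interpreter (adjust the path to your setup),
# then rebuild so the setting takes effect.
ENV["PYTHON"] = joinpath(ENV["GRIDTOOLS_JL_PATH"], ".venv", "bin", "python")
Pkg.build("PyCall")
```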

__CMake Error: Could NOT find Boost__
- GridTools.jl requires the Boost library version 1.65.1 or higher. If Boost is not installed, you can install it via your system's package manager. For example, on Ubuntu, use:
```bash
sudo apt-get install libboost-all-dev
```
Make sure the installed version meets the minimum requirement of 1.65.1. If CMake still cannot find Boost after installation, you may need to specify the Boost installation path explicitly via the `-DBOOST_ROOT=/path/to/boost` CMake option, where `/path/to/boost` is the directory where Boost is installed; see the sketch below.
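
CMake's `FindBoost` also honors a `BOOST_ROOT` environment variable, so one option is to export it before building; `/opt/boost` below is a hypothetical prefix:
```bash
# Hypothetical prefix; replace with your actual Boost installation directory.
export BOOST_ROOT=/opt/boost
```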

__Supporting GPU Backend with CUDA__

- To enable GPU acceleration and use the GPU backend features of this project, the NVIDIA CUDA Toolkit must be installed. CUDA provides the compiler (`nvcc`) and the libraries needed to develop and run applications that leverage NVIDIA GPUs.

- If the `LD_LIBRARY_PATH` environment variable is set in your current environment, it is recommended to unset it. This avoids conflicts between the paths managed by CUDA.jl and those already present on the system; a one-line workaround is shown after the warning below.
```julia
julia> using CUDA
┌ Warning: CUDA runtime library `...` was loaded from a system path, `/usr/local/cuda/lib64/...`.
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
```
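
A simple workaround is to clear the variable in the shell session before launching Julia:
```bash
# Unset for the current shell session so CUDA.jl manages its own libraries.
unset LD_LIBRARY_PATH
```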
89 changes: 89 additions & 0 deletions ci/cscs.yml
@@ -0,0 +1,89 @@
stages:
  - build_base_stage0_image
  - build_base_stage1_image
  - build_base_stage2_image
  - build_image
  - ci_jobs

variables:
  GPU_ENABLED: true
  CUDA_DRIVER_VERSION: "470.57.02"
  PROJECT_NAME: gridtools_jl
  PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/pasc_kilos/${CONTAINER_RUNNER}/${PROJECT_NAME}_image:$CI_COMMIT_SHORT_SHA
  CPU_ARCH: "x86_64_v3" # use a generic architecture here instead of linux-sles15-haswell, such that it can build on zen2

include:
  - remote: 'https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml'

.gt-container-builder:
  extends: .container-builder
  timeout: 2h
  before_script:
    - DOCKER_TAG=`eval cat $WATCH_FILECHANGES | sha256sum | head -c 16`
    - |
      if [[ "$CI_COMMIT_MESSAGE" =~ "Trigger container rebuild $ENV_VAR_NAME" ]]; then
        echo "Rebuild triggered."
        export CSCS_REBUILD_POLICY="always"
      fi
    - export PERSIST_IMAGE_NAME=$PERSIST_IMAGE_NAME:$DOCKER_TAG
    - echo "$ENV_VAR_NAME=$PERSIST_IMAGE_NAME" > build.env
  artifacts:
    reports:
      dotenv: build.env
  variables:
    # the variables below MUST be set to sane values; they are listed here to show
    # which variables should be set.
    DOCKERFILE: ci/docker/Dockerfile.base # overwrite with the real path of the Dockerfile
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/base/my_base_image # Important: no version tag
    WATCH_FILECHANGES: 'ci/docker/Dockerfile.base "path/to/another/file with whitespaces.txt"'
    ENV_VAR_NAME: BASE_IMAGE

build_base_stage0_image_job:
  stage: build_base_stage0_image
  extends: .gt-container-builder
  variables:
    DOCKERFILE: docker/base/Dockerfile
    DOCKER_BUILD_ARGS: '["INSTALL_CUDA_DRIVER=$GPU_ENABLED", "CUDA_DRIVER_VERSION=$CUDA_DRIVER_VERSION", "CPU_ARCH=$CPU_ARCH"]'
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/gridtools/${CONTAINER_RUNNER}/gridtools_jl_base_image
    WATCH_FILECHANGES: 'docker/base/Dockerfile'
    ENV_VAR_NAME: BASE_IMAGE_STAGE0

build_base_stage1_image_job:
  stage: build_base_stage1_image
  extends: .gt-container-builder
  variables:
    DOCKERFILE: docker/base_spack_deps/Dockerfile
    DOCKER_BUILD_ARGS: '["BASE_IMAGE=$BASE_IMAGE_STAGE0", "PROJECT_NAME=$PROJECT_NAME", "SPACK_ENV_FILE=spack-${CONTAINER_RUNNER}.yaml"]'
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/gridtools/${CONTAINER_RUNNER}/${PROJECT_NAME}_base_stage1_image
    WATCH_FILECHANGES: 'docker/base/Dockerfile docker/base_spack_deps/Dockerfile docker/base_spack_deps/spack-daint-p100.yaml' # TODO: inherit from stage0
    ENV_VAR_NAME: BASE_IMAGE_STAGE1

build_base_stage2_image_job:
  stage: build_base_stage2_image
  extends: .gt-container-builder
  variables:
    DOCKERFILE: docker/base_deps/Dockerfile
    DOCKER_BUILD_ARGS: '["BASE_IMAGE=$BASE_IMAGE_STAGE1", "PROJECT_NAME=$PROJECT_NAME"]'
    PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/gridtools/${CONTAINER_RUNNER}/${PROJECT_NAME}_base_stage2_image
    WATCH_FILECHANGES: 'docker/base/Dockerfile docker/base_spack_deps/Dockerfile docker/base_spack_deps/spack-daint-p100.yaml docker/base_deps/Dockerfile' # TODO: inherit from stage1
    ENV_VAR_NAME: BASE_IMAGE_STAGE2

build_image:
  stage: build_image
  extends: .container-builder
  variables:
    DOCKERFILE: docker/image/Dockerfile
    DOCKER_BUILD_ARGS: '["BASE_IMAGE=$BASE_IMAGE_STAGE2", "PROJECT_NAME=$PROJECT_NAME"]'

run_tests:
  stage: ci_jobs
  image: $PERSIST_IMAGE_NAME
  extends: .container-runner-daint
  script:
    - . /opt/gridtools_jl_env/setup-env.sh
    - cd /opt/GridTools
    - julia --project=. -e 'using Pkg; Pkg.test()'
  variables:
    SLURM_JOB_NUM_NODES: 1
    SLURM_NTASKS: 1
    SLURM_TIMELIMIT: "00:30:00"
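
As the `before_script` of `.gt-container-builder` above shows, a base-image rebuild can be forced by putting the trigger phrase with the job's `ENV_VAR_NAME` into the commit message. For example, for the stage-0 image (a sketch derived from the regex above):

```bash
# An empty commit whose message matches the rebuild trigger in .gt-container-builder.
git commit --allow-empty -m "Trigger container rebuild BASE_IMAGE_STAGE0"
```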
48 changes: 48 additions & 0 deletions docker/base/Dockerfile
@@ -0,0 +1,48 @@
# just a counter to trigger rebuilds: 3
FROM ubuntu:23.04 as builder
ARG INSTALL_CUDA_DRIVER=false
ARG CUDA_DRIVER_VERSION
ARG CPU_ARCH

SHELL ["/bin/bash", "-c"]

RUN apt-get update \
&& env DEBIAN_FRONTEND=noninteractive TZ=Europe/Zurich apt-get -yqq install --no-install-recommends build-essential ca-certificates coreutils curl environment-modules file gfortran git git-lfs gpg gpg-agent lsb-release openssh-client python3 python3-distutils python3-venv unzip zip

RUN apt-get clean

WORKDIR /opt/gridtools_jl_env

COPY ./docker/base/install_cuda_driver.sh ./install_cuda_driver.sh
RUN if [ "x$INSTALL_CUDA_DRIVER" == "xtrue" ]; then ./install_cuda_driver.sh $CUDA_DRIVER_VERSION; fi

RUN git clone --depth 1 -c feature.manyFiles=true https://github.com/spack/spack.git

# In case the driver is not installed this fixes missing `-lcuda` errors when installing cupy.
#RUN git remote add origin_tehrengruber https://github.com/tehrengruber/spack.git
#RUN git fetch origin_tehrengruber
#RUN git checkout --track origin_tehrengruber/fix_libcuda_not_found

WORKDIR ./spack/bin

# careful: this overrides, and will be overridden by, other configuration of packages:all:require
RUN ./spack config add packages:all:require:target=$CPU_ARCH

RUN ./spack install gcc@11

# cleanup
RUN ./spack clean --all
RUN ./spack gc -y

# strip all the binaries
RUN find -L /opt/gridtools_jl_env/spack/opt -type f -exec readlink -f '{}' \; | \
xargs file -i | \
grep 'charset=binary' | \
grep 'x-executable\|x-archive\|x-sharedlib' | \
awk -F: '{print $1}' | xargs strip -x || true

WORKDIR /

# flatten image
FROM scratch
COPY --from=builder / /
21 changes: 21 additions & 0 deletions docker/base/install_cuda_driver.sh
@@ -0,0 +1,21 @@
#!/bin/bash
CUDA_DRIVER_VERSION=$1

echo "Installing CUDA driver version $CUDA_DRIVER_VERSION"
apt-get -yqq install --no-install-recommends kmod wget
wget -q https://us.download.nvidia.com/XFree86/Linux-x86_64/${CUDA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${CUDA_DRIVER_VERSION}.run
chmod +x NVIDIA-Linux-x86_64-${CUDA_DRIVER_VERSION}.run
./NVIDIA-Linux-x86_64-${CUDA_DRIVER_VERSION}.run -s -q -a \
--no-nvidia-modprobe \
--no-abi-note \
--no-kernel-module \
--no-distro-scripts \
--no-opengl-files \
--no-wine-files \
--no-kernel-module-source \
--no-unified-memory \
--no-drm \
--no-libglx-indirect \
--no-install-libglvnd \
--no-systemd
rm ./NVIDIA-Linux-x86_64-${CUDA_DRIVER_VERSION}.run
25 changes: 25 additions & 0 deletions docker/base_deps/Dockerfile
@@ -0,0 +1,25 @@
# rebuild counter: 3 (bump to force a rebuild of this image)
ARG BASE_IMAGE=gridtools_jl_spack_deps_image
FROM $BASE_IMAGE as builder
ARG PROJECT_NAME

WORKDIR /opt/${PROJECT_NAME}_env

COPY ./docker/base_deps/setup-env.sh ./setup-env.sh
RUN sed -i "s/%PROJECT_NAME%/$PROJECT_NAME/g" setup-env.sh

WORKDIR /opt/
COPY ./docker/base_deps/install_gt4py.sh ./install_gt4py.sh
RUN . /opt/${PROJECT_NAME}_env/setup-env.sh; ./install_gt4py.sh
RUN . /opt/${PROJECT_NAME}_env/setup-env.sh; pip cache purge

WORKDIR /opt/gridtools_jl_deps
COPY ./Project.toml ./Project.toml
RUN mkdir src
COPY ./docker/base_deps/dummy_module.jl ./src/GridTools.jl
RUN . /opt/${PROJECT_NAME}_env/setup-env.sh; julia --project=. -e "using Pkg; Pkg.instantiate(); Pkg.build(); Pkg.precompile()"
RUN rm -rf /opt/gridtools_jl_deps

# flatten image
FROM scratch
COPY --from=builder / /
2 changes: 2 additions & 0 deletions docker/base_deps/dummy_module.jl
@@ -0,0 +1,2 @@
module GridTools
end
4 changes: 4 additions & 0 deletions docker/base_deps/install_gt4py.sh
@@ -0,0 +1,4 @@
#!/bin/bash
git clone --branch fix_python_interp_path_in_cmake https://github.com/tehrengruber/gt4py.git
pip install -r ./gt4py/requirements-dev.txt
pip install ./gt4py
24 changes: 24 additions & 0 deletions docker/base_deps/setup-env.sh
@@ -0,0 +1,24 @@
#!/bin/bash
# note: occurrences of %PROJECT_NAME% in this file are replaced when copied into the container
export HOME=/root

. /opt/%PROJECT_NAME%_env/spack/share/spack/setup-env.sh

# gcc is installed outside the env, so load it before activating. If gcc is not
# loaded, we might run into strange errors where partly the Spack version and
# partly the system-installed version is used.
spack load gcc

spack env activate %PROJECT_NAME%_env

# use this complicated way to load packages in case multiple versions are installed.
# this was needed because two versions of py-pip were installed (one is only a build
# dependency). Since we now run `spack gc -y` this is superfluous (build-only
# dependencies are removed before we land here), but we keep it for now.
#PACKAGES_TO_LOAD=("python" "py-pip" "gcc")
#for PKG_NAME in ${PACKAGES_TO_LOAD[@]}; do
# SHORT_SPEC=$(spack find --explicit --format "{short_spec}" $PKG_NAME)
# SHORT_SPEC=${SHORT_SPEC%/*} # remove hash after `/` character
# spack load $SHORT_SPEC
#done
spack load python py-pip boost julia
39 changes: 39 additions & 0 deletions docker/base_spack_deps/Dockerfile
@@ -0,0 +1,39 @@
# rebuild counter: 3 (bump to force a rebuild of this image)
ARG BASE_IMAGE=gridtools_jl_base_image
FROM $BASE_IMAGE as builder
ARG PROJECT_NAME=gridtools_jl
ARG SPACK_ENV_FILE=spack-daint-p100.yaml

# TODO(tehrengruber): Copy spack environment to clean image. Then we don't need to run `spack gc`
# and `spack clean` anymore. See https://spack.readthedocs.io/en/latest/containers.html for
# more information.

WORKDIR /opt/${PROJECT_NAME}_env/spack/bin

COPY ./docker/base_spack_deps/${SPACK_ENV_FILE} ./spack_env_${PROJECT_NAME}.yaml
RUN ./spack env create ${PROJECT_NAME}_env spack_env_${PROJECT_NAME}.yaml
# remove all compilers such that everything is built with the compiler we installed
RUN ./spack compiler remove -a gcc
RUN ./spack -e ${PROJECT_NAME}_env compiler find $(./spack location --install-dir gcc@11)
# using --fresh ensures the concretization does not care about the build cache (untested and not
# used right now as we don't use a build cache yet)
RUN ./spack -e ${PROJECT_NAME}_env concretize --fresh
COPY ./docker/base_spack_deps/run_until_succeed.sh ./run_until_succeed.sh
RUN ./run_until_succeed.sh ./spack -e ${PROJECT_NAME}_env install

# cleanup
RUN ./spack -e ${PROJECT_NAME}_env clean --all
RUN ./spack -e ${PROJECT_NAME}_env gc -y

# strip all the binaries
RUN find -L /opt/${PROJECT_NAME}_env/spack/opt -type f -exec readlink -f '{}' \; | \
xargs file -i | \
grep 'charset=binary' | \
grep 'x-executable\|x-archive\|x-sharedlib' | \
awk -F: '{print $1}' | xargs strip -x || true

WORKDIR /

# flatten image
FROM scratch
COPY --from=builder / /
23 changes: 23 additions & 0 deletions docker/base_spack_deps/run_until_succeed.sh
@@ -0,0 +1,23 @@
#!/bin/bash

# Set the maximum number of attempts
max_attempts=10
attempt=0

# Check if a command is provided
if [ $# -eq 0 ]; then
    echo "Usage: $0 MY_BASH_COMMAND ARGS..."
    exit 1
fi

# Loop until the command succeeds or the maximum number of attempts is reached
while ! "$@"; do
    attempt=$((attempt + 1))
    if [ $attempt -ge $max_attempts ]; then
        echo "Command failed after $max_attempts attempts."
        exit 1
    fi
    echo "Attempt $attempt/$max_attempts failed. Retrying..."
done

echo "Command succeeded on attempt $attempt."