From cca0183db50d63b041a3c80bcdd48daa7ceaf2ac Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Wed, 18 Oct 2023 17:10:46 +0200 Subject: [PATCH 01/23] Initial documentation on how to debug failed builds interactively --- .../software_layer/debugging_failed_builds.md | 129 ++++++++++++++++++ 1 file changed, 129 insertions(+) create mode 100644 docs/software_layer/debugging_failed_builds.md diff --git a/docs/software_layer/debugging_failed_builds.md b/docs/software_layer/debugging_failed_builds.md new file mode 100644 index 000000000..689dcba49 --- /dev/null +++ b/docs/software_layer/debugging_failed_builds.md @@ -0,0 +1,129 @@ +# Debugging failed builds + +Unfortunately, software does not always build succesfully. Since EESSI targets novel CPU architectures as well, build failures on such platforms are quite common, as the software and/or the software build systems have not always been adjusted to support these architectures yet. Another challenge in EESSI is that the builds are done by a bot. While this is great for builds that complete succesfully (we can build a lot of software, for a wide range of hardware because of this automation), it does means that you, as contributor, can not easily access the build directory and build logs to figure out build issues. + +This page describes how you can interactively reproduce failed builds, so that you can more easily debug the issue. The assumption, of course, is that you have access to a node of the architecture on which the build is failing. + +Throughout this page, we will use [this PR](https://github.com/EESSI/software-layer/pull/360) as an example. It builds LAMMPS, and failed (among other things) on a [build issue for Plumed](https://github.com/EESSI/software-layer/pull/360#issuecomment-1765913105). + +## Preparing the environment +A number of steps are needed to create the same environment in which the bot builds. + +- Fetching the feature branch from which you want to replicate a build. 
+- Starting a shell in the EESSI container. +- Setting the version of the EESSI software stack to use. +- Start the Gentoo Prefix environment. +- Start the EESSI software environment. +- Configure EasyBuild. + +### Fetching the feature branch +Looking at [the example PR](https://github.com/EESSI/software-layer/pull/360), we see the PR is created from [this fork](https://github.com/laraPPr/software-layer/). First, we clone the fork, then check out the feature branch (`LAMMPS_23Jun2022`): +``` +git clone https://github.com/laraPPr/software-layer/ +cd software-layer +git checkout LAMMPS_23Jun2022 +``` +Alternatively, if you already have a clone of the `software-layer` repository, you can add the fork as a new remote: +``` +cd software-layer +git remote add laraPPr https://github.com/laraPPr/software-layer/ +git fetch laraPPr +git checkout LAMMPS_23Jun2022 +``` + +### Starting a shell in the EESSI container +Simply run the EESSI container (`eessi_container.sh`), which should be in the root of the `software-layer` repository +``` +./eessi_container.sh +``` +!!! Note + You may have to press enter to clearly see the prompt as some messages + beginning with `CernVM-FS: ` have been printed after the first prompt + `Apptainer> ` was shown. + +For more info on using the EESSI container, see [here](../getting_access/eessi_container). + +### Start the Gentoo Prefix environment +The next step is to start the Gentoo Prefix environment. + +Before we start, check the current values of `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}` so that you can reset them later: +``` +echo ${EESSI_CVMFS_REPO} +echo ${EESSI_PILOT_VERSION} +``` + +To do that, you need to run the `startprefix` command. However, we have several compatibility layers, and you'll need to run it for the one that matches the host node.
For example, on an x86_64 linux machine: +``` +export EESSI_OS_TYPE=linux +export EESSI_CPU_FAMILY=x86_64 +${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/startprefix +``` + +If you are unsure, you can start the EESSI software environment (see next step) and check the values of `EESSI_OS_TYPE` and `EESSI_CPU_FAMILY` set by that initialization script. Note that you'll have to start over with a new shell (i.e. quit the container) and repeat the current step of starting the Gentoo Prefix environment, as the order of those two steps matters. + +Now, reset `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}` in your prefix environment: +``` +export EESSI_CVMFS_REPO=... +export EESSI_PILOT_VERSION=... +``` + +!!! Note + By activating the Gentoo Prefix environment, the system tools (e.g. `ls`) you would normally use are now provided by Gentoo Prefix, instead of the container OS. E.g. running `which ls` after starting the prefix environment as above will return `/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/x86_64/bin/ls`. This makes the builds completely independent from the container OS. + +### Starting the EESSI software environment +To activate the software environment, run +``` +source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/bash +``` + +!!! Note + If you get an error `bash: /versions//init/bash: No such file or directory`, you forgot to reset the `${EESSI_CVFMS_REPO}` and `${EESSI_PILOT_VERSION}` environment variables at the end of the previous step. + +For more info on starting the EESSI software environment, see [here](../using_eessi/setting_up_environment/) + +### Configure EasyBuild +It is important that we configure EasyBuild in the same way as the bot uses it, with two small exceptions: + +- Our working directory will be different +- Our installpath will be different + +For both, any writeable path will do.
In this example, we will choose `/tmp/easybuild` as our workdir, and `$HOME/.local/easybuild` as our installpath. Finally, we will source the `configure_easybuild` script, which will configure EasyBuild by setting environment variables. + +``` +export WORKDIR=/tmp/easybuild +export EASYBUILD_INSTALLPATH="${HOME}/.local/easybuild" +source configure_easybuild +``` +Next, we need to determine the correct version of EasyBuild to load. Since [the example PR](https://github.com/EESSI/software-layer/pull/360) changes the file `eessi-2023.06-eb-4.8.1-2021b.yml`, this tells us the bot was using version `4.8.1` of EasyBuild to build this. Thus, we load that version of the EasyBuild module and check if everything was configured correctly: +``` +module load EasyBuild/4.8.1 +eb --show-config +``` +You should get something similar to + +``` +# +# Current EasyBuild configuration +# (C: command line argument, D: default value, E: environment variable, F: configuration file) +# +buildpath (E) = /tmp/easybuild/easybuild/build +containerpath (E) = /tmp/easybuild/easybuild/containers +debug (E) = True +experimental (E) = True +filter-deps (E) = Autoconf, Automake, Autotools, binutils, bzip2, DBus, flex, gettext, gperf, help2man, intltool, libreadline, libtool, Lua, M4, makeinfo, ncurses, util-linux, XZ, zlib, Yasm +filter-env-vars (E) = LD_LIBRARY_PATH +hooks (E) = /home/casparvl/software-layer/eb_hooks.py +ignore-osdeps (E) = True +installpath (E) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1 +module-extensions (E) = True +packagepath (E) = /tmp/easybuild/easybuild/packages +prefix (E) = /tmp/easybuild/easybuild +read-only-installdir (E) = True +repositorypath (E) = /tmp/easybuild/easybuild/ebfiles_repo +robot-paths (D) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/EasyBuild/4.8.1/easybuild/easyconfigs +rpath (E) = True +sourcepath (E) = /tmp/easybuild/easybuild/sources: +sysroot (E) = 
/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/aarch64 +trace (E) = True +zip-logs (E) = bzip2 +``` From 91f5d629d55f086d008e38ca2b40a24dcd25be64 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 19 Oct 2023 11:54:50 +0200 Subject: [PATCH 02/23] Added build instructions after prep instructions. Added prerequisites. Added the new page to the mkdocs --- .../software_layer/debugging_failed_builds.md | 32 +++++++++++++++++-- mkdocs.yml | 1 + 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/docs/software_layer/debugging_failed_builds.md b/docs/software_layer/debugging_failed_builds.md index 689dcba49..28b28eb1f 100644 --- a/docs/software_layer/debugging_failed_builds.md +++ b/docs/software_layer/debugging_failed_builds.md @@ -6,6 +6,12 @@ This page describes how you can interactively reproduce failed builds, so that y Throughout this page, we will use [this PR](https://github.com/EESSI/software-layer/pull/360) as an example. It builds LAMMPS, and failed (among other things) on a [build issue for Plumed](https://github.com/EESSI/software-layer/pull/360#issuecomment-1765913105). +## Prerequisites +You will need to have: + +- Access to a machine with the hardware for which the build that you want to debug failed. +- On that machine, meet the requirements for running the EESSI container, as described on [this page](../getting_access/eessi_container.md#prerequisites) + ## Preparing the environment A number of steps are needed to create the same environment in which the bot builds. @@ -41,7 +47,7 @@ Simply run the EESSI container (`eessi_container.sh`), which should be in the ro beginning with `CernVM-FS: ` have been printed after the first prompt `Apptainer> ` was shown. -For more info on using the EESSI container, see [here](../getting_access/eessi_container). +For more info on using the EESSI container, see [here](../getting_access/eessi_container.md). 
### Start the Gentoo Prefix environment The next step is to start the Gentoo Prefix environment. @@ -71,6 +77,9 @@ export EESSI_PILOT_VERSION=... By activating the Gentoo Prefix environment, the system tools (e.g. `ls`) you would normally use are now provided by Gentoo Prefix, instead of the container OS. E.g. running `which ls` after starting the prefix environment as above will return `/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/x86_64/bin/ls`. This makes the builds completely independent from the container OS. ### Starting the EESSI software environment +!!! Note + If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EESSI_SOFTWARE_SUBDIR_OVERRIDE=x86_64/generic` before starting the EESSI environment. + To activate the software environment, run ``` source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/bash @@ -79,7 +88,8 @@ source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/bash !!! Note If you get an error `bash: /versions//init/bash: No such file or directory`, you forgot to reset the `${EESSI_CVFMS_REPO}` and `${EESSI_PILOT_VERSION}` environment variables at the end of the previous step. -For more info on starting the EESSI software environment, see [here](../using_eessi/setting_up_environment/) + +For more info on starting the EESSI software environment, see [here](../using_eessi/setting_up_environment.md) ### Configure EasyBuild It is important that we configure EasyBuild in the same way as the bot uses it, with two small exceptions: @@ -127,3 +137,21 @@ sysroot (E) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/li trace (E) = True zip-logs (E) = bzip2 ``` + +!!! Note + If you want to replicate a build with `generic` optimization (i.e. 
in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EASYBUILD_OPTARCH=GENERIC`. + +## Building the software +When the bot builds software, it loops over all EasyStack files that have been changed, and builds them using EasyBuild. However, a single PR may add multiple items to a single EasyStack file, and the issue you are trying to debug is probably in _one_ of them. Getting EasyBuild to build the full EasyStack file will create the most similar situation to what the bot does. However, you _may_ just want to build the individual software that has changed. Below, we describe both approaches. + +### Building everything in the EasyStack file +In our [example PR](https://github.com/EESSI/software-layer/pull/360), the EasyStack file that was changed was `eessi-2023.06-eb-4.8.1-2021b.yml`. To build this, we run (in the directory that contains the checkout of this feature branch): +``` +eb --easystack eessi-2023.06-eb-4.8.1-2021b.yml --robot +``` + +### Building an individual package +In our [example PR](https://github.com/EESSI/software-layer/pull/360), the individual package that was added to `eessi-2023.06-eb-4.8.1-2021b.yml` was `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`. We'll also have to mind any options that are listed in the EasyStack file for `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`, in this case the option `--from-pr 19000`. 
Thus, to build, we run: +``` +eb LAMMPS-23Jun2022-foss-2021b-kokkos.eb --robot --from-pr 19000 +``` diff --git a/mkdocs.yml b/mkdocs.yml index 4ed1996f9..dc37b29a4 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -26,6 +26,7 @@ nav: - software_layer/cpu_targets.md - software_layer/build_nodes.md - software_layer/adding_software.md + - software_layer/debugging_failed_builds.md - Test suite: - Overview: test-suite/index.md - Installation & configuration: test-suite/installation-configuration.md From 3dd417df951cbb02544d3b308bbf25d26bbc4ff2 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Fri, 20 Oct 2023 15:59:08 +0200 Subject: [PATCH 03/23] Moved contributing software pages to their own top level header to make it easier to find --- .../adding_software.md | 0 .../debugging_failed_builds.md | 0 mkdocs.yml | 9 +++++++-- 3 files changed, 7 insertions(+), 2 deletions(-) rename docs/{software_layer => contributing_sw}/adding_software.md (100%) rename docs/{software_layer => contributing_sw}/debugging_failed_builds.md (100%) diff --git a/docs/software_layer/adding_software.md b/docs/contributing_sw/adding_software.md similarity index 100% rename from docs/software_layer/adding_software.md rename to docs/contributing_sw/adding_software.md diff --git a/docs/software_layer/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md similarity index 100% rename from docs/software_layer/debugging_failed_builds.md rename to docs/contributing_sw/debugging_failed_builds.md diff --git a/mkdocs.yml b/mkdocs.yml index dc37b29a4..9c839c79d 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -25,8 +25,6 @@ nav: - Overview: software_layer.md - software_layer/cpu_targets.md - software_layer/build_nodes.md - - software_layer/adding_software.md - - software_layer/debugging_failed_builds.md - Test suite: - Overview: test-suite/index.md - Installation & configuration: test-suite/installation-configuration.md @@ -42,6 +40,13 @@ nav: - using_eessi/setting_up_environment.md - 
using_eessi/basic_commands.md - using_eessi/eessi_demos.md + - Contributing software to EESSI: + # Todo: insert an overview page with a flowchart showing the high level process + # - Overview: contributing_sw/overview.md + - contributing_sw/adding_software.md + - contributing_sw/debugging_failed_builds.md + # Todo: write on how to contribute to the EESSI test suite + # - Contributing software tests to the EESSI test suite: - Getting support: support.md - Meetings: - Overview: meetings.md From f41362b1d8fd8bc6118bcd87e93cc9219c668e61 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Fri, 20 Oct 2023 16:11:42 +0200 Subject: [PATCH 04/23] Fixed typo --- docs/contributing_sw/debugging_failed_builds.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 28b28eb1f..0236b5e35 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -1,8 +1,10 @@ # Debugging failed builds -Unfortunately, software does not always build succesfully. Since EESSI targets novel CPU architectures as well, build failures on such platforms are quite common, as the software and/or the software build systems have not always been adjusted to support these architectures yet. Another challenge in EESSI is that the builds are done by a bot. While this is great for builds that complete succesfully (we can build a lot of software, for a wide range of hardware because of this automation), it does means that you, as contributor, can not easily access the build directory and build logs to figure out build issues. +Unfortunately, software does not always build successfully. Since EESSI targets novel CPU architectures as well, build failures on such platforms are quite common, as the software and/or the software build systems have not always been adjusted to support these architectures yet. 
-This page describes how you can interactively reproduce failed builds, so that you can more easily debug the issue. The assumption, of course, is that you have access to a node of the architecture on which the build is failing. +In EESSI, the builds are performed by a bot. This is great for builds that complete successfully, as this automation lets us build a lot of software for a wide range of hardware. However, it does mean that you, as a contributor, cannot easily access the build directory and build logs to figure out build issues. + +This page describes how you can interactively reproduce failed builds, so that you can more easily debug the issue. Throughout this page, we will use [this PR](https://github.com/EESSI/software-layer/pull/360) as an example. It builds LAMMPS, and failed (among other things) on a [build issue for Plumed](https://github.com/EESSI/software-layer/pull/360#issuecomment-1765913105). @@ -10,7 +12,7 @@ Throughout this page, we will use [this PR](https://github.com/EESSI/software-la You will need to have: - Access to a machine with the hardware for which the build that you want to debug failed. -- On that machine, meet the requirements for running the EESSI container, as described on [this page](../getting_access/eessi_container.md#prerequisites) +- On that machine, meet the requirements for running the EESSI container, as described on [this page](../getting_access/eessi_container.md#prerequisites). ## Preparing the environment A number of steps are needed to create the same environment in which the bot builds. @@ -58,10 +60,10 @@ echo ${EESSI_CVMFS_REPO} echo ${EESSI_PILOT_VERSION} ``` -To do that, you need to run the `startprefix` command. However, we have several compatibility layers, and you'll need to run it for the one that matches the host node. For example, on an x86_64 linux machine: +To do that, you need to run the `startprefix` command.
However, we have several compatibility layers, and you'll need to run it for the one that matches the host node. For example, on an aarch64 (ARM) linux machine: ``` export EESSI_OS_TYPE=linux -export EESSI_CPU_FAMILY=x86_64 +export EESSI_CPU_FAMILY=aarch64 ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/startprefix ``` @@ -78,7 +80,7 @@ export EESSI_PILOT_VERSION=... ### Starting the EESSI software environment !!! Note - If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EESSI_SOFTWARE_SUBDIR_OVERRIDE=x86_64/generic` before starting the EESSI environment. + If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${EESSI_CPU_FAMILY}/generic` before starting the EESSI environment. To activate the software environment, run ``` @@ -149,9 +151,11 @@ In our [example PR](https://github.com/EESSI/software-layer/pull/360), the EasyS ``` eb --easystack eessi-2023.06-eb-4.8.1-2021b.yml --robot ``` +After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues as to why it failed. ### Building an individual package In our [example PR](https://github.com/EESSI/software-layer/pull/360), the individual package that was added to `eessi-2023.06-eb-4.8.1-2021b.yml` was `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`. We'll also have to mind any options that are listed in the EasyStack file for `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`, in this case the option `--from-pr 19000`.
Thus, to build, we run: ``` eb LAMMPS-23Jun2022-foss-2021b-kokkos.eb --robot --from-pr 19000 ``` +After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues as to why it failed. From de32bbe49b02a1e96cf892ec38b1123f428a4455 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Fri, 20 Oct 2023 16:18:44 +0200 Subject: [PATCH 05/23] Fix links --- docs/bot.md | 4 ++-- docs/contributing_sw/adding_software.md | 4 ++-- docs/support.md | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/bot.md b/docs/bot.md index 07293e7b4..c057e4f0d 100644 --- a/docs/bot.md +++ b/docs/bot.md @@ -6,7 +6,7 @@ Building, testing, and deploying software is done by one or more *bot instances* The EESSI build-test-deploy bot :robot: is implemented as a [GitHub App](https://docs.github.com/en/apps/overview) in the [`eessi-bot-software-layer` repository](https://github.com/EESSI/eessi-bot-software-layer). -It operates in the context of [pull requests](software_layer/adding_software.md#software_layer_pull_request) to +It operates in the context of [pull requests](contributing_sw/adding_software.md#software_layer_pull_request) to the [`compatibility-layer` repository](https://github.com/EESSI/compatibility-layer) or the [`software-layer` repository](https://github.com/EESSI/software-layer), and follows the instructions supplied by humans, @@ -61,7 +61,7 @@ to trigger building of software, and to deploy software installations in to the ## Building { #building } To instruct the bot :robot: to build software, one or more `build` instructions -should be issued by posting a comment in the pull request (see also [here](software_layer/adding_software.md#bot_build)). +should be issued by posting a comment in the pull request (see also [here](contributing_sw/adding_software.md#bot_build)).
The most basic build instruction that can be sent to the bot is: diff --git a/docs/contributing_sw/adding_software.md b/docs/contributing_sw/adding_software.md index ba300c392..157c1075d 100644 --- a/docs/contributing_sw/adding_software.md +++ b/docs/contributing_sw/adding_software.md @@ -5,7 +5,7 @@ To add software to EESSI, you should go through the semi-automatic software inst * 1) Making a pull request to the [software-layer](https://github.com/EESSI/software-layer) repository to (add or) update an [easystack file](https://docs.easybuild.io/easystack-files) :books: that is used by [EasyBuild](https://docs.easybuild.io/) to install software; -* 2) Instructing the [bot :robot:](../bot.md) to build the software on all [supported CPU microarchitectures](cpu_targets.md); +* 2) Instructing the [bot :robot:](../bot.md) to build the software on all [supported CPU microarchitectures](../software_layer/cpu_targets.md); * 3) Instructing the [bot :robot:](../bot.md) to deploy the built software for ingestion into the EESSI repository; * 4) Merging the pull request once CI indicates that the software has been ingested. :white_check_mark: @@ -108,7 +108,7 @@ For more information, see the [building section in the bot documentation](../bot * If one of the builds failed, you can let the bot retry that specific build. -* Make sure that the software has been built correctly for all [CPU targets](cpu_targets.md) before you deploy! +* Make sure that the software has been built correctly for all [CPU targets](../software_layer/cpu_targets.md) before you deploy! #### Checking the builds :mag: diff --git a/docs/support.md b/docs/support.md index 7ed261e5f..2faf44a8a 100644 --- a/docs/support.md +++ b/docs/support.md @@ -38,7 +38,7 @@ Note that we can only help with problems related to the software *installations* We are open to software requests for software that is not included in EESSI yet. 
-The quickest way to add additional software to EESSI is by contributing it yourself as a community contribution, please see the [documentation on adding software](software_layer/adding_software.md). +The quickest way to add additional software to EESSI is by contributing it yourself as a community contribution, please see the [documentation on adding software](contributing_sw/adding_software.md). Alternatively, you can send in a request to our support team. Please try to provide as much information on the software as possible: preferably use the [issue template](https://gitlab.com/eessi/support/-/issues/new?issuable_template=Software_request) (which requires you to log in to GitLab), or make sure to cover the items listed [here](https://gitlab.com/eessi/support/-/blob/main/.gitlab/issue_templates/Software_request.md). From 2fbdfec1ea9e90aaa27fe8f6bcd332d72c436570 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Fri, 20 Oct 2023 16:42:01 +0200 Subject: [PATCH 06/23] We now get the EESSI version from the container environment, so this step is no longer present --- docs/contributing_sw/debugging_failed_builds.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 0236b5e35..ab0929e7d 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -19,7 +19,6 @@ A number of steps are needed to create the same environment in which the bot bui - Fetching the feature branch from which you want to replicate a build. - Starting a shell in the EESSI container. -- Setting the version of the EESSI software stack to use. - Start the Gentoo Prefix environment. - Start the EESSI software environment. - Configure EasyBuild. 
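The last preparation step, configuring EasyBuild, includes loading the same EasyBuild version that the bot used, which this guide reads off the name of the changed easystack file (`eessi-2023.06-eb-4.8.1-2021b.yml` implies EasyBuild 4.8.1). The bash function below is a minimal sketch of that lookup. The function name `eb_version_from_easystack` is hypothetical (it is not part of the `software-layer` repository), and it assumes every easystack file follows the `eessi-<EESSI version>-eb-<EasyBuild version>-<toolchain generation>.yml` pattern, which is inferred from this single example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the software-layer repository.
# Assumes easystack file names follow the pattern
#   eessi-<EESSI version>-eb-<EasyBuild version>-<toolchain generation>.yml
eb_version_from_easystack() {
    local name
    name=$(basename "$1" .yml)  # drop any directory and the .yml suffix
    name=${name#*-eb-}          # drop everything up to and including "-eb-"
    echo "${name%%-*}"          # keep the part before the next "-"
}

eb_version_from_easystack eessi-2023.06-eb-4.8.1-2021b.yml
```

Inside the prepared environment, the result could then be used as `module load "EasyBuild/$(eb_version_from_easystack eessi-2023.06-eb-4.8.1-2021b.yml)"`. If a file name does not contain `-eb-`, the pattern removal leaves it unchanged and the function simply returns the first dash-separated field, so the output is worth checking by eye.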
From abe13f9adad6526ba4b137a886be0afc467aade3 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 2023 12:08:02 +0200 Subject: [PATCH 07/23] Swap order, as sourcing overwrites the EASYBUILD_INSTALLPATH otherwise --- docs/contributing_sw/debugging_failed_builds.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index ab0929e7d..94bb3bdc1 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -102,8 +102,8 @@ For both, any writeable path will do. In this example, we will choose `/tmp/easy ``` export WORKDIR=/tmp/easybuild -export EASYBUILD_INSTALLPATH="${HOME}/.local/easybuild" source configure_easybuild +export EASYBUILD_INSTALLPATH="${HOME}/.local/easybuild" ``` Next, we need to determine the correct version of EasyBuild to load. Since [the example PR](https://github.com/EESSI/software-layer/pull/360) changes the file `eessi-2023.06-eb-4.8.1-2021b.yml`, this tells us the bot was using version `4.8.1` of EasyBuild to build this. 
Thus, we load that version of the EasyBuild module and check if everything was configured correctly: ``` From 59de2b8c08efa2efe8411ead3607ed56706717e4 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 2023 12:24:09 +0200 Subject: [PATCH 08/23] Added instructions on using and flags for container in order to save time debugging issues that require a lot of dependencies to first be built --- docs/contributing_sw/debugging_failed_builds.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 94bb3bdc1..3c7f70285 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -43,6 +43,8 @@ Simply run the EESSI container (`eessi_container.sh`), which should be in the ro ``` ./eessi_container.sh ``` +You may want to start the container with the `--save DIR/TGZ` flag (check `./eessi_container.sh --help`) in order to be able to resume later. This is especially useful when debugging an issue for which dependencies first have to be built, since the next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off. + !!!
Note You may have to press enter to clearly see the prompt as some messages beginning with `CernVM-FS: ` have been printed after the first prompt From 187db42bc76a73994f46f91b6d2653f4cd7c7260 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 2023 12:25:50 +0200 Subject: [PATCH 09/23] Clarified description on why to use --save and --resume with eessi container --- docs/contributing_sw/debugging_failed_builds.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 3c7f70285..7b733059c 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -43,7 +43,7 @@ Simply run the EESSI container (`eessi_container.sh`), which should be in the ro ``` ./eessi_container.sh ``` -You may want to start the container with the `--save DIR/TGZ` flag (check `./eessi_container.sh --help`) in order to be able to resume later. This is especially useful when debugging an issue for which dependencies first have to be built, since the next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off. +If you want to debug an issue for which a lot of dependencies need to be built first, you may want to start the container with the `--save DIR/TGZ` flag (check `./eessi_container.sh --help`) in order to be able to resume later. Next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off, having all dependencies already built and available. !!!
Note You may have to press enter to clearly see the prompt as some messages From 3b4f4edebbd3dffa6de881cb27cc354e6415bb12 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 2023 12:32:20 +0200 Subject: [PATCH 10/23] Expand explanation on using --save, as it only saves if you actually exit the container --- .../debugging_failed_builds.md | 22 +++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 7b733059c..b69cd8d8a 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -43,13 +43,31 @@ Simply run the EESSI container (`eessi_container.sh`), which should be in the ro ``` ./eessi_container.sh ``` -If you want to debug an issue for which a lot of dependencies need to be built first, you may want to start the container with the `--save DIR/TGZ` flag (check `./eessi_container.sh --help`) in order to be able to resume later. Next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off, having all dependencies already built and available. - !!! Note You may have to press enter to clearly see the prompt as some messages beginning with `CernVM-FS: ` have been printed after the first prompt `Apptainer> ` was shown. +If you want to debug an issue for which a lot of dependencies need to be built first, you may want to start the container with the `--save DIR/TGZ` flag (check `./eessi_container.sh --help`) in order to be able to resume later. E.g. + +``` +./eessi_container.sh --save ${HOME}/pr370/ +``` +The tarball will be saved when you exit the container. Note that the first `exit` command only takes you out of the Gentoo Prefix environment.
Only the second will take you out of the container, and print where the tarball will be stored: +``` +[EESSI pilot 2023.06] $ exit +logout +Leaving Gentoo Prefix with exit status 1 +Apptainer> exit +exit +Saved contents of tmp directory '/tmp/eessi.VgLf1v9gf0' to tarball '${HOME}/pr370/EESSI-pilot-1698056784.tgz' (to resume session add '--resume ${HOME}/pr370//EESSI-pilot-1698056784.tgz') +``` + +Next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off, having all dependencies already built and available. +``` +./eessi_container.sh --resume ${HOME}/pr370//EESSI-pilot-1698056784.tgz +``` + For more info on using the EESSI container, see [here](../getting_access/eessi_container.md). ### Start the Gentoo Prefix environment From c8c91f18a6b82389d768a34d19315f6e229c1696 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 2023 12:37:44 +0200 Subject: [PATCH 11/23] Fix installpath that should be displayed by --show-config to reflect new order of sourcing configure_easybuild and exporting EASYBUILD_INSTALLPATH --- docs/contributing_sw/debugging_failed_builds.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index b69cd8d8a..b47458dd3 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -123,7 +123,7 @@ For both, any writeable path will do. In this example, we will choose `/tmp/easy ``` export WORKDIR=/tmp/easybuild source configure_easybuild -export EASYBUILD_INSTALLPATH="${HOME}/.local/easybuild" +export EASYBUILD_INSTALLPATH="/tmp/easybuild" ``` Next, we need to determine the correct version of EasyBuild to load. 
Since [the example PR](https://github.com/EESSI/software-layer/pull/360) changes the file `eessi-2023.06-eb-4.8.1-2021b.yml`, this tells us the bot was using version `4.8.1` of EasyBuild to build this. Thus, we load that version of the EasyBuild module and check if everything was configured correctly: ``` @@ -145,7 +145,7 @@ filter-deps (E) = Autoconf, Automake, Autotools, binutils, bzip2, DBus, filter-env-vars (E) = LD_LIBRARY_PATH hooks (E) = /home/casparvl/software-layer/eb_hooks.py ignore-osdeps (E) = True -installpath (E) = /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1 +installpath (E) = /tmp/easybuild/software/linux/aarch64/neoverse_n1 module-extensions (E) = True packagepath (E) = /tmp/easybuild/easybuild/packages prefix (E) = /tmp/easybuild/easybuild From e2a5cb70c7d0e22cedb74e0ec889685e5cf0b620 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 2023 12:42:42 +0200 Subject: [PATCH 12/23] Take out reference to my user login from the example --- docs/contributing_sw/debugging_failed_builds.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index b47458dd3..9291b8f82 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -143,7 +143,7 @@ debug (E) = True experimental (E) = True filter-deps (E) = Autoconf, Automake, Autotools, binutils, bzip2, DBus, flex, gettext, gperf, help2man, intltool, libreadline, libtool, Lua, M4, makeinfo, ncurses, util-linux, XZ, zlib, Yasm filter-env-vars (E) = LD_LIBRARY_PATH -hooks (E) = /home/casparvl/software-layer/eb_hooks.py +hooks (E) = ${HOME}/software-layer/eb_hooks.py ignore-osdeps (E) = True installpath (E) = /tmp/easybuild/software/linux/aarch64/neoverse_n1 module-extensions (E) = True From c487e66340bf4d78803c00b3b4429612ee3dfaff Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 
2023 13:25:38 +0200 Subject: [PATCH 13/23] Added instructions to add EASYBUILD_INSTALLPATH/modules/all to the MODULEPATH so that users actually see the modules that are installed interactively --- docs/contributing_sw/debugging_failed_builds.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 9291b8f82..a0eb132c3 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -125,6 +125,11 @@ export WORKDIR=/tmp/easybuild source configure_easybuild export EASYBUILD_INSTALLPATH="/tmp/easybuild" ``` +Note that you probably also want to add the path where the modules are installed to your `MODULEPATH`: +``` +module use ${EASYBUILD_INSTALLPATH}/modules/all +``` + Next, we need to determine the correct version of EasyBuild to load. Since [the example PR](https://github.com/EESSI/software-layer/pull/360) changes the file `eessi-2023.06-eb-4.8.1-2021b.yml`, this tells us the bot was using version `4.8.1` of EasyBuild to build this. Thus, we load that version of the EasyBuild module and check if everything was configured correctly: ``` module load EasyBuild/4.8.1 From 76d6ad7bc0c75f71de36336ff983e835fef0530d Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 23 Oct 2023 15:14:33 +0200 Subject: [PATCH 14/23] Strip unnecessary quotes --- docs/contributing_sw/debugging_failed_builds.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index a0eb132c3..f13d19c39 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -123,7 +123,7 @@ For both, any writeable path will do. 
In this example, we will choose `/tmp/easy ``` export WORKDIR=/tmp/easybuild source configure_easybuild -export EASYBUILD_INSTALLPATH="/tmp/easybuild" +export EASYBUILD_INSTALLPATH=/tmp/easybuild ``` Note that you probably also want to add the path where the modules are installed to your `MODULEPATH`: ``` From 76d10569812a340df60014f4f0e0cf7aac5811d4 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Wed, 25 Oct 2023 14:16:41 +0200 Subject: [PATCH 15/23] Simplify instruction a bit by leveraging minimal_eessi_env to set the EESSI_CPU_FAMILY and EESSI_OS_TYPE --- docs/contributing_sw/debugging_failed_builds.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index f13d19c39..c2647e242 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -79,15 +79,16 @@ echo ${EESSI_CVMFS_REPO} echo ${EESSI_PILOT_VERSION} ``` -To do that, you need to run the `startprefix` command. However, we have several compatibility layers, and you'll need to run it for the one that matches the host node. For example, on an aarch64 (ARM) linux machine: +Then, we'll source a script that sets `EESSI_OS_TYPE` and `EESSI_CPU_FAMILY` automatically, by detecting the host OS and CPU: +``` +source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/minimal_eessi_env +``` +Then, run the `startprefix` command to actually start the Gentoo Prefix environment: ``` -export EESSI_OS_TYPE=linux export EESSI_CPU_FAMILY=aarch64 ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/startprefix ``` -if you are unsure, you can start the EESSI software environment (see next step) and check the values of `EESSI_OS_TYPE` and `EESSI_CPU_FAMILY` set by that initialization script. Note that you'll have to start over with a new shell (i.e.
quit the container) and repeat the current step of starting the Gentoo Prefix environment, as the order of those two steps matters. - Now, reset the `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}` in your prefix environment ``` export EESSI_CVMFS_REPO=... From 1320c0992fe23cc91e6114c9ef482875032545b9 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Wed, 25 Oct 2023 14:19:25 +0200 Subject: [PATCH 16/23] Forgot to remove one line in previous commit --- docs/contributing_sw/debugging_failed_builds.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index c2647e242..2dbccb328 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -85,7 +85,6 @@ source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/minimal_eessi_en ``` Then, run the `startprefix` command to actually start the Gentoo Prefix environment: ``` -export EESSI_CPU_FAMILY=aarch64 ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/startprefix ``` From 8d357f60a58c86aa68e7114fcd2b63ccd503c51d Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Wed, 25 Oct 2023 15:26:07 +0200 Subject: [PATCH 17/23] Processed most of Thomas' comments --- .../debugging_failed_builds.md | 52 ++++++++++--------- 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 2dbccb328..4f7c631f9 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -2,11 +2,11 @@ Unfortunately, software does not always build successfully. Since EESSI targets novel CPU architectures as well, build failures on such platforms are quite common, as the software and/or the software build systems have not always been adjusted to support these architectures yet. 
-In EESSI, the build are performed by a bot. This is great for builds that complete successfully as we can build a lot of software, for a wide range of hardware because of this automation. However, it does means that you, as contributor, can not easily access the build directory and build logs to figure out build issues. +In EESSI, all software packages are built by a bot. This is great for builds that complete successfully as we can build many software packages for a wide range of hardware with little human intervention. However, it does mean that you, as a contributor, cannot easily access the build directory and build logs to figure out build issues. This page describes how you can interactively reproduce failed builds, so that you can more easily debug the issue. -Throughout this page, we will use [this PR](https://github.com/EESSI/software-layer/pull/360) as an example. It builds LAMMPS, and failed (among other things) on a [build issue for Plumed](https://github.com/EESSI/software-layer/pull/360#issuecomment-1765913105). +Throughout this page, we will use [this PR](https://github.com/EESSI/software-layer/pull/360) as an example. It intends to add LAMMPS to EESSI. Among other issues, it failed while [building Plumed](https://github.com/EESSI/software-layer/pull/360#issuecomment-1765913105). ## Prerequisites You will need to have: @@ -45,13 +45,13 @@ Simply run the EESSI container (`eessi_container.sh`), which should be in the ro ``` !!! Note You may have to press enter to clearly see the prompt as some messages - beginning with `CernVM-FS: ` have been printed after the first prompt - `Apptainer> ` was shown. + beginning with `CernVM-FS: ` have been printed after the first prompt + `Apptainer> ` was shown. -If you want to debug an issue for which a lot of dependencies need to be build first, you may want to start the container with the `--save DIR/TGZ` and flag (check `./eessi_container.sh --help`) in order to be able to resume later. E.g.
+If you want to debug an issue for which a lot of dependencies need to be built first, you may want to start the container with the `--save DIR/TGZ` flag (check `./eessi_container.sh --help`). This saves the temporary directory (which we will use as working and installation directory later in this instruction) in order to be able to resume later with the same temporary directory. E.g. ``` -./eessi_container.sh --save ${HOME}/pr370/ +./eessi_container.sh --save ${HOME}/pr370 ``` The tarball will be saved when you exit the container. Note that the first `exit` command will first make you exit the Gentoo prefix environment. Only the second will take you out of the container, and print where the tarball will be stored: ``` @@ -60,15 +60,18 @@ logout Leaving Gentoo Prefix with exit status 1 Apptainer> exit exit -Saved contents of tmp directory '/tmp/eessi.VgLf1v9gf0' to tarball '${HOME}/pr370/EESSI-pilot-1698056784.tgz' (to resume session add '--resume ${HOME}/pr370//EESSI-pilot-1698056784.tgz') +Saved contents of tmp directory '/tmp/eessi.VgLf1v9gf0' to tarball '${HOME}/pr370/EESSI-pilot-1698056784.tgz' (to resume session add '--resume ${HOME}/pr370/EESSI-pilot-1698056784.tgz') ``` +Note that the tarballs can be quite sizeable, so make sure to pick a filesystem where you have a large enough quotum. + + Next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off, having all dependencies already built and available. ``` -./eessi_container.sh --resume ${HOME}/pr370//EESSI-pilot-1698056784.tgz +./eessi_container.sh --resume ${HOME}/pr370/EESSI-pilot-1698056784.tgz ``` -For more info on using the EESSI container, see [here](../getting_access/eessi_container.md). +For a detailed description on using the script `eessi_container.sh`, see [here](../getting_access/eessi_container.md). ### Start the Gentoo Prefix environment The next step is to start the Gentoo Prefix environment.
@@ -79,12 +82,10 @@ echo ${EESSI_CVMFS_REPO} echo ${EESSI_PILOT_VERSION} ``` -Then, we'll source a script that sets `EESSI_OS_TYPE` and `EESSI_CPU_FAMILY` automatically, by detecting the host OS and CPU: -``` -source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/minimal_eessi_env -``` -Then, run the `startprefix` command to actually start the Gentoo Prefix environment: +Then, we set `EESSI_OS_TYPE` and `EESSI_CPU_FAMILY` and run the `startprefix` command to start the Gentoo Prefix environment: ``` +export EESSI_OS_TYPE=linux # We only support Linux for now +export EESSI_CPU_FAMILY=$(uname -m) ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/startprefix ``` @@ -118,14 +119,20 @@ It is important that we configure EasyBuild in the same way as the bot uses it, - Our working directory will be different - Our installpath will be different -For both, any writeable path will do. In this example, we will choose `/tmp/easybuild` as our workdir, and `$HOME/.local/easybuild` as our installpath. Finally, we will source the `configure_easybuild` script, which will configure EasyBuild by setting environment variables. +For both, any writeable path will do. In this example, we create a unique temporary directory inside `/tmp` to serve both as our workdir and installpath. Finally, we will source the `configure_easybuild` script, which will configure EasyBuild by setting environment variables. ``` -export WORKDIR=/tmp/easybuild +export WORKDIR=$(mktemp --directory --tmpdir=/tmp -t eessi.XXXXXXXXXX) source configure_easybuild -export EASYBUILD_INSTALLPATH=/tmp/easybuild +export EASYBUILD_INSTALLPATH=${WORKDIR} ``` -Note that you probably also want to add the path where the modules are installed to your `MODULEPATH`: +!!! Note + If you started the container using `--resume`, you probably want `WORKDIR` to point to the workdir you created previously, instead of creating a new temporary directory. + +!!!
Note + If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EASYBUILD_OPTARCH=GENERIC`. + +Next, add the path where the modules are installed to your `MODULEPATH`, so that you can easily find these with `module av` after installation has completed: ``` module use ${EASYBUILD_INSTALLPATH}/modules/all ``` @@ -164,9 +171,6 @@ trace (E) = True zip-logs (E) = bzip2 ``` -!!! Note - If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EASYBUILD_OPTARCH=GENERIC`. - ## Building the software When the bot builds software, it loops over all EasyStack files that have been changed, and builds them using EasyBuild. However, a single PR may add multiple items to a single EasyStack file, and the issue you are trying to debug is probably in _one_ of them. Getting EasyBuild to build the full EasyStack file will create the most similar situation to what the bot does. However, you _may_ just want to build the individual software that has changed. Below, we describe both approaches. @@ -175,11 +179,11 @@ In our [example PR](https://github.com/EESSI/software-layer/pull/360), the EasyS ``` eb --easystack eessi-2023.06-eb-4.8.1-2021b.yml --robot ``` -After some time, this build fails whil trying to build `Plumed`, and we can access the build log to look for clues on why it failed. +After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues on why it failed. ### Building an individual package -In our [example PR](https://github.com/EESSI/software-layer/pull/360), the individual package that was added to `eessi-2023.06-eb-4.8.1-2021b.yml` was `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`. 
We'll also have to mind any options that are listed in the EasyStack file for `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`, in this case the option `--from-pr 19000`. Thus, to build, we run: +In our [example PR](https://github.com/EESSI/software-layer/pull/360), the individual package that was added to `eessi-2023.06-eb-4.8.1-2021b.yml` was `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`. We'll also have to (re)use any options that are listed in the EasyStack file for `LAMMPS-23Jun2022-foss-2021b-kokkos.eb`, in this case the option `--from-pr 19000`. Thus, to build, we run: ``` eb LAMMPS-23Jun2022-foss-2021b-kokkos.eb --robot --from-pr 19000 ``` -After some time, this build fails whil trying to build `Plumed`, and we can access the build log to look for clues on why it failed. +After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues on why it failed. From 54034e19a013df374d1e1d96790f5b6d51b7f633 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Wed, 25 Oct 2023 15:35:45 +0200 Subject: [PATCH 18/23] Changed doc structure according to Thomas' comments --- docs/contributing_sw/building_software.md | 76 ++++++++++++++++++ docs/contributing_sw/contribution_policy.md | 3 + .../{adding_software.md => opening_pr.md} | 77 +------------------ docs/contributing_sw/overview.md | 3 + mkdocs.yml | 10 ++- 5 files changed, 90 insertions(+), 79 deletions(-) create mode 100644 docs/contributing_sw/building_software.md create mode 100644 docs/contributing_sw/contribution_policy.md rename docs/contributing_sw/{adding_software.md => opening_pr.md} (52%) create mode 100644 docs/contributing_sw/overview.md diff --git a/docs/contributing_sw/building_software.md b/docs/contributing_sw/building_software.md new file mode 100644 index 000000000..c7c585b08 --- /dev/null +++ b/docs/contributing_sw/building_software.md @@ -0,0 +1,76 @@ +# Building software (maintainers) + +### Instructing the bot to build :hammer: { #bot_build } + +Once the 
pull request is open, you can instruct the [bot :robot:](../bot.md) to build the software by posting a comment. + +For more information, see the [building section in the bot documentation](../bot.md#building). + +!!! warning + Permission to trigger building of software must be granted to your GitHub account first! + + See [bot permissions](../bot.md#permissions) for more information. + +#### Guidelines + +* It may be wise to let the bot perform a test build first, rather than letting it build for a wide range + of CPU targets. + +* If one of the builds failed, you can let the bot retry that specific build. + +* Make sure that the software has been built correctly for all [CPU targets](../software_layer/cpu_targets.md) before you deploy! + +#### Checking the builds :mag: + +If all goes well, you should see `SUCCESS` :grin: for each build, along with a button :arrow_down_small: +to get more information about the checks that were performed, and metadata information on the resulting +artefact :package:. + +!!! note + **Make sure the result is what you expect it to be for all builds before you deploy!** + +#### Failing builds :no_entry: + +!!! warning + The bot will currently not give you any information on how or why a build is failing. + + Ask for help in the `#software-layer` channel of the EESSI Slack if needed! + +### Instructing the bot to deploy :rocket: + +To make the [bot :robot:](../bot.md) deploy the successfully built software, you should +issue the corresponding instruction to the bot. + +For more information, see the [deploying section in the bot documentation](../bot.md#deploying). + +!!! warning + Permission to trigger deployment of software installations must be granted to your GitHub account first! + + See [bot permissions](../bot.md#permissions) for more information.
+ +### Merging the pull request + +You should be able to verify in the pull request that the ingestion has been done, +since the CI should fail :x: initially to indicate that some software installations listed in +your modified easystack are missing. + +Once the ingestion has been done, simply re-triggering the CI workflow should be sufficient to make it pass +:white_check_mark:, and then the pull request can be merged. + +!!! note + This assumes that the easystack file being modified is considered by the CI workflow file + (`.github/workflows/test_eessi.yml`) that checks for missing installations, in the correct branch (for example + `2023.06`) of the [software-layer](https://github.com/EESSI/software-layer). + + If that's not the case yet, update this workflow in your pull request as well to add the missing easystack file! + +!!! warning + You need permissions to re-trigger CI workflows and merge pull requests + in the [software-layer](https://github.com/EESSI/software-layer) repository. + + Ask for help in the `#software-layer` channel of the EESSI Slack if needed! + +### Getting help + +If you have any questions, or if you need help with something, don't hesitate to contact us via +the `#software-layer` channel of the EESSI Slack. 
diff --git a/docs/contributing_sw/contribution_policy.md b/docs/contributing_sw/contribution_policy.md new file mode 100644 index 000000000..b7c658fc5 --- /dev/null +++ b/docs/contributing_sw/contribution_policy.md @@ -0,0 +1,3 @@ +# Contribution policy + +(coming soon) diff --git a/docs/contributing_sw/adding_software.md b/docs/contributing_sw/opening_pr.md similarity index 52% rename from docs/contributing_sw/adding_software.md rename to docs/contributing_sw/opening_pr.md index 157c1075d..6c69f703e 100644 --- a/docs/contributing_sw/adding_software.md +++ b/docs/contributing_sw/opening_pr.md @@ -1,4 +1,4 @@ -# Adding software +# Opening a pull request (contributors) To add software to EESSI, you should go through the semi-automatic software installation procedure by: @@ -89,78 +89,3 @@ git push koala example_branch If all goes well, one or more bots :robot: should almost instantly create a comment in your pull request with an overview of how it is configured - you will need this information when providing build instructions. - -### Instructing the bot to build :hammer: { #bot_build } - -Once the pull request is open, you can instruct the [bot :robot:](../bot.md) to build the software by posting a comment. - -For more information, see the [building section in the bot documentation](../bot.md#building). - -!!! warning - Permission to trigger building of software must be granted to your GitHub account first! - - See [bot permissions](../bot.md#permissions) for more information. - -#### Guidelines - -* It may be wise to let the bot perform a test build first, rather than letting it build for a wide range - of CPU targets. - -* If one of the builds failed, you can let the bot retry that specific build. - -* Make sure that the software has been built correctly for all [CPU targets](../software_layer/cpu_targets.md) before you deploy! 
- -#### Checking the builds :mag: - -If all goes well, you should see `SUCCESS` :grin: for each build, along with button :arrow_down_small: -to get more information about the checks that were performed, and metadata information on the resulting -artefact :package:. - -!!! note - **Make sure the result is what you expect it to be for all builds before you deploy!** - -#### Failing builds :no_entry: - -!!! warning - The bot will currently not give you any information on how or why a build is failing. - - Ask for help in the `#software-layer` channel of the EESSI Slack if needed! - -### Instructing the bot to deploy :rocket: - -To make the [bot :robot:](../bot.md) deploy the successfully built software, you should -issue the corresponding instruction to the bot. - -For more information, see the [deploying section in the bot documentation](../bot.md#deploying). - -!!! warning - Permission to trigger deployment of software installations must be granted to your GitHub account first! - - See [bot permissions](../bot.md#permissions) for more information. - -### Merging the pull request - -You should be able to verify in the pull request that the ingestion has been done, -since the CI should fail :x: initially to indicate that some software installations listed in -your modified easystack are missing. - -Once the ingestion has been done, simply re-triggering the CI workflow should be sufficient to make it pass -:white_check_mark:, and then the pull request can be merged. - -!!! note - This assumes that the easystack file being modified is considered by the CI workflow file - (`.github/workflows/test_eessi.yml`) that checks for missing installations, in the correct branch (for example - `2023.06`) of the [software-layer](https://github.com/EESSI/software-layer). - - If that's not the case yet, update this workflow in your pull request as well to add the missing easystack file! - -!!! 
warning - You need permissions to re-trigger CI workflows and merge pull requests - in the [software-layer](https://github.com/EESSI/software-layer) repository. - - Ask for help in the `#software-layer` channel of the EESSI Slack if needed! - -### Getting help - -If you have any questions, or if you need help with something, don't hesitate to contact us via -the `#software-layer` channel of the EESSI Slack. diff --git a/docs/contributing_sw/overview.md b/docs/contributing_sw/overview.md new file mode 100644 index 000000000..aead25af5 --- /dev/null +++ b/docs/contributing_sw/overview.md @@ -0,0 +1,3 @@ +# Overview + +This page will display an overview of the procedure of contributing software to the EESSI software stack (coming soon). diff --git a/mkdocs.yml b/mkdocs.yml index 9c839c79d..26082f202 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -40,11 +40,15 @@ nav: - using_eessi/setting_up_environment.md - using_eessi/basic_commands.md - using_eessi/eessi_demos.md - - Contributing software to EESSI: + - Adding software to EESSI: # Todo: insert an overview page with a flowchart showing the high level process - # - Overview: contributing_sw/overview.md - - contributing_sw/adding_software.md + - contributing_sw/overview.md + # - Contribution policy, requires #108 + - contributing_sw/contribution_policy.md + - contributing_sw/opening_pr.md + - contributing_sw/building_software.md - contributing_sw/debugging_failed_builds.md + - contributing_sw/deploying_software.md # Todo: write on how to contribute to the EESSI test suite # - Contributing software tests to the EESSI test suite: - Getting support: support.md From cf854e0b3871e5007dbe85944edbf6396a145f62 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Wed, 25 Oct 2023 15:57:21 +0200 Subject: [PATCH 19/23] Fixed broken links due to moving stuff around --- docs/bot.md | 4 +-- docs/contributing_sw/deploying_software.md | 40 ++++++++++++++++++++++ docs/support.md | 2 +- 3 files changed, 43 insertions(+), 3 deletions(-) 
create mode 100644 docs/contributing_sw/deploying_software.md diff --git a/docs/bot.md b/docs/bot.md index c057e4f0d..4e6d40377 100644 --- a/docs/bot.md +++ b/docs/bot.md @@ -6,7 +6,7 @@ Building, testing, and deploying software is done by one or more *bot instances* The EESSI build-test-deploy bot :robot: is implemented as a [GitHub App](https://docs.github.com/en/apps/overview) in the [`eessi-bot-software-layer` repository](https://github.com/EESSI/eessi-bot-software-layer). -It operates in the context of [pull requests](contributing_sw/adding_software.md#software_layer_pull_request) to +It operates in the context of [pull requests](contributing_sw/opening_pr.md#software_layer_pull_request) to the [`compatibility-layer` repository](https://github.com/EESSI/compatibility-layer) or the [`software-layer` repository](https://github.com/EESSI/software-layer), and follows the instructions supplied by humans, @@ -61,7 +61,7 @@ to trigger building of software, and to deploy software installations in to the ## Building { #building } To instruct the bot :robot: to build software, one or more `build` instructions -should be issued by posting a comment in the pull request (see also [here](contributing_sw/adding_software.md#bot_build)). +should be issued by posting a comment in the pull request (see also [here](contributing_sw/building_software.md#bot_build)). The most basic build instruction that can be sent to the bot is: diff --git a/docs/contributing_sw/deploying_software.md b/docs/contributing_sw/deploying_software.md new file mode 100644 index 000000000..afd5f5639 --- /dev/null +++ b/docs/contributing_sw/deploying_software.md @@ -0,0 +1,40 @@ +# Deploying packages (maintainers) + +### Instructing the bot to deploy :rocket: + +To make the [bot :robot:](../bot.md) deploy the successfully built software, you should +issue the corresponding instruction to the bot. + +For more information, see the [deploying section in the bot documentation](../bot.md#deploying). + +!!! 
warning + Permission to trigger deployment of software installations must be granted to your GitHub account first! + + See [bot permissions](../bot.md#permissions) for more information. + +### Merging the pull request + +You should be able to verify in the pull request that the ingestion has been done, +since the CI should fail :x: initially to indicate that some software installations listed in +your modified easystack are missing. + +Once the ingestion has been done, simply re-triggering the CI workflow should be sufficient to make it pass +:white_check_mark:, and then the pull request can be merged. + +!!! note + This assumes that the easystack file being modified is considered by the CI workflow file + (`.github/workflows/test_eessi.yml`) that checks for missing installations, in the correct branch (for example + `2023.06`) of the [software-layer](https://github.com/EESSI/software-layer). + + If that's not the case yet, update this workflow in your pull request as well to add the missing easystack file! + +!!! warning + You need permissions to re-trigger CI workflows and merge pull requests + in the [software-layer](https://github.com/EESSI/software-layer) repository. + + Ask for help in the `#software-layer` channel of the EESSI Slack if needed! + +### Getting help + +If you have any questions, or if you need help with something, don't hesitate to contact us via +the `#software-layer` channel of the EESSI Slack. diff --git a/docs/support.md b/docs/support.md index 2faf44a8a..ce1a55181 100644 --- a/docs/support.md +++ b/docs/support.md @@ -38,7 +38,7 @@ Note that we can only help with problems related to the software *installations* We are open to software requests for software that is not included in EESSI yet. -The quickest way to add additional software to EESSI is by contributing it yourself as a community contribution, please see the [documentation on adding software](contributing_sw/adding_software.md). 
+The quickest way to add additional software to EESSI is by contributing it yourself as a community contribution; please see the [documentation on adding software](contributing_sw/overview.md). Alternatively, you can send in a request to our support team. Please try to provide as much information on the software as possible: preferably use the [issue template](https://gitlab.com/eessi/support/-/issues/new?issuable_template=Software_request) (which requires you to log in to GitLab), or make sure to cover the items listed [here](https://gitlab.com/eessi/support/-/blob/main/.gitlab/issue_templates/Software_request.md). From 4a80085134393b6d996ec5bf5d69aab1780a7fc6 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 26 Oct 2023 13:49:15 +0200 Subject: [PATCH 20/23] Indicate who these docs are for, just like the rest in this tree --- docs/contributing_sw/debugging_failed_builds.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 4f7c631f9..3bfeedebc 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -1,4 +1,4 @@ -# Debugging failed builds +# Debugging failed builds (contributors/maintainers) Unfortunately, software does not always build successfully. Since EESSI targets novel CPU architectures as well, build failures on such platforms are quite common, as the software and/or the software build systems have not always been adjusted to support these architectures yet.
From 06ac2ea9cca985668dc3b11e74ddc2c33f29c7e5 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 30 Oct 2023 14:51:20 +0100 Subject: [PATCH 21/23] Took two more of Thomas' comments into account --- docs/contributing_sw/debugging_failed_builds.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 3bfeedebc..12bdb45ac 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -60,7 +60,7 @@ logout Leaving Gentoo Prefix with exit status 1 Apptainer> exit exit -Saved contents of tmp directory '/tmp/eessi.VgLf1v9gf0' to tarball '${HOME}/pr370/EESSI-pilot-1698056784.tgz' (to resume session add '--resume ${HOME}/pr370/EESSI-pilot-1698056784.tgz') +Saved contents of tmp directory '/tmp/eessi-debug.VgLf1v9gf0' to tarball '${HOME}/pr370/EESSI-pilot-1698056784.tgz' (to resume session add '--resume ${HOME}/pr370/EESSI-pilot-1698056784.tgz') ``` Note that the tarballs can be quite sizeable, so make sure to pick a filesystem where you have a large enough quotum. @@ -122,7 +122,7 @@ It is important that we configure EasyBuild in the same way as the bot uses it, For both, any writeable path will do. In this example, we create a unique temporary directory inside `/tmp` to serve both as our workdir and installpath. Finally, we will source the `configure_easybuild` script, which will configure EasyBuild by setting environment variables. 
``` -export WORKDIR=$(mktemp --directory --tmpdir=/tmp -t eessi.XXXXXXXXXX) +export WORKDIR=$(mktemp --directory --tmpdir=/tmp -t eessi-debug.XXXXXXXXXX) source configure_easybuild export EASYBUILD_INSTALLPATH=${WORKDIR} ``` @@ -187,3 +187,4 @@ In our [example PR](https://github.com/EESSI/software-layer/pull/360), the indiv eb LAMMPS-23Jun2022-foss-2021b-kokkos.eb --robot --from-pr 19000 ``` After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues on why it failed. +!!! While this might be faster than the EasyStack-based approach, this is _not_ how the bot builds. So why it _may_ reproduce the failure the bot encounters, it may not reproduce the bug _at all_ (no failure) or run into _different_ bugs. If you want to be sure, use the EasyStack-based approach. From 24fafa6ee289098786b5f5f2dc19b64f8efcd85b Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 30 Oct 2023 17:14:59 +0100 Subject: [PATCH 22/23] Make instructions more similar to what the bot does by installing in the writeable overlay. Also, clarified instructions on how to replicate builds with 'generic' optimization --- .../debugging_failed_builds.md | 26 ++++++++++++------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 12bdb45ac..905f30413 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -41,7 +41,7 @@ git checkout LAMMPS_23Jun2022 ### Starting a shell in the EESSI container Simply run the EESSI container (`eessi_container.sh`), which should be in the root of the `software-layer` repository ``` -./eessi_container.sh +./eessi_container.sh --access rw ``` !!! 
Note You may have to press enter to clearly see the prompt as some messages @@ -51,7 +51,7 @@ Simply run the EESSI container (`eessi_container.sh`), which should be in the ro If you want to debug an issue for which a lot of dependencies need to be build first, you may want to start the container with the `--save DIR/TGZ` and flag (check `./eessi_container.sh --help`). This saves the temporary directory (which we will use as working and installation directory later in this instruction) in order to be able to resume later with the same temporary directory. E.g. ``` -./eessi_container.sh --save ${HOME}/pr370 +./eessi_container.sh --access rw --save ${HOME}/pr370 ``` The tarball will be saved when you exit the container. Note that the first `exit` command will first make you exit the Gentoo prefix environment. Only the second will take you out of the container, and print where the tarball will be stored: ``` @@ -68,7 +68,7 @@ Note that the tarballs can be quite sizeable, so make sure to pick a filesystem Next time you want to continue investigating this issue, you can start the container with `--resume DIR/TGZ` and continue where you left off, having all dependencies already built and available. ``` -./eessi_container.sh --resume ${HOME}/pr370/EESSI-pilot-1698056784.tgz +./eessi_container.sh --access rw --resume ${HOME}/pr370/EESSI-pilot-1698056784.tgz ``` For a detailed description on using the script `eessi_container.sh`, see [here](../getting_access/eessi_container.md). @@ -109,28 +109,34 @@ source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/bash !!! Note If you get an error `bash: /versions//init/bash: No such file or directory`, you forgot to reset the `${EESSI_CVFMS_REPO}` and `${EESSI_PILOT_VERSION}` environment variables at the end of the previous step. +!!! Note + If you want to build with generic optimization, you should run `export EESSI_CPU_FAMILY=$(uname -m) && export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${EESSI_CPU_FAMILY}/generic` before sourcing. 
For more info on starting the EESSI software environment, see [here](../using_eessi/setting_up_environment.md) ### Configure EasyBuild -It is important that we configure EasyBuild in the same way as the bot uses it, with two small exceptions: - -- Our working directory will be different -- Our installpath will be different +It is important that we configure EasyBuild in the same way as the bot uses it, with one small exception: our working directory will be different. Typically, that doesn't matter, but it's good to be aware of this one difference, in case you fail to replicate the build failure. -For both, any writeable path will do. In this example, we create a unique temporary directory inside `/tmp` to serve both as our workdir and installpath. Finally, we will source the `configure_easybuild` script, which will configure EasyBuild by setting environment variables. +In this example, we create a unique temporary directory inside `/tmp` to serve as our workdir. Finally, we will source the `configure_easybuild` script, which will configure EasyBuild by setting environment variables. ``` export WORKDIR=$(mktemp --directory --tmpdir=/tmp -t eessi-debug.XXXXXXXXXX) source configure_easybuild +``` +Among other things, the `configure_easybuild` script sets the install path for EasyBuild to point to the correct installation directory in (to `${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR}`).
+ +``` export EASYBUILD_INSTALLPATH=${WORKDIR} ``` + +(_after_ sourcing the `configure_easybuild` script, so that you overwrite whatever that script sets). This can help you identify if an issue is related to building in a writeable overlay. For example, the writeable overlay is know to be a bit slow sometimes, and we have seen tests failing because they exceeded some timeout. + !!! Note - If you started the container using --resume, you probably want WORKDIR to point to the workdir you created previously, instead of create a new, temporary directory. + If you started the container using --resume, yoy may want WORKDIR to point to the workdir you created previously (instead of create a new, temporary directory with `mktemp`). !!! Note - If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EASYBUILD_OPTARCH=GENERIC`. + If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EASYBUILD_OPTARCH=GENERIC` after sourcing `configure_easybuild`. Next, add the path where the modules are installed to your `MODULEPATH`, so that you can easily find these with `module av` after installation has completed: ``` From e4978434e69f8e4682448b273ed26726f023526e Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Wed, 1 Nov 2023 14:35:26 +0100 Subject: [PATCH 23/23] Changed based on Tim's review. Various typos. Moved the suggestion for building in /tmp to figure out if the writeable overlay is the issue to _after_ the regular build instructions. This is now in a new section that discusses known causes for issues in EESSI. I also added the non-standard sysroot here. 
These two will probably cover at least some of the issues that we see at this stage - most other issues are already caught when a contribution is made to EasyBuild itself. --- .../debugging_failed_builds.md | 58 ++++++++++++++----- 1 file changed, 44 insertions(+), 14 deletions(-) diff --git a/docs/contributing_sw/debugging_failed_builds.md b/docs/contributing_sw/debugging_failed_builds.md index 905f30413..ff2cdbba3 100644 --- a/docs/contributing_sw/debugging_failed_builds.md +++ b/docs/contributing_sw/debugging_failed_builds.md @@ -89,7 +89,7 @@ export EESSI_CPU_FAMILY=$(uname -m) ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/startprefix ``` -Now, reset the `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}` in your prefix environment +Now, reset the `${EESSI_CVMFS_REPO}` and `${EESSI_PILOT_VERSION}` in your prefix environment with the initial values (printed in the echo statements above) ``` export EESSI_CVMFS_REPO=... export EESSI_PILOT_VERSION=... @@ -109,6 +109,7 @@ source ${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/init/bash !!! Note If you get an error `bash: /versions//init/bash: No such file or directory`, you forgot to reset the `${EESSI_CVFMS_REPO}` and `${EESSI_PILOT_VERSION}` environment variables at the end of the previous step. + !!! Note If you want to build with generic optimization, you should run `export EESSI_CPU_FAMILY=$(uname -m) && export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${EESSI_CPU_FAMILY}/generic` before sourcing. @@ -124,24 +125,14 @@ In this example, we create a unique temporary directory inside `/tmp` to serve b export WORKDIR=$(mktemp --directory --tmpdir=/tmp -t eessi-debug.XXXXXXXXXX) source configure_easybuild ``` -Among other things, the `configure_easybuild` script sets the install path for EasyBuild to point to the correct installation directory in (to `${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR}`). 
This is the exact same path the `bot` uses to build, and uses a writeable overlay filesystem in the container to write to a path in `/cvmfs` (which normally is read-only). Since this is identical to what the `bot` does, we advise you to start with this when reproducting a build failure. However, after having reproduced the bug, you may want to set a different `EASYBUILD_INSTALLPATH`, e.g. - -``` -export EASYBUILD_INSTALLPATH=${WORKDIR} -``` - -(_after_ sourcing the `configure_easybuild` script, so that you overwrite whatever that script sets). This can help you identify if an issue is related to building in a writeable overlay. For example, the writeable overlay is know to be a bit slow sometimes, and we have seen tests failing because they exceeded some timeout. +Among other things, the `configure_easybuild` script sets the EasyBuild install path to the correct installation directory (`${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR}`). This is the exact same path the `bot` builds in: a writeable overlay filesystem in the container lets you write to a path in `/cvmfs` (which is normally read-only), just like the `bot` does. !!! Note - If you started the container using --resume, yoy may want WORKDIR to point to the workdir you created previously (instead of create a new, temporary directory with `mktemp`). + If you started the container using --resume, you may want WORKDIR to point to the workdir you created previously (instead of creating a new, temporary directory with `mktemp`). !!! Note If you want to replicate a build with `generic` optimization (i.e. in `$EESSI_CVMFS_REPO/versions/${EESSI_PILOT_VERSION}/software/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/generic`) you will need to set `export EASYBUILD_OPTARCH=GENERIC` after sourcing `configure_easybuild`.
-Next, add the path where the modules are installed to your `MODULEPATH`, so that you can easily find these with `module av` after installation has completed: -``` -module use ${EASYBUILD_INSTALLPATH}/modules/all -``` Next, we need to determine the correct version of EasyBuild to load. Since [the example PR](https://github.com/EESSI/software-layer/pull/360) changes the file `eessi-2023.06-eb-4.8.1-2021b.yml`, this tells us the bot was using version `4.8.1` of EasyBuild to build this. Thus, we load that version of the EasyBuild module and check if everything was configured correctly: ``` @@ -193,4 +184,43 @@ In our [example PR](https://github.com/EESSI/software-layer/pull/360), the indiv eb LAMMPS-23Jun2022-foss-2021b-kokkos.eb --robot --from-pr 19000 ``` After some time, this build fails while trying to build `Plumed`, and we can access the build log to look for clues on why it failed. -!!! While this might be faster than the EasyStack-based approach, this is _not_ how the bot builds. So why it _may_ reproduce the failure the bot encounters, it may not reproduce the bug _at all_ (no failure) or run into _different_ bugs. If you want to be sure, use the EasyStack-based approach. + +!!! Note + While this might be faster than the EasyStack-based approach, this is _not_ how the bot builds. So while it _may_ reproduce the failure the bot encounters, it may also not reproduce the bug _at all_ (no failure) or run into _different_ bugs. If you want to be sure, use the EasyStack-based approach. + +## Known causes of issues in EESSI + +### The custom system prefix of the compatibility layer +Some installations might expect the system root (sysroot, for short) to be in `/`. However, in the case of EESSI, we are building against the OS in the [compatibility layer](../compatibility_layer.md). Thus, our sysroot is something like `${EESSI_CVMFS_REPO}/versions/${EESSI_PILOT_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}`.
This _can_ cause issues if installation procedures _assume_ the sysroot is in `/`. + +One example of a sysroot [issue](https://github.com/EESSI/software-layer/pull/370#issuecomment-1774744151) was in installing `wget`. The EasyConfig for `wget` defined +``` +# make sure pkg-config picks up system packages (OpenSSL & co) +preconfigopts = "export PKG_CONFIG_PATH=/usr/lib64/pkgconfig:/usr/lib/pkgconfig:/usr/lib/x86_64-linux-gnu/pkgconfig && " +configopts = '--with-ssl=openssl ' +``` +This will not work in EESSI, since OpenSSL should be picked up from the compatibility layer. This was fixed by changing the EasyConfig to read +``` +preconfigopts = "export PKG_CONFIG_PATH=%(sysroot)s/usr/lib64/pkgconfig:%(sysroot)s/usr/lib/pkgconfig:%(sysroot)s/usr/lib/x86_64-linux-gnu/pkgconfig && " +configopts = '--with-ssl=openssl ' +``` +The `%(sysroot)s` is a template value which EasyBuild will resolve to the value that has been configured in EasyBuild for `sysroot` (it is one of the fields printed by `eb --show-config` if a non-standard sysroot is configured). + +If you encounter issues where the installation cannot find something that is _normally_ provided by the OS (i.e. _not_ one of the dependencies in your module environment), you may need to resort to a similar approach. + +### The writeable overlay +The writeable overlay in the container is known to be a bit slow sometimes. As a result, we have seen tests failing because they exceed some timeout (e.g. [this issue](https://github.com/EESSI/software-layer/pull/332#issuecomment-1775374260)). + +To investigate if the writeable overlay is somehow the issue, you can make EasyBuild install somewhere else, e.g. in the temporary directory in `/tmp` that you created as workdir. To do this, set + +``` +export EASYBUILD_INSTALLPATH=${WORKDIR} +``` + +_after_ the step in which you have sourced the `configure_easybuild` script.
Note that to find any modules installed here (with `module av`), you will need to add this path to your `MODULEPATH`: + +``` +module use ${EASYBUILD_INSTALLPATH}/modules/all +``` + +Then, retry building the software (as described above). If the build now succeeds, the writeable overlay was most likely the cause of the issue. Since we _have_ to build in this writeable overlay for real deployments, changing the install path is only a diagnostic step. If you hit such a timeout, try (temporarily) increasing the timeout value in the test so that it passes.
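The overlay check described above can be wrapped in a small shell function. This is a minimal sketch, not part of the EESSI tooling: the function name `eessi_debug_use_tmp_installpath` is hypothetical, and it assumes `WORKDIR` was already created with `mktemp` and `configure_easybuild` was already sourced, as in the earlier steps. If no `module` command is available, it falls back to prepending the modules path to `MODULEPATH` directly.

```shell
# Hypothetical helper (not part of the EESSI software-layer repository):
# redirect EasyBuild installations from the writeable overlay in /cvmfs
# to ${WORKDIR}, to check whether the overlay causes a build failure.
# Call it *after* sourcing configure_easybuild, which sets its own install path.
eessi_debug_use_tmp_installpath() {
    if [ -z "${WORKDIR:-}" ] || [ ! -d "${WORKDIR}" ]; then
        echo "WORKDIR is not set or is not a directory" >&2
        return 1
    fi
    # Override whatever install path configure_easybuild has set
    export EASYBUILD_INSTALLPATH="${WORKDIR}"
    local modpath="${EASYBUILD_INSTALLPATH}/modules/all"
    # Make modules installed in WORKDIR visible to 'module av'
    if command -v module >/dev/null 2>&1; then
        module use "${modpath}"
    else
        # Fallback: prepend to MODULEPATH by hand (no module tool found)
        export MODULEPATH="${modpath}${MODULEPATH:+:${MODULEPATH}}"
    fi
}
```

For example, call `eessi_debug_use_tmp_installpath` right after `source configure_easybuild`, then rerun the same `eb` command as before. Keep in mind this is only for diagnosis; real deployments still install into the writeable overlay.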