
Commit

Deployed c7e6e4a with MkDocs version: 1.6.0
Unknown committed Jun 4, 2024
1 parent c8bc00d commit 434c966
Showing 3 changed files with 89 additions and 18 deletions.
66 changes: 51 additions & 15 deletions adding_software/debugging_failed_builds/index.html
@@ -1346,6 +1346,15 @@
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#rebuilding-software" class="md-nav__link">
<span class="md-ellipsis">
Rebuilding software
</span>
</a>

</li>

<li class="md-nav__item">
@@ -1947,6 +1956,15 @@
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#rebuilding-software" class="md-nav__link">
<span class="md-ellipsis">
Rebuilding software
</span>
</a>

</li>

<li class="md-nav__item">
@@ -2197,26 +2215,44 @@ <h2 id="building-an-individual-package">Building an individual package<a class="
<p class="admonition-title">Note</p>
<p>While this might be faster than the easystack-based approach, this is <em>not</em> how the bot builds. So while it <em>may</em> reproduce the failure the bot encounters, it may also not reproduce the bug <em>at all</em> (no failure) or run into <em>different</em> bugs. If you want to be sure, use the easystack-based approach.</p>
</div>
<h2 id="rebuilding-software">Rebuilding software<a class="headerlink" href="#rebuilding-software" title="Permanent link">&para;</a></h2>
<p><a href="../opening_pr/#rebuilding-software">Rebuilding software</a> requires an additional step at the beginning: the software first needs to be removed. We assume you have already <a href="#fetching-the-feature-branch">checked out the feature branch</a>. Start the container with the additional <code>--fakeroot</code> argument; otherwise, you will not be able to remove files from the <code>/cvmfs</code> prefix. Make sure to also include the <code>--save</code> argument, since you will need the resulting tarball later on. For example:
<div class="highlight"><pre><span></span><code><a id="__codelineno-20-1" name="__codelineno-20-1" href="#__codelineno-20-1"></a>SINGULARITY_CACHEDIR=${eessi_common_dir}/container_cache ./eessi_container.sh --access rw --nvidia all --host-injections ${eessi_common_dir}/host_injections --save ${eessi_pr_dir} --fakeroot
</code></pre></div>
Then, initialize the EESSI environment:
<div class="highlight"><pre><span></span><code><a id="__codelineno-21-1" name="__codelineno-21-1" href="#__codelineno-21-1"></a>source ${EESSI_CVMFS_REPO}/versions/${EESSI_VERSION}/init/bash
</code></pre></div>
and get the diff file for the corresponding PR, e.g. for PR 123:
<div class="highlight"><pre><span></span><code><a id="__codelineno-22-1" name="__codelineno-22-1" href="#__codelineno-22-1"></a>wget https://github.com/EESSI/software-layer/pull/123.diff
</code></pre></div>
Finally, run the <code>EESSI-remove-software.sh</code> script:
<div class="highlight"><pre><span></span><code><a id="__codelineno-23-1" name="__codelineno-23-1" href="#__codelineno-23-1"></a>./EESSI-remove-software.sh
</code></pre></div></p>
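<p>Before running the removal script, it can help to check which rebuild easystack files the downloaded diff actually adds or changes, since the script should only remove software listed in those files. The following is a minimal sketch, assuming that rebuild easystacks live in a directory named <code>rebuilds</code> and that the diff uses the standard <code>git diff</code> header lines (<code>+++ b/&lt;path&gt;</code>); <code>list_rebuild_easystacks</code> is a hypothetical helper, not part of the <code>software-layer</code> repository.</p>

```shell
# Hypothetical helper: print the paths of rebuild easystack files that a
# PR diff adds or modifies. Assumes standard 'git diff' output, where each
# changed file has a header line of the form '+++ b/<path>'.
list_rebuild_easystacks() {
    grep -E '^\+\+\+ b/.*/rebuilds/[^/]+\.yml$' "$1" | sed 's|^+++ b/||'
}
```

<p>For example, <code>list_rebuild_easystacks 123.diff</code> lists the rebuild easystacks touched by PR 123. If it prints nothing, check that the PR actually adds a rebuild easystack.</p>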
<p>This should remove any software specified in a <a href="../opening_pr/#rebuilding-software">rebuild easystack</a> that was added in your current feature branch.</p>
<p>Now, exit the container and note the printed instructions for resuming the session later, e.g.:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-24-1" name="__codelineno-24-1" href="#__codelineno-24-1"></a>Saved contents of tmp directory &#39;/tmp/eessi.WZxeFUemH2&#39; to tarball &#39;/home/myuser/pr507/EESSI-1711538681.tgz&#39; (to resume session add &#39;--resume /home/myuser/pr507/EESSI-1711538681.tgz&#39;)
</code></pre></div>
<p>Now, continue with the original instructions to start the container (i.e. either <a href="#starting-a-shell-in-the-eessi-container">here</a> or <a href="#more-efficient-approach-for-multiplecontinued-debugging-sessions">with this alternate approach</a>) and make sure to add the <code>--resume</code> flag. This way, you resume from the tarball (i.e. with the software that has to be rebuilt already removed), but in a new container in which you have regular (i.e. non-root) permissions.</p>
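<p>If you script this workflow, the tarball path can be extracted from the message that <code>eessi_container.sh</code> prints on exit. This is a minimal sketch, assuming the message has exactly the format of the example above; <code>extract_resume_tarball</code> is a hypothetical helper name.</p>

```shell
# Hypothetical helper: read the exit message of eessi_container.sh on stdin
# and print the tarball path that follows '--resume' in it.
extract_resume_tarball() {
    sed -n "s/.*add '--resume \([^']*\)'.*/\1/p"
}

# Example with the message shown above
msg="Saved contents of tmp directory '/tmp/eessi.WZxeFUemH2' to tarball '/home/myuser/pr507/EESSI-1711538681.tgz' (to resume session add '--resume /home/myuser/pr507/EESSI-1711538681.tgz')"
printf '%s\n' "$msg" | extract_resume_tarball
```

<p>The printed path is the value to pass to <code>--resume</code> when you start the new container.</p>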
<h2 id="running-the-test-step">Running the test step<a class="headerlink" href="#running-the-test-step" title="Permanent link">&para;</a></h2>
<p>If you are still in the prefix layer (i.e. after previously building something), exit it first:
<div class="highlight"><pre><span></span><code><a id="__codelineno-25-1" name="__codelineno-25-1" href="#__codelineno-25-1"></a>$ exit
<a id="__codelineno-25-2" name="__codelineno-25-2" href="#__codelineno-25-2"></a>logout
<a id="__codelineno-25-3" name="__codelineno-25-3" href="#__codelineno-25-3"></a>Leaving Gentoo Prefix with exit status 0
</code></pre></div>
Then, source the EESSI init script (again):
<div class="highlight"><pre><span></span><code><a id="__codelineno-26-1" name="__codelineno-26-1" href="#__codelineno-26-1"></a>Apptainer&gt; source ${EESSI_CVMFS_REPO}/versions/${EESSI_VERSION}/init/bash
<a id="__codelineno-26-2" name="__codelineno-26-2" href="#__codelineno-26-2"></a>Environment set up to use EESSI (2023.06), have fun!
<a id="__codelineno-26-3" name="__codelineno-26-3" href="#__codelineno-26-3"></a>{EESSI 2023.06} Apptainer&gt;
</code></pre></div></p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>If you are in a SLURM environment, make sure to run <code>for i in $(env | grep '^SLURM' | cut -d= -f1); do unset "$i"; done</code> to unset any SLURM environment variables. Otherwise, <code>mpirun</code> will pick these up and use them to, e.g., infer how many slots are available. If you run into errors of the form "There are not enough slots available in the system to satisfy the X slots that were requested by the application:", you probably forgot this step.</p>
</div>
<p>Then, execute the <code>run_tests.sh</code> script. We are assuming you are still in the root of the <code>software-layer</code> repository that you cloned earlier:
<div class="highlight"><pre><span></span><code><a id="__codelineno-27-1" name="__codelineno-27-1" href="#__codelineno-27-1"></a>./run_tests.sh
</code></pre></div>
If all goes well, you should see (part of) the EESSI test suite being run by ReFrame, finishing with something like:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-28-1" name="__codelineno-28-1" href="#__codelineno-28-1"></a>[ PASSED ] Ran X/Y test case(s) from Z check(s) (0 failure(s), 0 skipped, 0 aborted)
</code></pre></div>
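<p>When running the tests repeatedly, it can be convenient to save the output and extract the failure count from the summary line. This is a minimal sketch, assuming the summary line has the format shown above; <code>reframe_failures</code> and the log file name <code>run_tests.log</code> are hypothetical.</p>

```shell
# Hypothetical helper: print the number of failures reported in the last
# ReFrame summary line of a log file (format as in the example above).
reframe_failures() {
    sed -n 's/.*(\([0-9][0-9]*\) failure(s).*/\1/p' "$1" | tail -n 1
}
```

<p>For example, run <code>./run_tests.sh 2&gt;&amp;1 | tee run_tests.log</code> and then <code>reframe_failures run_tests.log</code>; a result of <code>0</code> corresponds to the successful summary shown above.</p>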
<div class="admonition note">
<p class="admonition-title">Note</p>
@@ -2226,23 +2262,23 @@ <h2 id="known-causes-of-issues-in-eessi">Known causes of issues in EESSI<a class
<h3 id="the-custom-system-prefix-of-the-compatibility-layer">The custom system prefix of the compatibility layer<a class="headerlink" href="#the-custom-system-prefix-of-the-compatibility-layer" title="Permanent link">&para;</a></h3>
<p>Some installations might expect the system root (sysroot, for short) to be in <code>/</code>. However, in case of EESSI, we are building against the OS in the <a href="../../compatibility_layer/">compatibility layer</a>. Thus, our sysroot is something like <code>${EESSI_CVMFS_REPO}/versions/${EESSI_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}</code>. This <em>can</em> cause issues if installation procedures <em>assume</em> the sysroot is in <code>/</code>.</p>
<p>One example of a sysroot <a href="https://github.com/EESSI/software-layer/pull/370#issuecomment-1774744151">issue</a> was in installing <code>wget</code>. The EasyConfig for <code>wget</code> defined
<div class="highlight"><pre><span></span><code><a id="__codelineno-29-1" name="__codelineno-29-1" href="#__codelineno-29-1"></a># make sure pkg-config picks up system packages (OpenSSL &amp; co)
<a id="__codelineno-29-2" name="__codelineno-29-2" href="#__codelineno-29-2"></a>preconfigopts = &quot;export PKG_CONFIG_PATH=/usr/lib64/pkgconfig:/usr/lib/pkgconfig:/usr/lib/x86_64-linux-gnu/pkgconfig &amp;&amp; &quot;
<a id="__codelineno-29-3" name="__codelineno-29-3" href="#__codelineno-29-3"></a>configopts = &#39;--with-ssl=openssl &#39;
</code></pre></div>
This will not work in EESSI, since OpenSSL should be picked up from the compatibility layer. This was fixed by changing the EasyConfig to read
<div class="highlight"><pre><span></span><code><a id="__codelineno-30-1" name="__codelineno-30-1" href="#__codelineno-30-1"></a>preconfigopts = &quot;export PKG_CONFIG_PATH=%(sysroot)s/usr/lib64/pkgconfig:%(sysroot)s/usr/lib/pkgconfig:%(sysroot)s/usr/lib/x86_64-linux-gnu/pkgconfig &amp;&amp; &quot;
<a id="__codelineno-30-2" name="__codelineno-30-2" href="#__codelineno-30-2"></a>configopts = &#39;--with-ssl=openssl &#39;
</code></pre></div>
<code>%(sysroot)s</code> is a template value that EasyBuild resolves to the value configured in EasyBuild for <code>sysroot</code> (it is one of the fields printed by <code>eb --show-config</code> if a non-standard sysroot is configured).</p>
<p>If you encounter issues where the installation cannot find something that is <em>normally</em> provided by the OS (i.e. <em>not</em> one of the dependencies in your module environment), you may need to resort to a similar approach.</p>
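<p>When reproducing such a build by hand inside the container, you can construct the same sysroot-prefixed search path yourself. This is a minimal sketch, assuming the sysroot is the compatibility layer path given above (check the value EasyBuild actually uses with <code>eb --show-config</code>); <code>sysroot_pkg_config_path</code> is a hypothetical helper, not an EasyBuild feature.</p>

```shell
# Hypothetical helper: prefix the system pkg-config directories used in the
# wget easyconfig above with a given sysroot, as %(sysroot)s does there.
sysroot_pkg_config_path() {
    local sysroot="$1"
    printf '%s/usr/lib64/pkgconfig:%s/usr/lib/pkgconfig:%s/usr/lib/x86_64-linux-gnu/pkgconfig\n' \
        "$sysroot" "$sysroot" "$sysroot"
}
```

<p>For example: <code>export PKG_CONFIG_PATH="$(sysroot_pkg_config_path "${EESSI_CVMFS_REPO}/versions/${EESSI_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}")"</code>. Like the <code>preconfigopts</code> in the easyconfig, this replaces any existing <code>PKG_CONFIG_PATH</code> rather than appending to it.</p>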
<h3 id="the-writeable-overlay">The writeable overlay<a class="headerlink" href="#the-writeable-overlay" title="Permanent link">&para;</a></h3>
<p>The writeable overlay in the container is known to be slow at times. As a result, we have seen tests fail because they exceed a timeout (e.g. <a href="https://github.com/EESSI/software-layer/pull/332#issuecomment-1775374260">this issue</a>).</p>
<p>To check whether the writeable overlay is the cause, you can have the installation done elsewhere, e.g. in the temporary directory in <code>/tmp</code> that you created as workdir. To do this, set</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-31-1" name="__codelineno-31-1" href="#__codelineno-31-1"></a>export EASYBUILD_INSTALLPATH=${WORKDIR}
</code></pre></div>
<p><em>after</em> the step in which you have sourced the <code>configure_easybuild</code> script. Note that in order to find (with <code>module av</code>) any modules that get installed here, you will need to add this path to the <code>MODULEPATH</code>:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-32-1" name="__codelineno-32-1" href="#__codelineno-32-1"></a>module use ${EASYBUILD_INSTALLPATH}/modules/all
</code></pre></div>
<p>Then, retry building the software (as described above). If the build now succeeds, the writeable overlay was indeed the cause. However, real deployments <em>have</em> to be built in the writeable overlay. Thus, if you hit such a timeout, check whether you can (temporarily) increase the timeout value in the test so that it passes.</p>

39 changes: 37 additions & 2 deletions adding_software/opening_pr/index.html
@@ -1214,6 +1214,15 @@
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#rebuilding-software" class="md-nav__link">
<span class="md-ellipsis">
Rebuilding software
</span>
</a>

</li>

</ul>
@@ -1683,6 +1692,15 @@
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#rebuilding-software" class="md-nav__link">
<span class="md-ellipsis">
Rebuilding software
</span>
</a>

</li>

</ul>
@@ -1755,8 +1773,9 @@ <h3 id="software_layer_pull_request">Creating a pull request<a class="headerlink
</code></pre></div>
<p>3) Determine the correct easystack file to change, and add one or more lines to it that specify which
easyconfigs should be installed</p>
<p><div class="highlight"><pre><span></span><code><a id="__codelineno-5-1" name="__codelineno-5-1" href="#__codelineno-5-1"></a><span class="nb">echo</span><span class="w"> </span><span class="s1">&#39; - example-1.2.3-GCC-12.3.0.eb&#39;</span><span class="w"> </span>&gt;&gt;<span class="w"> </span>easystacks/software.eessi.io/2023.06/eessi-2023.06-eb-4.8.2-2023a.yml
</code></pre></div>
Note that the naming scheme is standardized and should be <code>eessi-&lt;eessi_version&gt;-eb-&lt;eb_version&gt;-&lt;toolchain_version&gt;.yml</code>. See the <a href="https://docs.easybuild.io/easystack-files/">official EasyBuild documentation on easystack files</a> for more information on the syntax.</p>
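<p>The naming scheme can be expressed as a small shell function, for example to double-check the file name before committing. This is only a sketch; <code>eessi_easystack_name</code> is a hypothetical helper, and the arguments must match the EESSI version, EasyBuild version and toolchain version you are targeting.</p>

```shell
# Hypothetical helper: compose an easystack file name following
# eessi-<eessi_version>-eb-<eb_version>-<toolchain_version>.yml
eessi_easystack_name() {
    printf 'eessi-%s-eb-%s-%s.yml\n' "$1" "$2" "$3"
}

# Reproduces the file name used in the example above
eessi_easystack_name 2023.06 4.8.2 2023a
```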
<p>4) Stage and commit the changes into your branch with a sensible message</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-6-1" name="__codelineno-6-1" href="#__codelineno-6-1"></a>git<span class="w"> </span>add<span class="w"> </span>easystacks/software.eessi.io/2023.06/eessi-2023.06-eb-4.8.2-2023a.yml
<a id="__codelineno-6-2" name="__codelineno-6-2" href="#__codelineno-6-2"></a>git<span class="w"> </span>commit<span class="w"> </span>-m<span class="w"> </span><span class="s2">&quot;{2023.06}[GCC/12.3.0] example 1.2.3&quot;</span>
@@ -1770,6 +1789,22 @@ <h3 id="software_layer_pull_request">Creating a pull request<a class="headerlink
software to (like <code>2023.06-software.eessi.io</code>).</p>
<p>If all goes well, one or more bots <img alt="🤖" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/1f916.svg" title=":robot:" /> should almost instantly create a comment in your pull request
with an overview of how each bot is configured; you will need this information when providing build instructions.</p>
<h3 id="rebuilding-software">Rebuilding software<a class="headerlink" href="#rebuilding-software" title="Permanent link">&para;</a></h3>
<p>We typically do not rebuild software, since (strictly speaking) this breaks reproducibility for anyone using the software. However, there are certain situations in which it is difficult or impossible to avoid.</p>
<p>To rebuild software, add it to a dedicated easystack file in the <code>rebuilds</code> directory. Use the following naming convention: <code>YYYYMMDD-eb-&lt;EB_VERSION&gt;-&lt;APPLICATION_NAME&gt;-&lt;APPLICATION_VERSION&gt;-&lt;SHORT_DESCRIPTION&gt;.yml</code>, where <code>YYYYMMDD</code> is the opening date of your PR. For example, <code>20240506-eb-4.9.1-CUDA-12.1.1-ship-full-runtime.yml</code> was added in a PR on the 6th of May 2024 and used to rebuild CUDA 12.1.1 with EasyBuild 4.9.1, to resolve an issue with some runtime libraries missing from the initial CUDA 12.1.1 installation.</p>
<p>At the top of your easystack file, please use comments to include a short description, and make sure to include any relevant links to related issues (e.g. from the GitHub repositories of EESSI, EasyBuild, or the software you are rebuilding).</p>
<p>As an example, consider the full easystack file (<code>20240506-eb-4.9.1-CUDA-12.1.1-ship-full-runtime.yml</code>) used for the aforementioned CUDA rebuild:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-8-1" name="__codelineno-8-1" href="#__codelineno-8-1"></a><span class="c1"># 2024.05.06</span>
<a id="__codelineno-8-2" name="__codelineno-8-2" href="#__codelineno-8-2"></a><span class="c1"># Original matching of files we could ship was not done correctly. We were</span>
<a id="__codelineno-8-3" name="__codelineno-8-3" href="#__codelineno-8-3"></a><span class="c1"># matching the basename for files (e.g., libcudart.so from libcudart.so.12)</span>
<a id="__codelineno-8-4" name="__codelineno-8-4" href="#__codelineno-8-4"></a><span class="c1"># rather than the name stub (libcudart)</span>
<a id="__codelineno-8-5" name="__codelineno-8-5" href="#__codelineno-8-5"></a><span class="c1"># See https://github.com/EESSI/software-layer/pull/559</span>
<a id="__codelineno-8-6" name="__codelineno-8-6" href="#__codelineno-8-6"></a><span class="nt">easyconfigs</span><span class="p">:</span>
<a id="__codelineno-8-7" name="__codelineno-8-7" href="#__codelineno-8-7"></a><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">CUDA-12.1.1.eb</span><span class="p">:</span>
<a id="__codelineno-8-8" name="__codelineno-8-8" href="#__codelineno-8-8"></a><span class="w"> </span><span class="nt">options</span><span class="p">:</span>
<a id="__codelineno-8-9" name="__codelineno-8-9" href="#__codelineno-8-9"></a><span class="w"> </span><span class="nt">accept-eula-for</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">CUDA</span>
</code></pre></div>
<p>By separating rebuilds in dedicated files, we still maintain a complete software bill of materials: it is transparent what got rebuilt, for which reason, and when.</p>
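<p>The file naming convention for rebuild easystacks described above can be sketched as a small shell function, e.g. to double-check a file name before committing it. This sketch assumes the <code>YYYYMMDD</code> date format from the convention; <code>rebuild_easystack_name</code> is a hypothetical helper, and the example arguments correspond to the CUDA rebuild discussed above.</p>

```shell
# Hypothetical helper: compose a rebuild easystack file name following
# YYYYMMDD-eb-<EB_VERSION>-<APPLICATION_NAME>-<APPLICATION_VERSION>-<SHORT_DESCRIPTION>.yml
rebuild_easystack_name() {
    # $1: PR opening date (YYYYMMDD), $2: EasyBuild version,
    # $3: application name, $4: application version, $5: short description
    printf '%s-eb-%s-%s-%s-%s.yml\n' "$1" "$2" "$3" "$4" "$5"
}

rebuild_easystack_name 20240506 4.9.1 CUDA 12.1.1 ship-full-runtime
```

<p>To use today's date, pass <code>"$(date +%Y%m%d)"</code> as the first argument, keeping in mind that the convention asks for the opening date of your PR.</p>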



@@ -1790,7 +1825,7 @@ <h3 id="software_layer_pull_request">Creating a pull request<a class="headerlink
<span class="md-icon" title="Last update">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1-2.1-2M12.5 7v5.2l4 2.4-1 1L11 13V7h1.5M11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2v1.8Z"/></svg>
</span>
<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">December 6, 2023</span>
<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">June 3, 2024</span>
</span>


2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.
