Merge pull request #2194 from cgoveas/devel-1.4.3.1
Updating docs
sujit-jadhav authored Oct 17, 2023
2 parents f88631b + 86f1ab5 commit ff02096
Showing 32 changed files with 678 additions and 235 deletions.
2 changes: 1 addition & 1 deletion .all-contributorsrc
@@ -626,7 +626,7 @@
],
"contributorsPerLine": 7,
"projectName": "omnia",
"projectOwner": "dellhpc",
"projectOwner": "dell",
"repoType": "github",
"repoHost": "https://github.com",
"skipCi": true,
10 changes: 5 additions & 5 deletions docs/source/.readthedocs.yaml → .readthedocs.yaml
@@ -16,14 +16,14 @@ build:
# golang: "1.19"

# Build documentation in the docs/ directory with Sphinx
#sphinx:
# configuration: conf.py
sphinx:
configuration: docs/source/conf.py

# If using Sphinx, optionally build your docs in additional formats such as PDF
formats:
- pdf

# Optionally declare the Python requirements required to build your docs
#python:
# install:
# - requirements: docs/requirements.txt
python:
install:
- requirements: docs/source/requirements.txt
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -32,7 +32,7 @@ Contributions to Omnia are made through [Pull Requests (PRs)](https://help.githu
6. **Create a pull request:** [Create a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request) with a title following this format: Issue ###: Description (_e.g., Issue 1023: Reformat testutils_). It is important to write a good description, as it makes the code reviewer's job easier. A good description not only reduces review time, but also reduces the probability of a misunderstanding about the pull request.
* **Important:** When preparing a pull request it is important to stay up-to-date with the project repository. We recommend that you rebase against the upstream repo _frequently_. To do this, use the following commands:
```
git pull --rebase upstream devel #upstream is dellhpc/omnia
git pull --rebase upstream devel #upstream is dell/omnia
git push --force origin <pr-branch-name> #origin is your fork of the repository (e.g., <github_user_name>/omnia.git)
```
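
If the `upstream` remote is not yet configured in your fork, it can be added once before rebasing (a one-time setup, assuming `dell/omnia` is the upstream repository):
```
git remote add upstream https://github.com/dell/omnia.git   # point "upstream" at the main repository
git remote -v                                                # verify that both origin and upstream are listed
```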
* **PR Description:** Be sure to fully describe the pull request. Ideally, your PR description will contain:
112 changes: 56 additions & 56 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/README.rst
@@ -1,7 +1,7 @@
Omnia Documentation
-------------------

**Omnia** is an open source project hosted on `GitHub <https://github.com/dellhpc/omnia>`_. Go to `GitHub <https://github.com/dellhpc/omnia>`_ to view the source, open issues, ask questions, and participate in the project.
**Omnia** is an open source project hosted on `GitHub <https://github.com/dell/omnia>`_. Go to `GitHub <https://github.com/dell/omnia>`_ to view the source, open issues, ask questions, and participate in the project.

The Omnia docs are hosted here: https://omnia-doc.readthedocs.io/en/latest/index.html and are written in reStructuredText (`.rst`).

Binary file modified docs/Security/Security Configuration Guide.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/source/Contributing/pullrequests.rst
@@ -49,7 +49,7 @@ Make sure you have your user name and e-mail set. The ``--signoff | -s`` option
.. caution::
When preparing a pull request it is important to stay up-to-date with the project repository. We recommend that you rebase against the upstream repo frequently. ::

git pull --rebase upstream devel #upstream is dellhpc/omnia
git pull --rebase upstream devel #upstream is dell/omnia
git push --force origin <pr-branch-name> #origin is your fork of the repository (e.g., <github_user_name>/omnia.git)

PR description
62 changes: 62 additions & 0 deletions docs/source/InstallationGuides/Benchmarks/AutomatingOneAPI.rst
@@ -0,0 +1,62 @@
Automating oneAPI installation on Intel processors for MPI jobs
------------------------------------------------------------------

This topic explains how to automatically update Intel servers for MPI jobs. To manually install oneAPI, `click here. <OneAPI.html>`_

**Pre-requisites**

* ``provision.yml`` has been executed.
* An Omnia **slurm** cluster has been set up by ``omnia.yml`` and is running with at least 2 nodes: 1 manager and 1 compute (a quick check is shown after this list).
* Verify that the target nodes are in the ``booted`` state. For more information, `click here <../InstallingProvisionTool/ViewingDB.html>`_.
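
A quick way to confirm that the Slurm cluster is up with the expected nodes (a generic Slurm check, not part of the Omnia playbooks): ::

    # list cluster nodes and their states; the manager and compute nodes should appear as idle or allocated
    sinfo -N -l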

**To run the playbook**::


cd benchmarks
ansible-playbook intel_benchmark.yml -i inventory


**To execute multi-node jobs**

* Make sure NFS shares are available on each node.
* Copy the slurm script to the NFS share and execute it from there.
* Load all the necessary modules using ``module load``: ::

module load mpi
module load pmi/pmix-x86_64
module load mkl
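
To confirm the modules were picked up (a generic environment-modules check, not Omnia-specific): ::

    # lists the currently loaded modules; mpi, pmi/pmix-x86_64, and mkl should appear
    module list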

* If the commands or batch script are to be run over TCP instead of InfiniBand, include the line below: ::

export FI_PROVIDER=tcp


Job execution can now be initiated.

.. note:: Ensure ``runme_intel64_dynamic`` is downloaded before running this command.

::

srun -N 2 /mnt/nfs_shares/appshare/mkl/2023.0.0/benchmarks/mp_linpack/runme_intel64_dynamic


For a batch job using the same parameters, the script would be: ::


#!/bin/bash
#SBATCH --job-name=testMPI
#SBATCH --output=output.txt
#SBATCH --partition=normal
#SBATCH --nodelist=node00004.omnia.test,node00005.omnia.test

pwd; hostname; date
export FI_PROVIDER=tcp
module load pmi/pmix-x86_64
module use /opt/intel/oneapi/modulefiles
module load mkl
module load mpi

srun /mnt/appshare/benchmarks/mp_linpack/runme_intel64_dynamic
date


98 changes: 98 additions & 0 deletions docs/source/InstallationGuides/Benchmarks/AutomatingOpenMPI.rst
@@ -0,0 +1,98 @@
Installing pmix and updating slurm configuration for AMD processors
--------------------------------------------------------------------

This topic explains how to automatically update AMD servers for MPI jobs. To manually install pmix and update the slurm configuration, `click here. <OpenMPI_AOCC.html>`_

**Pre-requisites**

* ``provision.yml`` has been executed.
* An Omnia **slurm** cluster has been set up by ``omnia.yml`` and is running with at least 2 nodes: 1 manager and 1 compute.
* Verify that the target nodes are in the ``booted`` state. For more information, `click here <../InstallingProvisionTool/ViewingDB.html>`_.

**To run the playbook**::

cd benchmarks
ansible-playbook amd_benchmark.yml -i inventory

**To execute multi-node jobs**

* OpenMPI and ``aocc-compiler-*.tar`` should be compiled with slurm and installed on all cluster nodes, or made available on the NFS share.
.. note::
* Omnia currently supports pmix version 2 (``pmix_v2``).

* While compiling OpenMPI, include ``pmix``, ``slurm``, ``hwloc``, and ``libevent`` as shown in the sample command below: ::

./configure --prefix=/home/omnia-share/openmpi-4.1.5 --enable-mpi1-compatibility --enable-orterun-prefix-by-default --with-slurm=/usr --with-pmix=/usr --with-libevent=/usr --with-hwloc=/usr --with-ucx CC=clang CXX=clang++ FC=flang 2>&1 | tee config.out
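
Before submitting jobs, it can be worth confirming that the Slurm installation on the cluster actually exposes the pmix plugin (a generic Slurm check, not part of the Omnia playbooks): ::

    # list the MPI plugin types known to this Slurm installation; pmix_v2 should appear in the output
    srun --mpi=list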



* To run a job on multiple nodes (10.5.0.4 and 10.5.0.5) where OpenMPI is compiled and installed on the NFS share (``/home/omnia-share/openmpi/bin/mpirun``), initiate the job as shown below:

.. note:: Ensure ``amd-zen-hpl-2023_07_18`` is downloaded before running this command.

::

srun -N 2 --mpi=pmix_v2 -n 2 ./amd-zen-hpl-2023_07_18/xhpl


For a batch job using the same parameters, the script would be: ::


#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test.log
#SBATCH --partition=normal
#SBATCH -N 3
#SBATCH --time=10:00
#SBATCH --ntasks=2

source /home/omnia-share/setenv_AOCC.sh
export PATH=$PATH:/home/omnia-share/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/omnia-share/openmpi/lib

srun --mpi=pmix_v2 ./amd-zen-hpl-2023_07_18/xhpl


Alternatively, to use ``mpirun``, the script would be: ::

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test.log
#SBATCH --partition=normal
#SBATCH -N 3
#SBATCH --time=10:00
#SBATCH --ntasks=2

source /home/omnia-share/setenv_AOCC.sh
export PATH=$PATH:/home/omnia-share/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/omnia-share/openmpi/lib

/home/omnia-share/openmpi/bin/mpirun --map-by ppr:1:node -np 2 --display-map --oversubscribe --mca orte_keep_fqdn_hostnames 1 ./xhpl



.. note:: The above scripts are samples that can be modified as required. Ensure that ``--mca orte_keep_fqdn_hostnames 1`` is included in the mpirun command in sbatch scripts. Omnia maintains all hostnames in FQDN format. Failing to include ``--mca orte_keep_fqdn_hostnames 1`` may cause job initiation to fail.

2 changes: 2 additions & 0 deletions docs/source/InstallationGuides/Benchmarks/OneAPI.rst
@@ -1,6 +1,8 @@
Install oneAPI for MPI jobs on Intel processors
________________________________________________

This topic explains how to manually install oneAPI for MPI jobs. To install oneAPI automatically, `click here. <AutomatingOneAPI.html>`_

**Pre-requisites**

* An Omnia **slurm** cluster running with at least 2 nodes: 1 manager and 1 compute.
35 changes: 32 additions & 3 deletions docs/source/InstallationGuides/Benchmarks/OpenMPI_AOCC.rst
@@ -1,5 +1,6 @@
Open MPI AOCC HPL benchmark for AMD processors
----------------------------------------------
This topic explains how to manually update servers for MPI jobs. To automatically install pmix and configure slurm, `click here. <AutomatingOpenMPI.html>`_

**Prerequisites**

@@ -26,7 +27,7 @@ Open MPI AOCC HPL benchmark for AMD processors

ii. Push the packages to the cluster nodes:

a. Update the ``package_list`` variable in the ``os_package_update/os_package_update.conf`` file and save it. ::
a. Update the ``package_list`` variable in the ``utils/os_package_update/package_update_config.yml`` file and save it. ::

package_list: "/install/post/otherpkgs/<os_version>/x86_64/custom_software/openmpi.pkglist"

@@ -62,7 +63,7 @@ Open MPI AOCC HPL benchmark for AMD processors
systemctl stop slurmctld.service
systemctl start slurmctld.service

4. Job execution can now be initiated. To initiate a job use the following sample commands.
4. Job execution can now be initiated.

To run a job on multiple nodes (10.5.0.4 and 10.5.0.5) where OpenMPI is compiled and installed on the NFS share (``/home/omnia-share/openmpi/bin/mpirun``), initiate the job as shown below:
.. note:: Ensure ``amd-zen-hpl-2023_07_18`` is downloaded before running this command.
@@ -101,6 +102,34 @@ For a batch job using the same parameters, the script would be: ::
srun --mpi=pmix_v2 ./amd-zen-hpl-2023_07_18/xhpl


.. note:: If mpirun is used to initiate jobs, a host list is required as illustrated: ``mpirun -np 2 -host 10.5.0.4,10.5.0.5 ./amd-zen-hpl-2023_07_18/xhpl``
Alternatively, to use ``mpirun``, the script would be: ::

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test.log
#SBATCH --partition=normal
#SBATCH -N 3
#SBATCH --time=10:00
#SBATCH --ntasks=2

source /home/omnia-share/setenv_AOCC.sh
export PATH=$PATH:/home/omnia-share/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/omnia-share/openmpi/lib

/home/omnia-share/openmpi/bin/mpirun --map-by ppr:1:node -np 2 --display-map --oversubscribe --mca orte_keep_fqdn_hostnames 1 ./xhpl



.. note:: The above scripts are samples that can be modified as required. Ensure that ``--mca orte_keep_fqdn_hostnames 1`` is included in the mpirun command in sbatch scripts. Omnia maintains all hostnames in FQDN format. Failing to include ``--mca orte_keep_fqdn_hostnames 1`` may cause job initiation to fail.

51 changes: 51 additions & 0 deletions docs/source/InstallationGuides/Benchmarks/hpcsoftwarestack.rst
@@ -0,0 +1,51 @@
Containerized HPC benchmark execution
--------------------------------------

Use this playbook to download docker images and pull them onto cluster nodes using `apptainer <https://apptainer.org/docs/user/main/index.html/>`_.

1. Ensure that the cluster has been `provisioned by the provision tool <../../InstallationGuides/InstallingProvisionTool/index.html>`_ and `set up using omnia.yml <../../InstallationGuides/BuildingClusters/index.html>`_.

2. Enter the following variables in ``utils/hpc_apptainer_job_execution/hpc_apptainer_job_execution_config.yml``:

+-------------------------+-----------------------------------------------------------------------------------------------------------+
| Parameter | Details |
+=========================+===========================================================================================================+
| **hpc_apptainer_image** | * Docker image details to be downloaded onto cluster nodes using apptainer to create a sif file. |
| ``JSON list`` | |
| Required | * Example (for single image): :: |
| | |
| | |
| | hpc_apptainer_image: |
| | |
| | - { image_url: "docker.io/intel/oneapi-hpckit:latest" } |
| | |
| | * Example (for multiple images): :: |
| | |
| | hpc_apptainer_image: |
| | |
| | - { image_url: "docker.io/intel/oneapi-hpckit:latest" } |
| | |
| | - { image_url: "docker.io/tensorflow/tensorflow:latest" } |
| | |
| | * If docker credentials are provided in ``omnia_config.yml``, they are used to download docker images. |
| | |
+-------------------------+-----------------------------------------------------------------------------------------------------------+
| **hpc_apptainer_path** | * Directory path for storing apptainer sif files on cluster nodes. |
| | |
| ``string`` | * It is recommended to use a directory inside a shared path that is accessible to all cluster nodes. |
| | |
| Required | * **Default value:** ``"/home/omnia-share/softwares/apptainer"`` |
+-------------------------+-----------------------------------------------------------------------------------------------------------+
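
Putting the pieces together, a minimal ``hpc_apptainer_job_execution_config.yml`` could look like the sketch below (values taken from the examples and default above; adjust the image list and path for your environment): ::

    hpc_apptainer_image:
      - { image_url: "docker.io/intel/oneapi-hpckit:latest" }
      - { image_url: "docker.io/tensorflow/tensorflow:latest" }
    hpc_apptainer_path: "/home/omnia-share/softwares/apptainer"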

To run the playbook: ::

cd utils/hpc_apptainer_job_execution

ansible-playbook hpc_apptainer_job_execution.yml -i inventory

.. note:: Use the inventory file format specified under `Sample Files. <../../samplefiles.html>`_

HPC apptainer jobs can be initiated on a slurm cluster using the following sample command: ::

srun -N 3 --mpi=pmi2 --ntasks=4 apptainer run /home/omnia-share/softwares/apptainer/oneapi-hpckit_latest.sif hostname
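
For a batch submission of the same job, a sample sbatch script along the lines of the other examples in this guide might look like this (the job name, output file, and partition are illustrative assumptions): ::

    #!/bin/bash
    #SBATCH --job-name=apptainer_test
    #SBATCH --output=apptainer_test.log
    #SBATCH --partition=normal
    #SBATCH -N 3
    #SBATCH --ntasks=4

    srun --mpi=pmi2 apptainer run /home/omnia-share/softwares/apptainer/oneapi-hpckit_latest.sif hostname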

3 changes: 3 additions & 0 deletions docs/source/InstallationGuides/Benchmarks/index.rst
@@ -3,4 +3,7 @@ Running HPC benchmarks on omnia clusters

.. toctree::
OneAPI
AutomatingOneAPI
OpenMPI_AOCC
AutomatingOpenMPI
hpcsoftwarestack
@@ -80,7 +80,7 @@ The ``omnia.yml`` playbook installs Slurm, BeeGFS Client, and NFS Client in addition
kinit admin (When prompted, provide kerberos_admin_password as entered in security_config.yml)
ipa user-add --homedir=<nfs_dir_path> --password

For example: ``ipa user-add FirstName_LastName --first=FirstName --last=LastName --password --homedir=/home/omnia-share/FirstName_LastName``
For example: ``ipa user-add FirstName_LastName --first=FirstName --last=LastName --password --homedir=/home/omnia-share/FirstName_LastName --shell /bin/bash``

After the new user account logs in for the first time, the user will be prompted to change the account password.

@@ -167,6 +167,6 @@ After successfully running ``provision.yml``, go to `Building Clusters <../Build

ssh-keygen -R <node IP>

* If a second run of ``provision.yml`` fails, the ``input/provision_config.yml`` file will be unencrypted.
* If a subsequent run of ``provision.yml`` fails, the ``input/provision_config.yml`` file will be unencrypted.

To create a node inventory in ``/opt/omnia``, `click here <../PostProvisionScript.html>`_.
2 changes: 1 addition & 1 deletion docs/source/InstallationGuides/index.rst
@@ -18,7 +18,7 @@ The control plane needs to be internet-capable with Github and a full OS install

Once the Omnia repository has been cloned onto the control plane: ::

git clone https://github.com/dellhpc/omnia.git
git clone https://github.com/dell/omnia.git

Change directory to Omnia using: ::
