
Commit

Add DOCA install playbook
Kayobe Automation authored and assumptionsandg committed Jan 21, 2025
1 parent e26298b commit 9650481
Showing 4 changed files with 69 additions and 26 deletions.
61 changes: 38 additions & 23 deletions doc/source/contributor/ofed.rst
@@ -4,19 +4,17 @@ OFED

Warning: Experimental workflow subject to change

This section documents the workflow for building OFED packages for Release train integration.

The workflow builds the OFED kernel modules against the latest available kernel in Release train
(as configured in SKC) and compiles them into RPM packages to be uploaded to Ark. Additionally,
this workflow downloads the userspace OFED packages from the Nvidia repository and uploads these
to Ark.
The Nvidia DOCA framework is distributed as part of StackHPC Release Train for OFED driver support.
The DOCA repository is synced into Ark as part of the Release Train workflows. However, to ensure
compatibility with Release Train packages, we are required to build OFED modules with support for
the latest Release Train kernel.

Workflow
========

The workflow uses workflow_dispatch to manually request an OFED build, which will deploy a builder
VM, apply kayobe config to the builder, upgrade the kernel, reboot, then run two Ansible playbooks
for building and uploading OFED to Ark.
for building and uploading OFED modules to Ark.

Pre-requisites
--------------
@@ -25,31 +23,48 @@ Before building OFED packages, the workflow will ensure that:

* A full distro-sync has taken place, ensuring the kernel is upgraded.

* The bootloader has been configured to use the latest kernel
* The bootloader has been configured to use the latest kernel (reset-bls-entries.yml)

* noexec is disabled in the temporary logical volume.

build-ofed
----------

Currently we only support building Rocky Linux 9 OFED packages.

In order to setup OFED, we're required to build kernel modules for the OFED drivers as
the kernels we provide in release train are unsupported by OFED. To accomplish this we
will need to use the doca-kernel-support from the doca-extra repository.
Currently we only support building Rocky Linux 9 OFED kernel module packages.

We will need to install dependencies in order to build the OFED kernel modules, and these
are installed at the beginning of the build playbook. We also install base and appstream
dependencies of userspace OFED packages here, this is intended to stop these dependencies
being pulled in later when we download the OFED packages from the doca-host repository.
The Build OFED module workflow will check that the filesystem is configured (noexec disabled)
to allow the DOCA build script to run. The workflow will also install any necessary dependencies
for the module build.
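
The noexec check described above could be expressed as an Ansible assertion along the following lines. This is only a sketch under stated assumptions, not the playbook the workflow actually runs: the ``ofed_builder`` host group and the ``/tmp`` mount point are hypothetical names chosen for illustration.

```yaml
---
# Sketch only: the host group and mount point below are hypothetical.
- name: Check the build filesystem allows execution
  hosts: ofed_builder          # hypothetical group name
  gather_facts: true           # populates ansible_facts.mounts
  vars:
    build_mount: /tmp          # assumed location of the temporary logical volume
  tasks:
    - name: Fail if the build mount carries the noexec option
      ansible.builtin.assert:
        that:
          - "'noexec' not in item.options.split(',')"
        fail_msg: "{{ item.mount }} is mounted noexec, so the DOCA build script cannot run there"
      loop: "{{ ansible_facts.mounts | selectattr('mount', 'equalto', build_mount) | list }}"
      loop_control:
        label: "{{ item.mount }}"
```

If no mount matches ``build_mount`` the loop is empty and the task is skipped, so a real check would probably also want to assert that the mount exists.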

At the end of the playbook following the kernel module build, the OFED userspace packages
are downloaded from the upstream repository in order to upload these to Ark.
The build script will output a ``doca-kernel-repo`` RPM which contains all kernel modules built
as part of the workflow. When this RPM is installed, it creates a repo file pointing to the
modules in ``/usr/share/doca-host-<doca-version>/Modules/<kernel-version>/`` on the host.

push-ofed
---------

As we're not syncing OFED from any upstream source, and are instead creating our own
repository of custom packages, we will be required to setup the Pulp distribution/publication
and upload the content directly to Ark. This playbook uses the Pulp CLI to upload the RPMs
to Ark.
As mentioned above, the DOCA repository is synced into the ``doca`` repository in Ark. This workflow
will upload the ``doca-kernel-repo`` RPM to a separate repository named ``doca-modules``. The version
for this repository is set in ``pulp-repo-versions.yml``, and the repository is disabled for local
Pulp syncs by default.

Install process
===============

Pre-requisites
--------------

* Ensure the OFED hosts are upgraded with the latest packages in the point release.

* The bootloader has been configured to use the latest kernel (reset-bls-entries.yml)

install-doca
------------

A playbook is provided to install DOCA on hosts in the ``mlnx`` group. Ensure this group
is configured to include the hosts you wish to install DOCA on. To run the install
playbook:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/install-doca.yml
28 changes: 28 additions & 0 deletions etc/kayobe/ansible/install-doca.yml
@@ -0,0 +1,28 @@
---
- name: Install DOCA
  become: true
  hosts: mlnx
  gather_facts: true
  tasks:
    - name: Get running kernel
      ansible.builtin.command:
        cmd: "uname -r"
      register: kernel

    - name: Install kernel repo
      ansible.builtin.dnf:
        name: doca-kernel-repo
        state: latest
        update_cache: true

    - name: Ensure correct priority for DOCA modules
      ansible.builtin.lineinfile:
        line: "priority=-2"
        insertafter: EOF
        path: "/etc/yum.repos.d/doca-kernel-{{ kernel.stdout }}.repo"

    - name: Install DOCA OFED
      ansible.builtin.dnf:
        name: doca-ofed
        state: latest
        update_cache: true
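
Because the play gathers facts, the running kernel release is also available as ``ansible_facts.kernel``, so the priority task could be written without the separate ``uname -r`` step. The variant below is a hedged sketch, not the committed playbook. It assumes the repo file contains a single section, so that one ``priority=`` line is enough, and it adds a ``regexp`` so that an existing priority line is replaced rather than a second one being appended.

```yaml
# Sketch: alternative form of the priority task from the playbook above.
# Assumes the repo file has a single section (one priority line suffices).
- name: Ensure correct priority for DOCA modules
  ansible.builtin.lineinfile:
    path: "/etc/yum.repos.d/doca-kernel-{{ ansible_facts.kernel }}.repo"
    regexp: '^priority='       # replace an existing priority line if present
    line: "priority=-2"
    insertafter: EOF           # otherwise append at the end of the file
```

As in the committed task, ``lineinfile`` fails if the repo file does not exist, since ``create`` is left at its default of false.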
4 changes: 2 additions & 2 deletions etc/kayobe/dnf.yml
@@ -62,9 +62,9 @@ dnf_custom_repos_doca:
    password: "{{ stackhpc_repo_mirror_password | default(omit, true) }}"
  doca-modules:
    baseurl: "{{ stackhpc_repo_rhel9_doca_modules_url }}"
    description: "OFED Kernel modules for DOCA {{ stackhpc_pulp_doca_version }} - RHEL $releasever"
    description: "OFED Kernel module repository for DOCA {{ stackhpc_pulp_doca_version }} - RHEL $releasever"
    enabled: "{{ dnf_enable_doca_modules | bool | default(false) }}"
    priority: -2
    priority: -1
    file: doca
    gpgcheck: no
    username: "{{ stackhpc_repo_mirror_username | default(omit, true) }}"
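
The ``enabled`` field in this hunk is driven by ``dnf_enable_doca_modules``. A deployment that wants hosts to use the ``doca-modules`` repository would presumably set that variable to true in its Kayobe configuration. The fragment below is a minimal sketch; which file the override belongs in is an assumption and depends on how the configuration is organised.

```yaml
# Sketch: enable the doca-modules DNF repository.
# Assumed placement: a Kayobe configuration file for the target environment.
dnf_enable_doca_modules: true
```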
2 changes: 1 addition & 1 deletion etc/kayobe/pulp-repo-versions.yml
@@ -52,4 +52,4 @@ stackhpc_pulp_repo_ubuntu_jammy_version: 20240924T064114
stackhpc_pulp_repo_rhel_9_4_doca_version: 20241211T153620
stackhpc_pulp_repo_rhel_9_4_doca_modules_version: 20241213T112245
stackhpc_pulp_repo_rhel_9_5_doca_version: 20241211T171301
stackhpc_pulp_repo_rhel_9_5_doca_modules_version: 20241213T112245
stackhpc_pulp_repo_rhel_9_5_doca_modules_version: 20250115T150314
