diff --git a/xml/MAIN.hpc-guide.xml b/xml/MAIN.hpc-guide.xml
index d91debc..0377a82 100644
--- a/xml/MAIN.hpc-guide.xml
+++ b/xml/MAIN.hpc-guide.xml
@@ -45,9 +45,7 @@
-
+

diff --git a/xml/product-entities.ent b/xml/product-entities.ent
index 488af64..18580f7 100644
--- a/xml/product-entities.ent
+++ b/xml/product-entities.ent
@@ -33,6 +33,7 @@
+
@@ -42,3 +43,4 @@
 monitor # ">
 sql > ">
 DBnode # ">
+[hpcnode&productnumbershort;] &warewulf;> ">

diff --git a/xml/remote_administration.xml b/xml/remote_administration.xml
index f6946d6..9be26f5 100644
--- a/xml/remote_administration.xml
+++ b/xml/remote_administration.xml
@@ -26,6 +26,7 @@
 yes
+Genders — static cluster configuration database

diff --git a/xml/warewulf.xml b/xml/warewulf.xml
new file mode 100644
index 0000000..4539e2e
--- /dev/null
+++ b/xml/warewulf.xml
@@ -0,0 +1,501 @@

Deploying compute nodes

&hpc; clusters consist of one or more sets of identical compute nodes. In large
clusters, each set can contain thousands of machines. To help deploy compute
nodes in these numbers as clusters scale up, the &hpcm; provides the deployment
tool &warewulf;.

About &warewulf;

&warewulf; is a deployment system for compute nodes in &hpc; clusters. Compute
nodes are booted and deployed over the network with a kernel and node image
provided by &warewulf;. To generate the node image, &warewulf; uses a
&warewulf; container, which is a base operating system container with a kernel
and an init implementation installed. &warewulf; configures images for the
individual compute nodes using node profiles and &warewulf; overlays.

Node profiles are used to apply the same configuration to multiple nodes.
Node profiles can include settings such as the container to use, the overlays
to apply, and IPMI details. New nodes automatically use the default node
profile. You can also create additional node profiles, for example, if two
groups of nodes require different containers (see the sketch at the end of
this section).

&warewulf; overlays are compiled for each individual compute node:

* System (or wwinit) overlays are applied to nodes at boot time by the wwinit
  process, before &systemd; starts. These overlays are required to start the
  compute nodes, and contain basic node-specific configuration to start the
  first network interface. System overlays are not updated during runtime.

* Runtime (or generic) overlays are updated periodically at runtime by the
  wwclient service, once per minute by default. These overlays are used to
  apply configuration changes to the nodes.

* The host overlay is used for configuration that applies to the &warewulf;
  server itself, such as adding entries to /etc/hosts or setting up the DHCP
  service and NFS exports.

System and runtime overlays can be layered on top of each other. For example,
instead of altering a configuration setting in an existing overlay, you can
override it with a new overlay. You can set a list of system and runtime
overlays to apply to individual nodes, or to multiple nodes via profiles.
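For example, you can create a second profile for a group of nodes that needs
a different container. The following is a minimal sketch: the profile name
gpu, the container name hpcnode-gpu, and the node range are placeholders, and
flag names can vary between &warewulf; versions, so confirm them with
wwctl profile set --help and wwctl node set --help before use:

   # Hypothetical second profile for nodes that need a different container:
   &prompt.user;sudo wwctl profile add gpu
   &prompt.user;sudo wwctl profile set gpu --container hpcnode-gpu

   # Assign the new profile to a group of nodes:
   &prompt.user;sudo wwctl node set node[05-08] --profile gpu

   # Recompile the overlays so the nodes pick up the change:
   &prompt.user;sudo wwctl overlay build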
Deploying compute nodes with &warewulf;

Requirements:

* The &warewulf; server has a static IP address.

* The compute nodes are set to PXE boot.

* The &warewulf; server is accessible from an external network, but is
  connected to the compute nodes via an internal cluster network used for
  deployment. This is important because &warewulf; configures DHCP and TFTP
  on the &warewulf; server, which might conflict with an existing DHCP server
  on the external network.

Procedure: Deploying compute nodes with &warewulf;

1. On the &warewulf; server, install &warewulf;:

   &prompt.user;sudo zypper install warewulf4

2. The installation creates a basic configuration for the &warewulf; server
   in the file /etc/warewulf/warewulf.conf. Review this file to make sure the
   details are correct. In particular, check the following settings:

   ipaddr: &subnetI;.250
   netmask: &subnetmask;
   network: &subnetI;.0

   ipaddr is the IP address of the &warewulf; server on the internal cluster
   network to be used for node deployment. netmask and network must match
   this network.

   Additionally, check that the DHCP range is in the cluster network:

   dhcp:
     range start: &subnetI;.21
     range end: &subnetI;.50

3. In the file /etc/sysconfig/dhcpd, check that DHCPD_INTERFACE has the
   correct value. This must be the interface on which the cluster network is
   running.

4. Start and enable the &warewulf; service:

   &prompt.user;sudo systemctl enable --now warewulfd

5. Configure the services required by &warewulf;:

   &prompt.user;sudo wwctl configure --all

   This command performs the following tasks:

   * Configures DHCP and enables the DHCP service.

   * Writes the required PXE files to the TFTP root directory and enables the
     TFTP service.

   * Updates the /etc/hosts file.

   * Configures an NFS server on the &warewulf; server and enables the NFS
     service.

   * Creates host keys and user keys for passwordless SSH access to the
     nodes.

6. When the configuration is finished, log out of the &warewulf; server and
   back in. This creates an SSH key pair to allow passwordless login to the
   deployed compute nodes. If you want to protect the private key with a
   passphrase, set it now:

   &prompt.user;ssh-keygen -p -f $HOME/.ssh/cluster

7. Importing the &warewulf; container from the &suse; registry requires &scc;
   credentials. Set your &scc; credentials as environment variables before
   you import the container:

   &prompt.user;export WAREWULF_OCI_USERNAME=USER@EXAMPLE.COM
   &prompt.user;export WAREWULF_OCI_PASSWORD=REGISTRATION_CODE

8. Import the &warewulf; container from the &suse; registry:

   &prompt.user;sudo wwctl container import \
   docker://registry.suse.com/suse/hpc/warewulf4-x86_64/sle-hpc-node:&product-ga;.&product-sp; \
   hpcnode&product-ga;.&product-sp; --setdefault

   The --setdefault argument sets this container as the default container in
   the default node profile.

9. Configure the networking details for the default profile:

   &prompt.user;sudo wwctl profile set -y default --netname default \
   --netmask &subnetmask; --gateway &subnetI;.250

   To see the details of this profile, run the following command:

   &prompt.user;sudo wwctl profile list -a default

10. Add compute nodes to &warewulf;. For example, to add ten discoverable
    nodes with preconfigured IP addresses, run the following command:

    &prompt.user;sudo wwctl node add node[01-10] \
    --netdev eth0 -I &subnetI;.100 \
    --discoverable=true

    node[01-10]
      One or more node names. Node names must be unique. If you have node
      groups or multiple clusters, add descriptors to the node names, for
      example, node01.cluster01.

    -I &subnetI;.100
      The IP address for the first node. Subsequent nodes are given
      incremental IP addresses.

    --discoverable=true
      Allows &warewulf; to assign a MAC address to the nodes when they boot
      for the first time.

    To view the settings for these nodes, run the following command:

    &prompt.user;sudo wwctl node list -a node[01-10]

11. Add the nodes to the /etc/hosts file:

    &prompt.user;sudo wwctl configure hostfile

12. Rebuild the container image to make sure it is ready to use:

    &prompt.user;sudo wwctl container build hpcnode&product-ga;.&product-sp;

13. Build the default system and runtime overlays:

    &prompt.user;sudo wwctl overlay build

    This command compiles the overlays for all the nodes.

You can now boot the compute nodes with PXE. &warewulf; provides all the
required information.
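After a node has booted, a quick way to check the deployment is to query its
&warewulf; configuration and log in over the passwordless SSH connection set
up earlier. A minimal check, assuming the first node is named node01:

   # Show the node's settings as known to &warewulf;:
   &prompt.user;sudo wwctl node list -a node01

   # Passwordless login should work using the host entries and SSH keys
   # created during configuration:
   &prompt.user;ssh node01 uname -r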
Advanced &warewulf; tasks

Using &warewulf; with &uefisecboot;

To boot compute nodes with &uefisecboot; enabled, the packages shim and
grub2-x86_64-efi must be installed in the &warewulf; container. For the
container you imported in the deployment procedure above, this should already
be the default. Use the following procedure to verify that the packages are
installed:

Procedure: Verifying packages in a &warewulf; container

1. Open a shell in the &warewulf; container:

   &prompt.user;sudo wwctl container shell hpcnode&productnumbershort;

2. Search for the packages and check their installation status in the S
   column:

   &prompt.wwcon;zypper search shim grub2

   If shim and grub2-x86_64-efi are not installed, install them now:

   &prompt.wwcon;zypper install shim grub2-x86_64-efi

3. Exit the container's shell:

   &prompt.wwcon;exit

   If any changes were made, &warewulf; automatically rebuilds the container.

4. We recommend rebuilding the container again manually to make sure the
   changes are applied:

   &prompt.user;sudo wwctl container build hpcnode&productnumbershort;

By default, &warewulf; boots nodes via iPXE, which cannot be used when
&uefisecboot; is enabled. Use the following procedure to switch to &grub; as
the boot method:

Procedure: Configuring &warewulf; to boot via &grub;

1. Open the file /etc/warewulf/warewulf.conf and change the value of grubboot
   to true:

   warewulf:
     [...]
     grubboot: true

2. Reconfigure DHCP and TFTP to recognize the configuration change:

   &prompt.user;sudo wwctl configure dhcp
   &prompt.user;sudo wwctl configure tftp

3. Rebuild the system and runtime overlays:

   &prompt.user;sudo wwctl overlay build

Configuring local node storage

Nodes provisioned by &warewulf; are ephemeral, so local disk storage is not
required. However, local storage can still be useful, for example, as scratch
storage for computational tasks.

&warewulf; can set up and manage local storage for compute nodes via the disk
provisioning tool &ignition;. Before booting the compute nodes, you must
install &ignition; in the &warewulf; container and add the disk details to
either a node profile or individual nodes. A node or profile can have
multiple disks.

Use the following procedure to install &ignition; in the &warewulf;
container:

Procedure: Preparing a &warewulf; container for local storage

1. Open a shell in the &warewulf; container:

   &prompt.user;sudo wwctl container shell hpcnode&productnumbershort;

2. Install the ignition and gptfdisk packages:

   &prompt.wwcon;zypper install ignition gptfdisk

3. Exit the container's shell:

   &prompt.wwcon;exit

   &warewulf; automatically rebuilds the container.

4. We recommend rebuilding the container again manually to make sure the
   changes are applied:

   &prompt.user;sudo wwctl container build hpcnode&productnumbershort;

The following examples demonstrate how to add a disk to a compute node's
configuration. To set up the disk, &ignition; requires details about the
physical storage device, the partitions on the disk, and the file system to
use.

To add disks to a profile instead of an individual node, use the same
commands but replace wwctl node set NODENAME with
wwctl profile set PROFILENAME, as shown in the sketch after the option
descriptions below.

Example: Adding disk configuration to a node: scratch partition

   &prompt.user;sudo wwctl node set node01 \
   --diskname /dev/vda --diskwipe \
   --partname scratch --partcreate \
   --fsname scratch --fsformat btrfs --fspath /scratch --fswipe

   This is the last partition, so it does not require a partition size or
   number; it is extended to the maximum possible size.

Example: Adding disk configuration to a node: swap partition

   &prompt.user;sudo wwctl node set node01 \
   --diskname /dev/vda \
   --partname swap --partsize=1024 --partnumber 1 \
   --fsname swap --fsformat swap --fspath swap

   Set a partsize and partnumber for all partitions except the last one
   (scratch).

--diskname
  The path to the physical storage device.

--partname
  The name of the partition. This is used as the partition label, for
  example, in /dev/disk/by-partlabel/PARTNAME.

--fsname
  The path to the partition that will contain the file system, using the
  /dev/disk/by-partlabel/ format.

--fsformat
  The type of file system to use. &ignition; fails if no file system type is
  defined.

--fspath
  The absolute path for the mount point. This is mandatory if you intend to
  mount the file system.

For more information about the available options, run
wwctl node set --help.
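For example, the swap and scratch configuration from the examples above can
be applied to every node that uses the default profile. This is a minimal
sketch under the assumption that all nodes in the profile have a /dev/vda
device:

   # Swap partition for all nodes in the default profile:
   &prompt.user;sudo wwctl profile set default \
   --diskname /dev/vda \
   --partname swap --partsize=1024 --partnumber 1 \
   --fsname swap --fsformat swap --fspath swap

   # Scratch partition, created last so it takes the remaining space:
   &prompt.user;sudo wwctl profile set default \
   --diskname /dev/vda --diskwipe \
   --partname scratch --partcreate \
   --fsname scratch --fsformat btrfs --fspath /scratch --fswipe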
For more information

See the &warewulf; documentation for more details on the following topics:

* &warewulf; overview

* Node profiles

* &warewulf; overlays

* The node provisioning process