Hydra

This repository contains the configuration, setup, and notes for my Raspberry Pi cluster. I used the following tutorial to set it up, but I keep some notes here in case of link rot.

Setup host

The goal is for Ansible to do as much of the setup as possible. To install Ansible on my machine, I took the following steps:

UBUNTU_CODENAME=jammy
wget -O- \
     "https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=get&search=0x6125E2A8C77F2818FB7BD15B93C4A3FD7BB9C367" | \
     sudo gpg --dearmour -o /usr/share/keyrings/ansible-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/ansible-archive-keyring.gpg] http://ppa.launchpad.net/ansible/ansible/ubuntu ${UBUNTU_CODENAME} main" | \
    sudo tee /etc/apt/sources.list.d/ansible.list
sudo apt update && sudo apt install ansible

We’ll also need an SSH key pair, which can be generated with the command below. With the default file name, the public key is the contents of $HOME/.ssh/id_ed25519.pub.

ssh-keygen -t ed25519

Hardware

I’m setting up four Raspberry Pi 4s: two with 4 GB of memory and two with 8 GB. They’re all powered by a TL-SG105PE V2 PoE switch and housed in a cluster case from UCTRONICS. The head node also has the case fan connected via a USB port. For storage, I’m using a 500 GB Samsung 870 EVO SSD.

Plexgear

I tried using a USB 3.0 to Gigabit Ethernet adapter from Plexgear. However, the device kept switching USB mode under load, so I replaced it with a different adapter.

Head Node

The head node is configured according to the tutorial. I flashed the microSD card using the rpi-imager program, with Raspberry Pi OS Lite (64-bit). In the imager options I allowed only public-key authentication for SSH, added my public SSH key, and set the hostname to hydra.

I didn’t configure the Wi-Fi connection, as I figured we would connect to it using the USB Ethernet connection anyway.

Now we can connect the Raspberry Pi to one of the ports of the switch to power it up, and it should be available on the LAN. We can then go ahead and run the Ansible playbook for the head node:

cd playbooks
ansible-playbook -i inventory.ini --limit head-node setup-head-node.yaml

Currently, the head node creates its DHCP config from a template. This means that if this play is run after the compute nodes have been initialized, they will have to be enrolled again, since the template overwrites the same file(s) that enrollment modifies.
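Since re-running the play replaces the DHCP config, it may be worth keeping a copy of the current file first. A minimal sketch, assuming the config lives at /etc/dhcp/dhcpd.conf (the Debian default for ISC DHCP; the backup_file helper is hypothetical and not part of this repository):

```shell
# backup_file PATH: copy PATH to PATH.bak-<timestamp> and print the backup name.
backup_file() {
    src=$1
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dest="$src.bak-$(date +%Y%m%d%H%M%S)"
    # -p keeps mode and timestamps so the copy can be restored as-is.
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Example, as root on the head node (path is an assumption):
#   backup_file /etc/dhcp/dhcpd.conf
```

The backup can then be compared against the regenerated file to see which compute node entries were lost.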

Modifying the DHCP config

Until I find a better solution, this will remain a todo item. For now, I found the MAC address of the switch by running dhcp-lease-list on the head node, then edited the DHCP config to include the switch MAC and give it a static IP.
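For reference, a static assignment in ISC dhcpd is a host declaration with a hardware ethernet and a fixed-address statement. A minimal sketch that prints such a declaration (the helper name, the MAC address, and the chosen IP are placeholders, not the values used in this cluster):

```shell
# dhcp_host_stanza NAME MAC IP: print an ISC dhcpd host declaration.
dhcp_host_stanza() {
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' \
        "$1" "$2" "$3"
}

# Placeholder values; substitute the MAC reported by dhcp-lease-list.
dhcp_host_stanza switch 00:11:22:33:44:55 192.168.50.2
```

The printed block would be pasted into the DHCP config by hand, after which the DHCP server needs a restart to pick it up.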

Shared Disk

This disk is used to host the partitions for the rest of the compute nodes. With an SSH session open on the head node and the disk plugged into it, we can run the following commands. This could be done from any computer, though, which is why it isn’t in the Ansible playbook.

sudo parted -s /dev/sda mklabel gpt
sudo parted -s -a optimal /dev/sda mkpart primary ext4 0% 100%
sudo mkfs -t ext4 /dev/sda1

We then need to mount the disk. This is in the ansible play for the head node, but we can make sure manually as well:

sudo mkdir -p /mnt/usb
sudo mount /dev/sda1 /mnt/usb
sudo systemctl daemon-reload
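To confirm that the mount actually took effect, the kernel's mount table can be checked. A minimal sketch, assuming the device and mount point used above (the is_mounted helper is hypothetical):

```shell
# is_mounted DEVICE TARGET: read a mount table in /proc/mounts format on
# stdin and succeed if DEVICE is mounted at TARGET.
is_mounted() {
    awk -v dev="$1" -v dir="$2" '
        $1 == dev && $2 == dir { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the head node:
#   is_mounted /dev/sda1 /mnt/usb < /proc/mounts && echo "shared disk is mounted"
```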

Compute Nodes

We want the compute nodes to be able to network boot from our head node. For the Raspberry Pi 4, we have to boot a single time from an SD card to configure the boot order using raspi-config.

I flashed an SD card with Raspberry Pi OS Lite (64-bit) again, this time changing only the password for the user account and setting the hostname to compute-01.

To be able to SSH to the compute node from the head node, I also had to add the following to my ~/.ssh/config file:

Host hydra.local
     ForwardAgent yes

Another way to accomplish this is with the following command, which will add the compute node to your known_hosts file:

ssh -o ProxyCommand="ssh -W %h:%p -q [email protected]" compute-01.local

Now, SSH-ing via the head node, we grab the IP of the compute node using dhcp-lease-list. Then we SSH to this IP and run sudo raspi-config. In the menu, we choose 6 Advanced Options > A4 Boot Order > B3 Network Boot. The EEPROM bootloader is then reconfigured on the next reboot.
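Picking the IP out of the lease list can be scripted. A minimal sketch, assuming dhcp-lease-list is run with --parsable and prints one lease per line as keyword/value pairs of the form "MAC ... IP ... HOSTNAME ..." (the lease_ip helper is hypothetical, and the output format should be checked against the installed version):

```shell
# lease_ip HOSTNAME: read `dhcp-lease-list --parsable` output on stdin and
# print the IP of the first lease whose HOSTNAME field matches.
lease_ip() {
    awk -v want="$1" '
        {
            ip = ""; host = ""
            # Fields come as KEY VALUE pairs; pick out the two we need.
            for (i = 1; i < NF; i++) {
                if ($i == "IP") ip = $(i + 1)
                if ($i == "HOSTNAME") host = $(i + 1)
            }
            if (host == want && ip != "") { print ip; found = 1; exit }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the head node:
#   dhcp-lease-list --parsable | lease_ip compute-01
```

The function exits non-zero when no lease matches, so it can be used in a retry loop while waiting for a freshly booted node to request an address.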

After the node has rebooted, we check the configuration:

$ vcgencmd bootloader_config
[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0

[all]
BOOT_ORDER=0xf21
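BOOT_ORDER is read one hex digit at a time from right to left: 1 is the SD card, 2 is network boot, and f restarts the sequence. So 0xf21 tries the SD card first, then the network, then starts over. A minimal sketch of an automated check for the network boot digit (the has_network_boot helper is hypothetical):

```shell
# has_network_boot: read `vcgencmd bootloader_config` output on stdin and
# succeed if the last BOOT_ORDER value contains boot mode 2 (network).
has_network_boot() {
    order=$(sed -n 's/^BOOT_ORDER=0[xX]//p' | tail -n 1)
    case $order in
        *2*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage on a compute node:
#   vcgencmd bootloader_config | has_network_boot && echo "network boot enabled"
```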

After this, we can run the compute-node-info play. I had to add the host key to my known_hosts file, so it isn’t entirely unattended (unless you used the SSH proxy command instead of forwarding the agent). Then I grabbed the output from that play and stored it in a scratch buffer or somewhere temporary.

Using the same, unmodified SD card, we repeat this for the rest of the compute nodes, changing only the hostname in the output for each one. We’ll use this to build our inventory later.

Running Ansible roles on the compute nodes

By adding a “bastion host” or “SSH proxy” to the SSH configuration, we can access hostnames on the subnet created by the DHCP server on the head node. We can accomplish this by adding the following to the ~/.ssh/config file:

Host 192.168.50.*
     ProxyJump hydra.local

Here, the pattern 192.168.50.* matches every address in the subnet managed by the head node, and hydra.local is the hostname of the head node on your local network.
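Whether the pattern matches can be checked without opening a connection, because ssh -G prints the configuration that would apply to a given host. A minimal sketch (the jump_host helper is hypothetical, and 192.168.50.11 is only an example address):

```shell
# jump_host: read `ssh -G <host>` output on stdin and print the configured
# ProxyJump value; fail if none is set.
jump_host() {
    awk '
        $1 == "proxyjump" { print $2; found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the machine running Ansible:
#   ssh -G 192.168.50.11 | jump_host
```

If the Host block is in effect, this should print hydra.local for any address in the cluster subnet.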

Important note

This setup will not work for configurations that have ten or more compute nodes. This is because of the way the IP addresses are configured in the DHCP server configuration and in the hosts file for the nodes.
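One way to avoid a limit like this is to compute the host part of each address arithmetically instead of hard-coding it. A minimal sketch, assuming node addresses start at a fixed offset inside the 192.168.50 network (the node_ip helper and the offset of 10 are hypothetical and not how the playbooks currently work):

```shell
# node_ip NETWORK BASE INDEX: print NETWORK.(BASE+INDEX), refusing host
# parts outside the usable 1-254 range.
node_ip() {
    host=$(( $2 + $3 ))
    if [ "$host" -lt 1 ] || [ "$host" -gt 254 ]; then
        echo "host part $host is outside 1-254" >&2
        return 1
    fi
    printf '%s.%s\n' "$1" "$host"
}

node_ip 192.168.50 10 3     # prints 192.168.50.13
node_ip 192.168.50 10 12    # prints 192.168.50.22
```

In the playbooks, the same arithmetic could be done in the template that generates the DHCP config and the hosts file.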

Define interaction surface

I need to clean this up and create a good interaction surface for this repository. Some of the offending paths are:

  • Temporary workspace: /tmp/image
  • Cluster network: currently uses cluster_network_portion, but the host portion of the IP addresses (and the mask) is then hard-coded.

Migrate DHCP server

ISC DHCP is deprecated, so the setup should be migrated to the Kea DHCP server. There is a migration tool for translating the config file, and a documentation page for further migration resources.

Rewrite to use compute node hosts

Currently, a variable list of objects called compute_nodes keeps track of the MAC address and serial number of each compute node. However, it would be better to define the compute nodes under an inventory group that can be referenced by the hosts key in Ansible playbooks. As it stands, the compute nodes have to be defined both in the variable list and as separate hosts.