This should work for Ubuntu Server LTS 22.04 and newer and was tested on free-tier Oracle Cloud VMs, inspired by https://matrix.org/docs/guides/free-small-matrix-server
- https://kubernetes.io/docs/reference/kubectl/cheatsheet/
- https://github.com/kelseyhightower/kubernetes-the-hard-way
- https://blog.quickbird.uk/domesticating-kubernetes-d49c178ebc41
Remove some unwanted, unused packages. Some of them might not be installed in the first place; they are listed here just in case.
sudo apt-get -y purge netfilter-persistent iptables-persistent
sudo snap remove oracle-cloud-agent
sudo apt-get -y purge snap snapd open-iscsi lxd
If the Ubuntu minimal image was used, run unminimize to get manpages back.
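It needs to be run as root:
sudo unminimize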
sudo apt-get update
sudo apt-get install -y vim less screen git bridge-utils net-tools inetutils-ping psmisc software-properties-common jq
sudo apt-get dist-upgrade -y
sudo apt-get --purge autoremove -y
sudo apt-get clean
To install system package updates automatically (unattended upgrades):
sudo sh -c "echo 'Unattended-Upgrade::Origins-Pattern { \"origin=*\"; };' >> /etc/apt/apt.conf.d/50unattended-upgrades"
To protect the SSH server against brute-forcing of system passwords, install fail2ban. See also: https://linuxize.com/post/install-configure-fail2ban-on-ubuntu-20-04/
sudo apt-get install fail2ban
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
Adjust jail.local to your needs, especially the [sshd] section.
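For example, a minimal [sshd] section (the values are illustrative only, adjust to taste), followed by enabling the service:
[sshd]
enabled = true
maxretry = 5
bantime = 1h

sudo systemctl enable --now fail2ban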
SYSCTLCFG=/etc/sysctl.conf
sudo sh -c "grep -q bridge-nf-call-iptables $SYSCTLCFG || echo 'net.bridge.bridge-nf-call-iptables = 1' >> $SYSCTLCFG"
sudo sh -c "echo net.ipv4.ip_forward=1 >> $SYSCTLCFG"
Load the required kernel modules first, then apply the sysctl settings:
sudo modprobe overlay
sudo modprobe br_netfilter
sudo sh -c 'echo overlay >> /etc/modules'
sudo sh -c 'echo br_netfilter >> /etc/modules'
sudo sysctl --system
See also: https://linuxize.com/post/how-to-set-up-wireguard-vpn-on-ubuntu-20-04/
sudo apt install -y wireguard
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
sudo chmod 600 /etc/wireguard/privatekey
On node 1 (10.0.9.1, the hub), create /etc/wireguard/wg0.conf with contents like this, inserting the keys generated above:
[Interface]
Address = 10.0.9.1/24
ListenPort = 51820
PrivateKey = ***
# Node 2
[Peer]
PublicKey = ***
AllowedIPs = 10.0.9.2/32, <local node2 IP>/32
# Node 3
[Peer]
PublicKey = ***
AllowedIPs = 10.0.9.3/32, <local node3 IP>/32
On node 2 (and analogously on node 3), /etc/wireguard/wg0.conf looks like this:
[Interface]
PrivateKey = ***
Address = 10.0.9.2/24
[Peer]
PublicKey = ***
Endpoint = <public IP of node1>:51820
AllowedIPs = 10.0.9.0/24, <local node2 IP>/32, <local node3 IP>/32
# determine this node's public IP address
IPADDRESS=$(curl -s http://checkip.dyndns.org | python3 -c 'import sys; data=sys.stdin.readline(); import xml.etree.ElementTree as ET; print(ET.fromstring(data).find("body").text.split(":")[-1].strip())')
sudo chmod 600 /etc/wireguard/wg0.conf
sudo sh -c 'wg-quick down wg0; wg-quick up wg0'
sudo systemctl enable wg-quick@wg0
Make sure the WireGuard port used here (51820) is reachable from the outside on each node; check the firewall settings of your cloud provider if applicable.
For wireguard debugging, see https://serverfault.com/a/1020299
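A quick sanity check after bringing the interface up: show the handshake status and ping a peer over the VPN (10.0.9.1 is node 1 as configured above):
sudo wg show
ping -c 3 10.0.9.1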
See also: https://linoxide.com/containers/install-kubernetes-on-ubuntu/
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install -y kubectl kubeadm kubelet kubernetes-cni
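Optionally, pin the Kubernetes packages so that the unattended upgrades configured above do not bump them unexpectedly:
sudo apt-mark hold kubelet kubeadm kubectl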
See also: https://gist.github.com/ruanbekker/38a38aea5f325f7fa4a19e795ef4f0d0
Add software sources for CRI-O and buildah first:
See also:
- https://computingforgeeks.com/install-cri-o-container-runtime-on-ubuntu-linux/
- https://linoxide.com/containers/install-kubernetes-on-ubuntu/
- https://github.com/cri-o/cri-o/blob/master/tutorials/kubeadm.md
source /etc/os-release
URL="http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable"
KUBEVER="$(kubeadm version -o short | grep -o '[0-9]\.[0-9]\+')"
sudo sh -c "(echo 'deb $URL/x${NAME}_${VERSION_ID}/ /'; echo 'deb $URL:/cri-o:/$KUBEVER/x${NAME}_${VERSION_ID}/ /') > /etc/apt/sources.list.d/cri-o_stable.list"
curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/x${NAME}_${VERSION_ID}/Release.key | sudo apt-key add -
sudo apt update
Install CRI-O:
sudo apt install cri-o cri-o-runc buildah fuse-overlayfs
TMPFN=$(mktemp)
cat << EOF > $TMPFN
[crio.runtime]
conmon = "$(which conmon)"
EOF
sudo mv $TMPFN /etc/crio/crio.conf.d/99-custom.conf
grep -q registry /etc/hosts || sudo sh -c "echo '10.0.9.1 registry' >> /etc/hosts"
sudo sed -i -e '/unqualified-search-registries/cunqualified-search-registries = ["docker.io",]' /etc/containers/registries.conf
sudo systemctl daemon-reload && sudo systemctl enable crio && sudo systemctl start crio && sudo systemctl status crio
sudo kubeadm config images pull
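To verify that the images ended up in CRI-O's storage, crictl (usually installed as a dependency of kubeadm via cri-tools) can list them, using the same CRI-O socket as configured above:
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock images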
sudo swapoff /swap # if any, just in case
Literature for k8s networking, see: https://kubernetes.io/docs/concepts/cluster-administration/networking/
Set up k8s init config:
- let it use the cri-o socket
- different IP (that of the VPN) for the API server
- set the pod network range (pod-network-cidr): the network of the pods, which must not overlap with any host networks/devices; it has to be given here for the CNI plugin (flannel) to pick it up
- generate a new token, which is not done automatically once a cfg file is provided
KUBELET_CFG=/etc/kubernetes/custom_kubelet.conf
(kubeadm config print init-defaults --component-configs=KubeletConfiguration \
| sed -e '/ClusterConfiguration/{:a;n;/networking:/{a\ podSubnet: 10.244.0.0/16' -e'};ba' -e '}' \
| sed -e '/bootstrapTokens/{:a;n;/\(token:\s*\)/d;ba' -e '}' \
| sed -e '/nodeRegistration:/{:a;n;/\(name:\s*\)/d;ba' -e '}' \
| sed -e "/controllerManager:/acontrolPlaneEndpoint: $(hostname -f)" \
| sed -e '/localAPIEndpoint:/,/bindPort:/d' \
| sed -e '/cgroupDriver:/aserverTLSBootstrap: true' \
| sed -e 's#\(criSocket:\s*\).*$#\1unix:///var/run/crio/crio.sock#'\
| sed -e '/eviction/aevictionHard:\n memory.available: "200Mi"\n nodefs.available: "2Gi"\n nodefs.inodesFree: "20000"\n imagefs.available: "2Gi"'\
| sed -e 's#^\(containerRuntimeEndpoint:\).*$#\1 "unix:///var/run/crio/crio.sock"#'\
) | sudo tee "$KUBELET_CFG" > /dev/null
sudo kubeadm init --ignore-preflight-errors=Mem,Swap --config="$KUBELET_CFG"
For problems with missing pod network, see: flannel-io/flannel#728 (comment)
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.9.1:6443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
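If the join command got lost or the token has expired (tokens are valid for 24 h by default), a new one can be printed on the control-plane node:
sudo kubeadm token create --print-join-command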
To let the coredns deployment start on the master (control-plane) node, remove the taints:
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Note: see the Kubernetes documentation on taints and tolerations: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
kubectl scale deployments.apps -n kube-system coredns --replicas=1
See https://github.com/flannel-io/flannel
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
or with Helm:
helm repo add flannel https://flannel-io.github.io/flannel/
helm install flannel --set podCidr="10.244.0.0/16" --namespace kube-flannel flannel/flannel
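Either way, the flannel pods should come up in the kube-flannel namespace; a quick check:
kubectl -n kube-flannel get pods -o wide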
See https://github.com/flannel-io/flannel/blob/master/Documentation/upgrade.md
In case kube-system pods fail to start with the error:
loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Create that file with the correct subnet:
echo -e "FLANNEL_NETWORK=10.244.0.0/16\nFLANNEL_SUBNET=10.244.0.1/24\nFLANNEL_MTU=1450\nFLANNEL_IPMASQ=true" | sudo tee /run/flannel/subnet.env
(Found here: https://stackoverflow.com/a/75395978)
This is a problem for a local registry server in the same cluster. When testing with curl, the error typically looks like this:
$ curl -sSl "https://$baseurl/v2/_catalog"
curl: (7) Failed to connect to img.**.ddnss.de port 443 after 2 ms: Couldn't connect to server
There are no errors logged by the ingress and it looks more like it is caused by the underlying networking, which is flannel here. Looking at the iptables rules set up by Kubernetes, it can be seen that some rules added by flannel use the node name:
Chain FLANNEL-FWD (1 references)
target prot opt source destination
ACCEPT all -- ${HOSTNAME}/16 anywhere /* flanneld forward */
ACCEPT all -- anywhere ${HOSTNAME}/16 /* flanneld forward */
Assuming this configuration has a cluster on the open net in mind, this is presumably supposed to resolve to the external IP address. Thus, an entry in /etc/hosts pointing the node name to 127.0.1.1 (which is otherwise good practice for setting the machine hostname) renders these flannel rules problematic. Removing that entry from /etc/hosts instantly resolves the connection failures.
Get service-cluster-cidr:
(https://stackoverflow.com/a/61685899)
echo '{"apiVersion":"v1","kind":"Service","metadata":{"name":"tst"},"spec":{"clusterIP":"1.1.1.1","ports":[{"port":443}]}}' | kubectl apply -f - 2>&1 | sed 's/.*valid IPs is //'
Get Services IPs range:
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
Get Pods IPs range:
kubectl cluster-info dump | grep -m 1 cluster-cidr
- https://stackoverflow.com/questions/39293441/needed-ports-for-kubernetes-cluster
- https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#check-required-ports
- https://coreos.com/flannel/docs/latest/flannel-config.html#firewall
- https://serverfault.com/questions/1040893/vpn-network-and-kubernetes-clusters
failed to get cgroup stats for "/system.slice/kubelet.service"
Add a file /etc/systemd/system/kubelet.service.d/11-cgroups.conf with:
[Service]
CPUAccounting=true
MemoryAccounting=true
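For example, the drop-in can be created like this:
sudo mkdir -p /etc/systemd/system/kubelet.service.d
sudo tee /etc/systemd/system/kubelet.service.d/11-cgroups.conf << EOF
[Service]
CPUAccounting=true
MemoryAccounting=true
EOF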
Activate it by calling:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
wget -O metrics.yaml https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl apply -f metrics.yaml
Add the hostname to the coredns name resolution so that the metrics server can resolve it properly:
kubectl edit cm coredns -n kube-system
Below the health section and before the ready keyword, add a hosts entry (not ideal, but it works):
hosts {
<outside ip address> <hostname>.cluster.local
fallthrough
}
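In context, the relevant part of the Corefile then looks roughly like this (IP address and hostname are placeholders):
health {
   lameduck 5s
}
hosts {
   <outside ip address> <hostname>.cluster.local
   fallthrough
}
ready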
Reread the config file:
kubectl rollout restart -n kube-system deployment/coredns
From here: kubernetes-sigs/metrics-server#196 (comment)
Append serverTLSBootstrap: true in the kubelet: section of the kubelet-config ConfigMap and save:
kubectl -n kube-system edit configmap kubelet-config
On each node, add serverTLSBootstrap to the kubelet config (as root typically):
echo 'serverTLSBootstrap: true' >> /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet
Approve all signing requests (CSR) programmatically:
for kubeletcsr in `kubectl -n kube-system get csr \
| grep kubernetes.io/kubelet-serving \
| awk '{ print $1 }'`;
do
kubectl certificate approve $kubeletcsr;
done
The metrics server should work now:
kubectl top pods --all-namespaces
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get-helm-3 | bash
helm repo add k8s-at-home https://k8s-at-home.com/charts/
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
As described in NVIDIA/nvidia-docker#1427 (comment), NVIDIA's container runtime needs to be installed, but without using Docker; CRI-O will be configured accordingly below. See also: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian
# sources for NVIDIA container toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime
sudo mkdir -p /usr/share/containers/oci/hooks.d
sudo bash -c '
cat > /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json << EOF
{
"version": "1.0.0",
"hook": {
"path": "/usr/bin/nvidia-container-toolkit",
"args": ["nvidia-container-toolkit", "prestart"]
},
"when": {
"always": true,
"commands": [".*"]
},
"stages": ["prestart"]
}
EOF
'
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
Create a new file pod1.yaml with the following contents:
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  restartPolicy: OnFailure
  containers:
    - image: nvcr.io/nvidia/cuda:11.0-base
      name: pod1-ctr
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          nvidia.com/gpu: 1
Create the pod by calling:
kubectl create -f pod1.yaml
Check GPU support by running nvidia-smi in the pod:
kubectl exec -it pod1 -- nvidia-smi
Using https://www.ddnss.de
- Create an account
- Create a new domain, wildcard capable
- Put the update script into /etc/dhcp/dhclient-exit-hooks.d/99_ddnss_update (from https://www.ddnss.de/info.php -> 'CRON & Bash Script'); see also https://gist.github.com/ibressler/53ea52c88392831b615d65091281dc38
- Fill in the update key and the host name and run it manually for the first time to check that it works
The acme.sh command line client supports the ddnss API out of the box (and many others).
- Following the install guide https://github.com/acmesh-official/acme.sh#2-or-install-from-git
- Use DDNSS.de API
$ export DDNSS_Token="<update token>"
$ acme.sh --issue --dns dns_ddnss -d <full ddnss domain> -d <ddnss subdomain>
The environment variable LE_WORKING_DIR is set by the acme.sh installer in the shell profile and typically points to ~/.acme.sh.
DOMAINBASE=<your domain>
kubectl -n yourns create secret tls certs-catamel --cert="$LE_WORKING_DIR/$DOMAINBASE.cer" --key="$LE_WORKING_DIR/$DOMAINBASE.key" --dry-run=client -o yaml | kubectl apply -f -
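acme.sh renews the certificate periodically from its own cron job, so the Kubernetes secret has to be refreshed afterwards as well. A minimal sketch of a helper that could be run from cron, reusing the same paths and names as above:
#!/bin/sh
# Sketch: refresh the TLS secret from the current acme.sh certificate files.
# LE_WORKING_DIR and KUBECONFIG need to be set when run from cron.
DOMAINBASE=<your domain>
kubectl -n yourns create secret tls certs-catamel \
  --cert="$LE_WORKING_DIR/$DOMAINBASE.cer" \
  --key="$LE_WORKING_DIR/$DOMAINBASE.key" \
  --dry-run=client -o yaml | kubectl apply -f -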
See https://github.com/kubernetes/ingress-nginx/blob/master/docs/deploy/index.md
For SciCat there is a script. It forwards node ports to the outward-facing network device and makes the setup persistent across reboots (assuming there is no load balancer available), as shown below:
latest="$(curl -s https://api.github.com/repos/kubernetes/ingress-nginx/releases | jq '[.[] | select(.prerelease == false and .tag_name[:4] == "cont" )] | .[0].tag_name' | tr -d '"')"
url="https://raw.githubusercontent.com/kubernetes/ingress-nginx/$latest/deploy/static/provider/baremetal/deploy.yaml"
echo "$0 using '$url'"
if [ "$1" != "clean" ];
then
kubectl apply -f "$url"
# change ingress-nginx service to known port numbers
kubectl patch svc -n ingress-nginx ingress-nginx-controller --type=json --patch \
'[{"op": "replace", "path": "/spec/ports/0/nodePort", "value":30080},
{"op": "replace", "path": "/spec/ports/1/nodePort", "value":30443}]'
else # clean up
timeout 5 kubectl delete -f "$url"
fi
Inspired by https://www.karlrupp.net/en/computer/nat_tutorial
Add this to rc.local:
#!/bin/sh
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 158.101.164.220:30080
sudo iptables -t nat -A POSTROUTING -o eth0 -p tcp --dport 30080 -j MASQUERADE
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 158.101.164.220:30443
sudo iptables -t nat -A POSTROUTING -o eth0 -p tcp --dport 30443 -j MASQUERADE
Then enable rc.local service:
grep -qiF '[Install]' /usr/lib/systemd/system/rc-local.service || sudo sh -c "(echo '[Install]'; echo 'WantedBy=multi-user.target') >> /usr/lib/systemd/system/rc-local.service"
sudo systemctl enable rc-local
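To apply the rules without a reboot (make sure /etc/rc.local is executable) and to check that they are in place:
sudo systemctl start rc-local
sudo iptables -t nat -L PREROUTING -n --line-numbers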
For ingress options see: https://kubernetes.github.io/ingress-nginx/deploy/baremetal/
sudo apt install nfs-kernel-server
sudo mkdir -p /nfs && sudo chmod a+rwx /nfs
sudo su -c "echo '/nfs 10.0.9.0/24(rw,sync,no_subtree_check) 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports"
sudo exportfs -a
sudo service nfs-kernel-server restart
sudo mkdir -p /nfs
sudo sh -c "echo '10.0.9.1:/nfs/ /nfs nfs vers=4,rw 0 0' >> /etc/fstab"
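Mount it right away using the new fstab entry and verify:
sudo mount /nfs
df -h /nfs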
See also: https://vocon-it.com/2018/12/10/kubernetes-4-persistent-volumes-hello-world/
Basically it is:
kubectl apply -f definitions/mongo_pv_nfs.yaml
helm install local-mongodb bitnami/mongodb --namespace dev
For SciCat, there is a script. It does a bit more to make sure persistent storage works, and it provides a cleanup routine.
MongoDB(R) can be accessed on the following DNS name(s) and ports from within your cluster:
local-mongodb.dev.svc.cluster.local
To get the root password run:
export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace dev local-mongodb -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)
To connect to your database, create a MongoDB(R) client container:
kubectl run --namespace dev local-mongodb-client --rm --tty -i --restart='Never' --env="MONGODB_ROOT_PASSWORD=$MONGODB_ROOT_PASSWORD" --image docker.io/bitnami/mongodb:4.4.4-debian-10-r0 --command -- bash
Then, run the following command:
mongo admin --host "local-mongodb" --authenticationDatabase admin -u root -p $MONGODB_ROOT_PASSWORD
To connect to your database from outside the cluster execute the following commands:
kubectl port-forward --namespace dev svc/local-mongodb 27017:27017 &
mongo --host 127.0.0.1 --authenticationDatabase admin -p $MONGODB_ROOT_PASSWORD
Needed for building custom images of SciCat services from source. It is deployed in the k8s cluster using the twuni Helm chart: https://github.com/twuni/docker-registry.helm
Using the registry over TLS with generally trusted certificates, e.g. by Let's Encrypt via the acme.sh client, is strongly recommended. Otherwise, it might break in multiple places when clients (helm, cri-o, …) refuse to access an untrusted/insecure registry.
For SciCat there is a script. It configures:
- TLS
- ingress
- htpasswd for username&password access and
- persistent storage settings accordingly.
It provides a cleanup routine for rollback too.
When the external IP address of the node has changed:
- Update the IP addresses in /etc/kubernetes/manifests/kube-apiserver.yaml and /etc/kubernetes/manifests/etcd.yaml. This can be handled by the DNS update script in /etc/dhcp/dhclient-exit-hooks.d/99_ddnss_update.
- Restart kubelet: systemctl restart kubelet
- Approve internal CSRs after kubelet was restarted:
  for kubeletcsr in $(kubectl -n kube-system get csr | grep kubernetes.io/kubelet-serving | awk '{ print $1 }'); do kubectl certificate approve $kubeletcsr; done
- Restart the ingress pod since it has to know about the external IP address (one way to do this is shown below).
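One way to restart it, assuming the ingress-nginx deployment installed earlier:
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller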
Fix cluster for changed IP address:
(From here kubernetes/kubeadm#338 (comment))
systemctl stop kubelet crio
killall kube-controller-manager kube-scheduler kube-apiserver conmon
cd /etc/
# backup old kubernetes data
mv kubernetes kubernetes-backup
mv /var/lib/kubelet /var/lib/kubelet-backup
# restore certificates
mkdir -p kubernetes
cp -r kubernetes-backup/pki kubernetes
rm kubernetes/pki/{apiserver.*,etcd/peer.*,etcd/server.*}
Apply the customization described in 'Init the master node' above and initialize the cluster again with:
sudo systemctl start crio
sudo kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd --config="$KUBELET_CFG"
Additionally, the metrics server might cause flannel to hang. Fix CoreDNS config to resolve the host.
For smaller root filesystems with used space reaching 85 % (as reported by df -h), the node may run into a DiskPressure condition, caused by rather conservative defaults that require at least 15 % of the disk space to be free. A prominent effect is that kubectl get pods -A shows many pods with status Evicted or ContainerStatusUnknown. The current limits can be viewed by exporting the kubelet configuration of the node:
kubectl proxy --port=8001 &
NODE_NAME="<your-node-name>"; curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' | less -S
A solution which actually worked was to adjust the eviction thresholds via kubelet parameters in /etc/default/kubelet:
KUBELET_EXTRA_ARGS="--eviction-hard='imagefs.available<2Gi,nodefs.available<2Gi,nodefs.inodesFree<20000'"
Then restart the kubelet with sudo service kubelet restart.
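Pods already marked Evicted or ContainerStatusUnknown are not cleaned up automatically; since they end up in the Failed phase, they can be removed in bulk:
kubectl delete pods -A --field-selector=status.phase=Failed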
kubectl -nyourns exec -it $(kubectl get po --all-namespaces | awk '/catamel/{print $2}') -- ash
sudo apt install npm
- https://www.gluster.org/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
- https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/
- https://www.taste-of-it.de/glusterfs-mount-mittels-nfs/
- https://stackoverflow.com/questions/54571185/how-to-patch-a-configmap-in-kubernetes
- https://pabraham-devops.medium.com/mapping-kubernetes-configmap-to-read-write-folders-and-files-8a548c855817
To create a ConfigMap from a directory (see https://phoenixnap.com/kb/kubernetes-configmap-create-and-use):
kubectl create configmap test --from-file=/nfs/siteconfig/catamel/ --dry-run=client -o yaml | kubectl apply -f -
kubectl -nyourns patch cm catamel-dacat-api-server-dev -p "$(printf 'data:\n datasources.json: |-\n%s' "$(jq '.' /nfs/siteconfig/catamel/datasources.json | sed 's/^/ /g')")"
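To check that the patch was applied:
kubectl -nyourns get cm catamel-dacat-api-server-dev -o jsonpath='{.data.datasources\.json}' | jq .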