This should work for Ubuntu Server LTS 22.04 and newer and was tested on free-tier Oracle Cloud VMs, inspired by https://matrix.org/docs/guides/free-small-matrix-server
- https://kubernetes.io/docs/reference/kubectl/cheatsheet/
- https://github.com/kelseyhightower/kubernetes-the-hard-way
- https://blog.quickbird.uk/domesticating-kubernetes-d49c178ebc41
Remove some unwanted, unused packages. Some of them might not be installed in the first place; they are listed here just in case.
sudo apt-get -y purge netfilter-persistent iptables-persistent
sudo snap remove oracle-cloud-agent
sudo apt-get -y purge snap snapd open-iscsi lxd
If the Ubuntu minimal image was used, run unminimize to get manpages back.
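It needs to be run as root:
sudo unminimize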
sudo apt-get update
sudo apt-get install -y vim less screen git bridge-utils net-tools inetutils-ping psmisc software-properties-common jq
sudo apt-get dist-upgrade -y
sudo apt-get --purge autoremove -y
sudo apt-get clean
To install system package updates automatically (unattended upgrades):
sudo sh -c "echo 'Unattended-Upgrade::Origins-Pattern { \"origin=*\"; };' >> /etc/apt/apt.conf.d/50unattended-upgrades"
To protect the SSH server against brute-forcing of system passwords, install fail2ban. See also: https://linuxize.com/post/install-configure-fail2ban-on-ubuntu-20-04/
sudo apt-get install fail2ban
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
Adjust jail.local to your needs, especially the [sshd] section.
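For example, a minimal [sshd] section (the values are illustrative only, adjust to taste), followed by enabling the service:
[sshd]
enabled = true
maxretry = 5
bantime = 1h

sudo systemctl enable --now fail2ban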
SYSCTLCFG=/etc/sysctl.conf
sudo sh -c "grep -q bridge-nf-call-iptables $SYSCTLCFG || echo 'net.bridge.bridge-nf-call-iptables = 1' >> $SYSCTLCFG"
sudo sh -c "echo net.ipv4.ip_forward=1 >> $SYSCTLCFG"
Load the required kernel modules first, then apply the sysctl settings:
sudo modprobe overlay
sudo modprobe br_netfilter
sudo sh -c 'echo overlay >> /etc/modules'
sudo sh -c 'echo br_netfilter >> /etc/modules'
sudo sysctl --system
See also: https://linuxize.com/post/how-to-set-up-wireguard-vpn-on-ubuntu-20-04/
sudo apt install -y wireguard
wg genkey | sudo tee /etc/wireguard/privatekey | wg pubkey | sudo tee /etc/wireguard/publickey
sudo chmod 600 /etc/wireguard/privatekey
On node 1 (10.0.9.1, the hub), create /etc/wireguard/wg0.conf with contents like this, inserting the keys generated above:
[Interface]
Address = 10.0.9.1/24
ListenPort = 51820
PrivateKey = ***
# Node 2
[Peer]
PublicKey = ***
AllowedIPs = 10.0.9.2/32, <local node2 IP>/32
# Node 3
[Peer]
PublicKey = ***
AllowedIPs = 10.0.9.3/32, <local node3 IP>/32
On node 2 (and analogously on node 3), /etc/wireguard/wg0.conf looks like this:
[Interface]
PrivateKey = ***
Address = 10.0.9.2/24
[Peer]
PublicKey = ***
Endpoint = <public IP of node1>:51820
AllowedIPs = 10.0.9.0/24, <local node2 IP>/32, <local node3 IP>/32
# determine this node's public IP address
IPADDRESS=$(curl -s http://checkip.dyndns.org | python3 -c 'import sys; data=sys.stdin.readline(); import xml.etree.ElementTree as ET; print(ET.fromstring(data).find("body").text.split(":")[-1].strip())')
sudo chmod 600 /etc/wireguard/wg0.conf
sudo sh -c 'wg-quick down wg0; wg-quick up wg0'
sudo systemctl enable wg-quick@wg0
Make sure the WireGuard port used here (51820) is reachable from the outside on each node; check the firewall settings of your cloud provider if applicable.
For wireguard debugging, see https://serverfault.com/a/1020299
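A quick sanity check after bringing the interface up: show the handshake status and ping a peer over the VPN (10.0.9.1 is node 1 as configured above):
sudo wg show
ping -c 3 10.0.9.1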
See also: https://linoxide.com/containers/install-kubernetes-on-ubuntu/
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install -y kubectl kubeadm kubelet kubernetes-cni
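Optionally, pin the Kubernetes packages so that the unattended upgrades configured above do not bump them unexpectedly:
sudo apt-mark hold kubelet kubeadm kubectl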
See also: https://gist.github.com/ruanbekker/38a38aea5f325f7fa4a19e795ef4f0d0
Add software sources for CRI-O and buildah first:
See also:
- https://computingforgeeks.com/install-cri-o-container-runtime-on-ubuntu-linux/
- https://linoxide.com/containers/install-kubernetes-on-ubuntu/
- https://github.com/cri-o/cri-o/blob/master/tutorials/kubeadm.md
source /etc/os-release
URL="http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable"
KUBEVER="$(kubeadm version -o short | grep -o '[0-9]\.[0-9]\+')"
sudo sh -c "(echo 'deb $URL/x${NAME}_${VERSION_ID}/ /'; echo 'deb $URL:/cri-o:/$KUBEVER/x${NAME}_${VERSION_ID}/ /') > /etc/apt/sources.list.d/cri-o_stable.list"
curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/x${NAME}_${VERSION_ID}/Release.key | sudo apt-key add -
sudo apt update
Install CRI-O:
sudo apt install cri-o cri-o-runc buildah fuse-overlayfs
TMPFN=$(mktemp)
cat << EOF > $TMPFN
[crio.runtime]
conmon = "$(which conmon)"
EOF
sudo mv $TMPFN /etc/crio/crio.conf.d/99-custom.conf
grep -q registry /etc/hosts || sudo sh -c "echo '10.0.9.1 registry' >> /etc/hosts"
sudo sed -i -e '/unqualified-search-registries/cunqualified-search-registries = ["docker.io",]' /etc/containers/registries.conf
sudo systemctl daemon-reload && sudo systemctl enable crio && sudo systemctl start crio && sudo systemctl status crio
sudo kubeadm config images pull
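To verify that the images ended up in CRI-O's storage, crictl (usually installed as a dependency of kubeadm via cri-tools) can list them, using the same CRI-O socket as configured above:
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock images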
sudo swapoff /swap # if any, just in case
Literature for k8s networking, see: https://kubernetes.io/docs/concepts/cluster-administration/networking/
Set up k8s init config:
- let it use the cri-o socket
- different IP (that of the VPN) for the API server
- set the pod network range (pod-network-cidr): the network of the pods, which must not overlap with any host networks/devices; it has to be given here for the CNI plugin (flannel) to pick it up
- generate a new token, which is not done automatically once a cfg file is provided
KUBELET_CFG=/etc/kubernetes/custom_kubelet.conf
(kubeadm config print init-defaults --component-configs=KubeletConfiguration \
| sed -e '/ClusterConfiguration/{:a;n;/networking:/{a\ podSubnet: 10.244.0.0/16' -e'};ba' -e '}' \
| sed -e '/bootstrapTokens/{:a;n;/\(token:\s*\)/d;ba' -e '}' \
| sed -e '/nodeRegistration:/{:a;n;/\(name:\s*\)/d;ba' -e '}' \
| sed -e "/controllerManager:/acontrolPlaneEndpoint: $(hostname -f)" \
| sed -e '/localAPIEndpoint:/,/bindPort:/d' \
| sed -e '/cgroupDriver:/aserverTLSBootstrap: true' \
| sed -e 's#\(criSocket:\s*\).*$#\1unix:///var/run/crio/crio.sock#'\
| sed -e '/eviction/aevictionHard:\n memory.available: "200Mi"\n nodefs.available: "2Gi"\n nodefs.inodesFree: "20000"\n imagefs.available: "2Gi"'\
| sed -e 's#^\(containerRuntimeEndpoint:\).*$#\1 "unix:///var/run/crio/crio.sock"#'\
) | sudo tee "$KUBELET_CFG" > /dev/null
sudo kubeadm init --ignore-preflight-errors=Mem,Swap --config="$KUBELET_CFG"
For problems with missing pod network, see: flannel-io/flannel#728 (comment)
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.9.1:6443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
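If the join command got lost or the token has expired (tokens are valid for 24 h by default), a new one can be printed on the control-plane node:
sudo kubeadm token create --print-join-command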
To let the coredns deployment start on the master (control-plane) node, remove the taints:
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Note: see the Kubernetes documentation on taints and tolerations: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
kubectl scale deployments.apps -n kube-system coredns --replicas=1
See https://github.com/flannel-io/flannel
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
or with Helm:
helm repo add flannel https://flannel-io.github.io/flannel/
helm install flannel --set podCidr="10.244.0.0/16" --namespace kube-flannel flannel/flannel
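Either way, the flannel pods should come up in the kube-flannel namespace; a quick check:
kubectl -n kube-flannel get pods -o wide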
See https://github.com/flannel-io/flannel/blob/master/Documentation/upgrade.md
In case kube-system pods fail to start with the error:
loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
Create that file with the correct subnet:
echo -e "FLANNEL_NETWORK=10.244.0.0/16\nFLANNEL_SUBNET=10.244.0.1/24\nFLANNEL_MTU=1450\nFLANNEL_IPMASQ=true" | sudo tee /run/flannel/subnet.env
(Found here: https://stackoverflow.com/a/75395978)
This is a problem for a local registry server in the same cluster. When testing with curl, the error typically looks like this:
$ curl -sSl "https://$baseurl/v2/_catalog"
curl: (7) Failed to connect to img.**.ddnss.de port 443 after 2 ms: Couldn't connect to server
There are no errors logged by the ingress and it looks more like it is caused by the underlying networking, which is flannel here. Looking at the iptables rules set up by Kubernetes, it can be seen that some rules added by flannel use the node name:
Chain FLANNEL-FWD (1 references)
target prot opt source destination
ACCEPT all -- ${HOSTNAME}/16 anywhere /* flanneld forward */
ACCEPT all -- anywhere ${HOSTNAME}/16 /* flanneld forward */
Assuming this configuration has a cluster on the open net in mind, this is presumably supposed to resolve to the external IP address. Thus, an entry in /etc/hosts pointing the node name to 127.0.1.1 (which is otherwise good practice for setting the machine hostname) renders these flannel rules problematic. Removing that entry from /etc/hosts instantly resolves the connection failures.
Get service-cluster-cidr:
(https://stackoverflow.com/a/61685899)
echo '{"apiVersion":"v1","kind":"Service","metadata":{"name":"tst"},"spec":{"clusterIP":"1.1.1.1","ports":[{"port":443}]}}' | kubectl apply -f - 2>&1 | sed 's/.*valid IPs is //'
Get Services IPs range:
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
Get Pods IPs range:
kubectl cluster-info dump | grep -m 1 cluster-cidr
- https://stackoverflow.com/questions/39293441/needed-ports-for-kubernetes-cluster
- https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#check-required-ports
- https://coreos.com/flannel/docs/latest/flannel-config.html#firewall
- https://serverfault.com/questions/1040893/vpn-network-and-kubernetes-clusters
failed to get cgroup stats for "/system.slice/kubelet.service"
Add a file /etc/systemd/system/kubelet.service.d/11-cgroups.conf with:
[Service]
CPUAccounting=true
MemoryAccounting=true
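For example, the drop-in can be created like this:
sudo mkdir -p /etc/systemd/system/kubelet.service.d
sudo tee /etc/systemd/system/kubelet.service.d/11-cgroups.conf << EOF
[Service]
CPUAccounting=true
MemoryAccounting=true
EOF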
Activate it by calling:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
wget -O metrics.yaml https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl apply -f metrics.yaml
Add the hostname to the coredns name resolution so that the metrics server can resolve it properly:
kubectl edit cm coredns -n kube-system
Below the health section and before the ready keyword, add a hosts entry (not ideal, but it works):
hosts {
<outside ip address> <hostname>.cluster.local
fallthrough
}
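In context, the relevant part of the Corefile then looks roughly like this (IP address and hostname are placeholders):
health {
   lameduck 5s
}
hosts {
   <outside ip address> <hostname>.cluster.local
   fallthrough
}
ready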
Reread the config file:
kubectl rollout restart -n kube-system deployment/coredns
From here: kubernetes-sigs/metrics-server#196 (comment)
Append serverTLSBootstrap: true in the kubelet: section of the kubelet-config ConfigMap and save:
kubectl -n kube-system edit configmap kubelet-config
On each node, add serverTLSBootstrap to the kubelet config (as root typically):
echo 'serverTLSBootstrap: true' >> /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet
Approve all signing requests (CSR) programmatically:
for kubeletcsr in `kubectl -n kube-system get csr \
| grep kubernetes.io/kubelet-serving \
| awk '{ print $1 }'`;
do
kubectl certificate approve $kubeletcsr;
done
The metrics server should work now:
kubectl top pods --all-namespaces
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get-helm-3 | bash
helm repo add k8s-at-home https://k8s-at-home.com/charts/
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
As described in NVIDIA/nvidia-docker#1427 (comment), NVIDIA's container runtime needs to be installed, but without using Docker; CRI-O will be configured accordingly below. See also: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian
# sources for NVIDIA container toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime
sudo mkdir -p /usr/share/containers/oci/hooks.d
sudo bash -c '
cat > /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json << EOF
{
"version": "1.0.0",
"hook": {
"path": "/usr/bin/nvidia-container-toolkit",
"args": ["nvidia-container-toolkit", "prestart"]
},
"when": {
"always": true,
"commands": [".*"]
},
"stages": ["prestart"]
}
EOF
'
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
Create a new file pod1.yaml with the following contents:
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  restartPolicy: OnFailure
  containers:
    - image: nvcr.io/nvidia/cuda:11.0-base
      name: pod1-ctr
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          nvidia.com/gpu: 1
Create the pod by calling:
kubectl create -f pod1.yaml
Check GPU support by running nvidia-smi in the pod:
kubectl exec -it pod1 -- nvidia-smi
Using https://www.ddnss.de
- Create an account
- Create a new domain, wildcard capable
- Put the update script into /etc/dhcp/dhclient-exit-hooks.d/99_ddnss_update (from https://www.ddnss.de/info.php -> 'CRON & Bash Script'); see also https://gist.github.com/ibressler/53ea52c88392831b615d65091281dc38
- Fill in the update key and the host name and run it manually for the first time to check that it works
The acme.sh command line client supports the ddnss API out of the box (and many others).
- Following the install guide https://github.com/acmesh-official/acme.sh#2-or-install-from-git
- Use DDNSS.de API
$ export DDNSS_Token="<update token>"
$ acme.sh --issue --dns dns_ddnss -d <full ddnss domain> -d <ddnss subdomain>
The environment variable LE_WORKING_DIR is set by the acme.sh installer in the shell profile and typically points to ~/.acme.sh.
DOMAINBASE=<your domain>
kubectl -n yourns create secret tls certs-catamel --cert="$LE_WORKING_DIR/$DOMAINBASE.cer" --key="$LE_WORKING_DIR/$DOMAINBASE.key" --dry-run=client -o yaml | kubectl apply -f -
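acme.sh renews the certificate periodically from its own cron job, so the Kubernetes secret has to be refreshed afterwards as well. A minimal sketch of a helper that could be run from cron, reusing the same paths and names as above:
#!/bin/sh
# Sketch: refresh the TLS secret from the current acme.sh certificate files.
# LE_WORKING_DIR and KUBECONFIG need to be set when run from cron.
DOMAINBASE=<your domain>
kubectl -n yourns create secret tls certs-catamel \
  --cert="$LE_WORKING_DIR/$DOMAINBASE.cer" \
  --key="$LE_WORKING_DIR/$DOMAINBASE.key" \
  --dry-run=client -o yaml | kubectl apply -f -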
See https://github.com/kubernetes/ingress-nginx/blob/master/docs/deploy/index.md
For SciCat there is a script. It forwards node ports to the outward-facing network device and makes the setup persistent across reboots (assuming there is no load balancer available), as shown below:
latest="$(curl -s https://api.github.com/repos/kubernetes/ingress-nginx/releases | jq '[.[] | select(.prerelease == false and .tag_name[:4] == "cont" )] | .[0].tag_name' | tr -d '"')"
url="https://raw.githubusercontent.com/kubernetes/ingress-nginx/$latest/deploy/static/provider/baremetal/deploy.yaml"
echo "$0 using '$url'"
if [ "$1" != "clean" ];
then
kubectl apply -f "$url"
# change ingress-nginx service to known port numbers
kubectl patch svc -n ingress-nginx ingress-nginx-controller --type=json --patch \
'[{"op": "replace", "path": "/spec/ports/0/nodePort", "value":30080},
{"op": "replace", "path": "/spec/ports/1/nodePort", "value":30443}]'
else # clean up
timeout 5 kubectl delete -f "$url"
fi
Inspired by https://www.karlrupp.net/en/computer/nat_tutorial
Add this to rc.local:
#!/bin/sh
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 158.101.164.220:30080
sudo iptables -t nat -A POSTROUTING -o eth0 -p tcp --dport 30080 -j MASQUERADE
sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 158.101.164.220:30443
sudo iptables -t nat -A POSTROUTING -o eth0 -p tcp --dport 30443 -j MASQUERADE
Then enable rc.local service:
grep -qiF '[Install]' /usr/lib/systemd/system/rc-local.service || sudo sh -c "(echo '[Install]'; echo 'WantedBy=multi-user.target') >> /usr/lib/systemd/system/rc-local.service"
sudo systemctl enable rc-local
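To apply the rules without a reboot (make sure /etc/rc.local is executable) and to check that they are in place:
sudo systemctl start rc-local
sudo iptables -t nat -L PREROUTING -n --line-numbers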
For ingress options see: https://kubernetes.github.io/ingress-nginx/deploy/baremetal/
sudo apt install nfs-kernel-server
sudo mkdir -p /nfs && sudo chmod a+rwx /nfs
sudo su -c "echo '/nfs 10.0.9.0/24(rw,sync,no_subtree_check) 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports"
sudo exportfs -a
sudo service nfs-kernel-server restart
sudo mkdir -p /nfs
sudo sh -c "echo '10.0.9.1:/nfs/ /nfs nfs vers=4,rw 0 0' >> /etc/fstab"
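Mount it right away using the new fstab entry and verify:
sudo mount /nfs
df -h /nfs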
See also: https://vocon-it.com/2018/12/10/kubernetes-4-persistent-volumes-hello-world/
Basically it is:
kubectl apply -f definitions/mongo_pv_nfs.yaml
helm install local-mongodb bitnami/mongodb --namespace dev
For SciCat, there is a script. It does a bit more to make sure persistent storage works, and it provides a cleanup routine.
MongoDB(R) can be accessed on the following DNS name(s) and ports from within your cluster:
local-mongodb.dev.svc.cluster.local
To get the root password run:
export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace dev local-mongodb -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)
To connect to your database, create a MongoDB(R) client container:
kubectl run --namespace dev local-mongodb-client --rm --tty -i --restart='Never' --env="MONGODB_ROOT_PASSWORD=$MONGODB_ROOT_PASSWORD" --image docker.io/bitnami/mongodb:4.4.4-debian-10-r0 --command -- bash
Then, run the following command:
mongo admin --host "local-mongodb" --authenticationDatabase admin -u root -p $MONGODB_ROOT_PASSWORD
To connect to your database from outside the cluster execute the following commands:
kubectl port-forward --namespace dev svc/local-mongodb 27017:27017 &
mongo --host 127.0.0.1 --authenticationDatabase admin -p $MONGODB_ROOT_PASSWORD
Needed for building custom images of SciCat services from source. It is deployed in the k8s cluster using the twuni Helm chart: https://github.com/twuni/docker-registry.helm
Using the registry over TLS with generally trusted certificates, e.g. by Let's Encrypt via the acme.sh client, is strongly recommended. Otherwise, it might break in multiple places when clients (helm, cri-o, …) refuse to access an untrusted/insecure registry.
For SciCat there is a script. It configures:
- TLS
- ingress
- htpasswd for username&password access and
- persistent storage settings accordingly.
It provides a cleanup routine for rollback too.
When the external IP address of the node has changed:
- Update the IP addresses in /etc/kubernetes/manifests/kube-apiserver.yaml and /etc/kubernetes/manifests/etcd.yaml. This can be handled by the DNS update script in /etc/dhcp/dhclient-exit-hooks.d/99_ddnss_update.
- Restart kubelet: systemctl restart kubelet
- Approve internal CSRs after kubelet was restarted:
  for kubeletcsr in $(kubectl -n kube-system get csr | grep kubernetes.io/kubelet-serving | awk '{ print $1 }'); do kubectl certificate approve $kubeletcsr; done
- Restart the ingress pod since it has to know about the external IP address (one way to do this is shown below).
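One way to restart it, assuming the ingress-nginx deployment installed earlier:
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller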
Fix cluster for changed IP address:
(From here kubernetes/kubeadm#338 (comment))
systemctl stop kubelet crio
killall kube-controller-manager kube-scheduler kube-apiserver conmon
cd /etc/
# backup old kubernetes data
mv kubernetes kubernetes-backup
mv /var/lib/kubelet /var/lib/kubelet-backup
# restore certificates
mkdir -p kubernetes
cp -r kubernetes-backup/pki kubernetes
rm kubernetes/pki/{apiserver.*,etcd/peer.*,etcd/server.*}
Apply the customization described in 'Init the master node' above and initialize the cluster again with:
sudo systemctl start crio
sudo kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd --config="$KUBELET_CFG"
Additionally, the metrics server might cause flannel to hang. Fix CoreDNS config to resolve the host.
For smaller root filesystems with used space reaching 85 % (as reported by df -h), the node may run into a DiskPressure condition, caused by rather conservative defaults that require at least 15 % of the disk space to be free. A prominent effect is that kubectl get pods -A shows many pods with status Evicted or ContainerStatusUnknown. The current limits can be viewed by exporting the kubelet configuration of the node:
kubectl proxy --port=8001 &
NODE_NAME="<your-node-name>"; curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' | less -S
A solution which actually worked was to adjust the eviction thresholds via kubelet parameters in /etc/default/kubelet:
KUBELET_EXTRA_ARGS="--eviction-hard='imagefs.available<2Gi,nodefs.available<2Gi,nodefs.inodesFree<20000'"
Then restart the kubelet with sudo service kubelet restart.
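Pods already marked Evicted or ContainerStatusUnknown are not cleaned up automatically; since they end up in the Failed phase, they can be removed in bulk:
kubectl delete pods -A --field-selector=status.phase=Failed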
kubectl -nyourns exec -it $(kubectl get po --all-namespaces | awk '/catamel/{print $2}') -- ash
sudo apt install npm
- https://www.gluster.org/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
- https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/
- https://www.taste-of-it.de/glusterfs-mount-mittels-nfs/
- https://stackoverflow.com/questions/54571185/how-to-patch-a-configmap-in-kubernetes
- https://pabraham-devops.medium.com/mapping-kubernetes-configmap-to-read-write-folders-and-files-8a548c855817
To create a ConfigMap from a directory (see https://phoenixnap.com/kb/kubernetes-configmap-create-and-use):
kubectl create configmap test --from-file=/nfs/siteconfig/catamel/ --dry-run=client -o yaml | kubectl apply -f -
kubectl -nyourns patch cm catamel-dacat-api-server-dev -p "$(printf 'data:\n datasources.json: |-\n%s' "$(jq '.' /nfs/siteconfig/catamel/datasources.json | sed 's/^/ /g')")"
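To check that the patch was applied:
kubectl -nyourns get cm catamel-dacat-api-server-dev -o jsonpath='{.data.datasources\.json}' | jq .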