Do a minimal node installation, making sure to:
- NOT create a swap partition
- Configure a static IP in /etc/network/interfaces (see the sketch after this list)
- Configure the domain and DNS in /etc/resolv.conf
- Configure the hostname in /etc/hostname
- Add a split DNS mapping in the router
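As a rough sketch of that per-node config, assuming an interface named ens18, the 10.123.0.0/16 subnet recommended later, and placeholder domain/hostname values (adjust all of these for your setup):
# /etc/network/interfaces
auto ens18
iface ens18 inet static
    address 10.123.0.11/16
    gateway 10.123.0.1
# /etc/resolv.conf
search i.example.com
nameserver 10.123.0.1
# /etc/hostname
node1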
Shut up, Debian…
rm /etc/motd
Update the system and install the utils we’ll need.
apt update
apt upgrade
apt install -y apt-transport-https ca-certificates software-properties-common curl gnupg2
Install the buster-backports repo and the latest kernel from it.
apt-add-repository "deb [arch=amd64] https://deb.debian.org/debian buster-backports main"
apt update
apt -t buster-backports install -y linux-image-amd64
If using ceph, install the following
apt install -y lvm2
Install CRI-O
VERSION=1.20
curl -s https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_10/Release.key | apt-key add -
curl -s http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/Debian_10/Release.key | apt-key add -
apt-add-repository "deb [arch=amd64] https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_10/ /"
apt-add-repository "deb [arch=amd64] http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/Debian_10/ /"
unset VERSION
apt update
apt install -t buster-backports -y libseccomp-dev
apt install -y cri-o cri-o-runc
systemctl enable --now cri-o
Add the necessary kubelet flags for CRI-O to work
cat <<EOF > /etc/default/kubelet
KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m
EOF
Install the latest Kubernetes. Version lock it with apt-mark, since Kubernetes updates are actually pretty complicated and we don’t want them happening every time we update the system.
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
apt-add-repository "deb [arch=amd64] http://apt.kubernetes.io/ kubernetes-xenial main"
apt update
# List versions with `apt list -a kubelet | head`
apt install -y kubelet=1.20.1-00 kubeadm=1.20.1-00 kubectl=1.20.1-00
apt-mark hold kubelet kubeadm kubectl
Configure iptables to correctly see bridged traffic (see Network Plugin Requirements).
modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
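Note that modprobe only lasts until reboot. As a sketch (standard Debian/systemd practice, not from the original notes), you can persist the module with a modules-load.d entry:
cat <<EOF > /etc/modules-load.d/k8s.conf
br_netfilter
EOF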
Make sure your subnet isn’t one of the defaults like 10.0.0.0/24 or 192.168.0.0/24. Use a random /16 in the 10.0.0.0/8 block, like 10.123.0.0/16. Otherwise, you’ll have issues VPNing into other networks that use the same subnet, because local IPs will conflict with remote IPs.
The important part is to load balance the API servers with TCP forwarding, not HTTP. A stats page is enabled here as well for those running a vanilla HAProxy.
listen stats
    bind *:9000
    mode http
    stats enable
    stats hide-version
    stats uri /stats
    stats refresh 30s
    stats realm Haproxy\ Statistics
    stats auth Admin:Password

############## Configure HAProxy Secure Frontend #############
frontend k8s-api-https-proxy
    bind :443
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if { req.ssl_hello_type 1 }
    default_backend k8s-api-https

############## Configure HAProxy Secure Backend #############
backend k8s-api-https
    balance roundrobin
    mode tcp
    option tcplog
    option tcp-check
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server k8s-api-1 192.168.1.101:6443 check
    server k8s-api-2 192.168.1.102:6443 check
    server k8s-api-3 192.168.1.103:6443 check
On the first master node, bootstrap the cluster using the HAProxy domain name we just created as the control plane endpoint.
Depending on the CNI plugin you use, you may need to set the pod network CIDR to a certain value (make sure it doesn’t conflict with your local subnet). The one below works for Cilium, but check the CNI install docs for everything you need to install another one.
kubeadm init --control-plane-endpoint=kube.i.foltz.io --pod-network-cidr=10.217.0.0/16 --upload-certs
Once the first master is set up, it will spit out a bunch of useful info you will need.
The admin.conf it generates stores the certificate and authentication information needed to talk to your cluster. You can copy it to your local workstation and control the cluster entirely remotely, without needing to SSH in and run kubectl commands manually on the master node.
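A minimal sketch of this, reusing the obelisk hostname from the script at the end (swap in your own master’s hostname):
scp root@obelisk:/etc/kubernetes/admin.conf ~/.kube/config
kubectl get nodes   # should now work from your workstation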
The kubeadm join command it generates for masters (the one with --control-plane and --certificate-key) holds the key to the secret that lets kubeadm automatically distribute the generated CA certificates to new control plane nodes. The uploaded certificates are automatically removed after 2 hours, so you may need to do some other tomfoolery to join another node after that. Run this command to join every other master node to the cluster.
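The printed command looks roughly like the sketch below; the token, hash, and key are placeholders, so use the values from your own kubeadm init output:
kubeadm join <control-plane-endpoint> \
    --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>
If the 2-hour window has passed, the usual tomfoolery is to re-upload the certs and print a fresh join command:
kubeadm init phase upload-certs --upload-certs   # prints a new certificate key
kubeadm token create --print-join-command        # prints a fresh join command; add --control-plane --certificate-key <key> for masters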
Once this is done, untaint the master role to allow pods to be scheduled on masters, which you’ll need if your cluster is made of only masters. Technically in production you’re discouraged from running pods on masters at all, let alone having your entire working cluster made of masters, but ehhh we’re not google.
kubectl taint nodes --all node-role.kubernetes.io/master-
Once the cluster is set up, we can get to installing the CNI, the black magic networking glue between nodes and pods.
Install Cilium and run the connectivity test.
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-install.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml
Delete the connectivity test afterwards.
kubectl delete -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml
If external-fqdn
Make sure the BGP-advertised address range isn’t part of the routable subnet.
Deploy MetalLB
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
Create the MetalLB config, metallb.yaml.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.16.0.1
      peer-asn: 64500
      my-asn: 64501
    address-pools:
    - name: default
      avoid-buggy-ips: true
      protocol: bgp
      addresses:
      - 10.18.0.0/16
Apply the config.
kubectl apply -f metallb.yaml
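To check that MetalLB actually hands out an address, a plain LoadBalancer Service works; the sketch below is roughly what the hello-metallb.yaml referenced in the script at the end might contain (names and image are assumptions):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-metallb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-metallb
  template:
    metadata:
      labels:
        app: hello-metallb
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello-metallb
spec:
  type: LoadBalancer   # MetalLB should assign an address from the 10.18.0.0/16 pool
  selector:
    app: hello-metallb
  ports:
  - port: 80
    targetPort: 80
After applying it, kubectl get svc hello-metallb should show an EXTERNAL-IP from the pool.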
Using Cilium, you’ll need at least kernel 5.8 to use Ceph due to this issue: rook/rook#4752
Deploy the ceph operator
git clone --single-branch --branch release-1.5 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
Downgrade ceph from 15.2.8 to 15.2.7 if you get errors about insufficient LVM vg sizes. See https://tracker.ceph.com/issues/47758.
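One way to do the downgrade is to pin the Ceph image in rook-ceph-cluster.yaml before creating the cluster. This fragment is a sketch using the Rook CephCluster spec’s cephVersion field, not the full manifest:
spec:
  cephVersion:
    # Pin to 15.2.7 to avoid the LVM vg size errors mentioned above
    image: ceph/ceph:v15.2.7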
Deploy the cluster. This takes around 5 minutes to fully complete, depending on your setup.
kubectl create -f rook-ceph-cluster.yaml
Once everything is done, you can bring up the dashboard by forwarding the ClusterIP over SSH. Log in with the username admin and the generated password obtained from the secret:
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
ssh -NL 8443:10.110.18.103:8443 root@pylon
xdg-open https://localhost:8443
Create the storage classes ceph-block and ceph-shared, which are meant for single-pod ReadWriteOnce volumes and multi-pod shared ReadWriteMany volumes respectively.
kubectl create -f rook-ceph-block.yaml
kubectl create -f rook-ceph-shared.yaml
Optionally, create the test pods to make sure storage properly provisions and mounts.
kubectl create -f rook-ceph-block-test.yaml
kubectl create -f rook-ceph-shared-test.yaml
kubectl get pods
kubectl delete -f rook-ceph-block-test.yaml
kubectl delete -f rook-ceph-shared-test.yaml
Teardown instructions at https://rook.io/docs/rook/v1.5/ceph-teardown.html
Alternatively, to try EdgeFS instead, deploy the EdgeFS operator and cluster from the rook release-1.3 examples.
git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/edgefs
kubectl create -f operator.yaml
kubectl create -f cluster.yaml
Enter the toolbox
kubectl exec -it -n rook-edgefs rook-edgefs-target-0 -c daemon -- env COLUMNS=$COLUMNS LINES=$LINES TERM=linux toolbox
efscli system init
efscli cluster create cluster
efscli tenant create cluster/foltik
efscli bucket create cluster/foltik/files
efscli service create nfs nfs
efscli service serve nfs cluster/foltik/files
efscli bucket create cluster/foltik/block
efscli service create iscsi iscsi
efscli service serve iscsi cluster/foltik/block
kubectl create -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csidriver.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csinodeinfo.yaml
kubectl create -f nfs.yaml
kubectl create secret generic edgefs-nfs-csi-driver-config --from-file=./edgefs-nfs-csi-driver-config.yaml
kubectl create -f edgefs-nfs-csi-driver.yaml
kubectl create -f storage-class.yaml
kubectl create secret generic edgefs-iscsi-csi-driver-config --from-file=./edgefs-iscsi-csi-driver-config.yaml
kubectl create -f edgefs-iscsi-csi-driver.yaml
kubectl create -f test/nginx.yaml
See the Kubernetes docs for an explanation of PVC access modes: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
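For reference, a claim against the ceph-block class from earlier might look like this minimal sketch (the name and size are placeholders):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-block-pvc
spec:
  storageClassName: ceph-block
  accessModes:
  - ReadWriteOnce   # single-pod access; use ceph-shared with ReadWriteMany for shared volumes
  resources:
    requests:
      storage: 5Gi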
We will set up prometheus-operator along with kube-prometheus to simplify setting up cluster-wide monitoring.
To build the kube-prometheus configs, we need jsonnet-bundler and gojsontoyaml, which you can install with go’s package manager. Make sure you have $GOPATH/bin in your $PATH ($GOPATH defaults to $HOME/go).
go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
go get github.com/brancz/gojsontoyaml
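If the Go bin directory isn’t in your $PATH yet, something like this in your shell profile is enough (a sketch, adjust for your shell):
export PATH="$PATH:$(go env GOPATH)/bin"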
First, create a namespace for all this stuff to sit in.
kubectl create namespace monitoring
Install cert-manager https://cert-manager.io/docs/installation/kubernetes/
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.1/cert-manager.yaml
Create a cluster issuer. Start with staging so you can test issuing, but later make a new one called letsencrypt and swap out the URL for the live URL.
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
  namespace: cert-manager
spec:
  acme:
    email: [email protected]
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret that will be created to store the letsencrypt account private key
      name: letsencrypt-key
    solvers:
    - dns01:
        cloudflare:
          # Cloudflare email
          email: [email protected]
          apiTokenSecretRef:
            # Secret storing the cloudflare API token
            name: cloudflare-key
            key: apikey
      selector:
        dnsZones:
        # Your domain. This will match example.com and *.example.com
        - 'example.com'
Create the secrets. Make sure to use a scoped api TOKEN, not the global API KEY.
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-key
  namespace: cert-manager
type: Opaque
data:
  apikey: # Your API token, base64 encoded
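A quick way to get the base64 value (the token string here is a placeholder):
echo -n 'your-cloudflare-api-token' | base64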
See configs/examples/tls for examples:
In order for the ingressgateway to pick up the secret via Secret
Discovery Service, the certificate needs to be in the istio-system
namespace. It looks like this is going to change, hopefully.
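A sketch of such a Certificate, matching the test-nginx-cert name used below and the v1alpha2 issuer above (the dnsName and secretName are assumptions):
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: test-nginx-cert
  namespace: istio-system
spec:
  secretName: test-nginx-cert   # secret the ingressgateway picks up via SDS
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  dnsNames:
  - test.foltz.io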
Modify and deploy cert-system.yaml, and wait for the ACME challenge to complete. You can see the status with:
kubectl get certificate test-nginx-cert -n istio-system
Once it’s successfully issued, deploy app-http.yaml. Using your own domain and LoadBalancerIP, you should be able to get a response with:
curl -v -k -HHost:test.foltz.io --resolve test.foltz.io:443:10.17.0.1 https://test.foltz.io
You can also just mount the certificate secret directly into a Deployment with a secret file mount, and use TLS passthrough on the gateway to talk directly to a TLS secured backend. Note that in this case, the secret has to be in the same namespace as the deployment.
See cert.yaml and app.yaml for an example of how to do this.
cat <<EOF > kiali_secret.yml
apiVersion: v1
kind: Secret
metadata:
name: kiali
namespace: istio-system
labels:
app: kiali
type: Opaque
data:
username: $(read '?Kiali Username: ' uval && echo -n $uval | base64)
passphrase: $(read -s "?Kiali Passphrase: " pval && echo -n $pval | base64)
EOF
Note: you need to manually add all custom TCP ports on an ingress gateway to the IngressGateway CRD
While it’s possible to automatically create/delete external DNS records with external-dns, including istio ingress gateways as a source, it picks up the LoadBalancerIP. Since we don’t have a ton of public IPv4s to hand out and we use BGP peered RFC1918 addresses instead, this makes it kind of useless.
TODO: The below is irrelevant for Cilium >= 1.8, since Hubble comes bundled in the quick-install.yaml.
git clone https://github.com/cilium/hubble.git
cd hubble/install/kubernetes
helm template hubble \
    --namespace kube-system \
    --set metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
    --set ui.enabled=true \
    > hubble.yml
kubectl apply -f hubble.yml
For auth, deploy Keycloak and Ory Hydra.
kubectl create -f https://raw.githubusercontent.com/keycloak/keycloak-quickstarts/latest/kubernetes-examples/keycloak.yaml
helm install -f hydra-config.yaml ory/hydra
helm install
AKA foltz’s unportable installation shell script
kubeadm init --control-plane-endpoint=kube.i.foltz.io --pod-network-cidr=10.217.0.0/16 --upload-certs
kubeadm join ...
scp root@obelisk:/etc/kubernetes/admin.conf ~/.kube/config
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-install.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml
kubectl delete -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
kubectl apply -f metallb.yaml
kubectl apply -f hello-metallb.yaml
kubectl delete -f hello-metallb.yaml
istioctl manifest apply -f istio.yaml
kubectl label namespace default istio-injection=enabled
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.yaml
kubectl apply -f cloudflare.yaml -n cert-manager
kubectl apply -f letsencrypt-staging.yaml
kubectl apply -f letsencrypt.yaml
kubeadm reset
rm -rf /etc/cni/net.d/* ~/.kube/ /etc/kubernetes/ /var/lib/cni/ /opt/cni/ /var/lib/etcd
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
systemctl daemon-reload
systemctl restart kubelet
kubectl api-resources --verbs=list --namespaced -o name \
| xargs -n 1 kubectl get --show-kind --ignore-not-found -n $NAMESPACE
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/
https://itnext.io/kubernetes-network-deep-dive-7492341e0ab5
If you don’t want unknown hostnames to resolve to the router’s IP, change the Local Zone Type from transient to static. For some reason, transient caused issues resolving external FQDNs from within pods with Cilium for me (foltik).
Good luck lol