kube.org
Install

General

Distro Specific

Debian 10 (Buster)

Minimal node installation, make sure to:

  • NOT create a swap partition
  • Configure static IP in /etc/network/interfaces (example below)
  • Configure domain and DNS in /etc/resolv.conf
  • Configure hostname in /etc/hostname
  • Add split DNS mapping in router
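
A minimal sketch of those config files, assuming a hypothetical ens18 interface and the 10.123.0.0/16 subnet suggested later:

# /etc/network/interfaces
auto ens18
iface ens18 inet static
    address 10.123.0.10/16
    gateway 10.123.0.1

# /etc/resolv.conf
domain i.foltz.io
nameserver 10.123.0.1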

Shut up, Debian…

rm /etc/motd

Update the system and install utils we’ll need.

apt update && apt upgrade -y
apt install -y apt-transport-https ca-certificates software-properties-common curl gnupg2

Install the buster-backports repo and the latest kernel.

apt-add-repository "deb [arch=amd64] https://deb.debian.org/debian buster-backports main"
apt update
apt -t buster-backports install -y linux-image-amd64

If using Ceph (see Storage below), install LVM:

apt install -y lvm2

Install CRI-O

VERSION=1.20
curl -s https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_10/Release.key | apt-key add -
curl -s https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/Debian_10/Release.key | apt-key add -
apt-add-repository "deb [arch=amd64] https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_10/ /"
apt-add-repository "deb [arch=amd64] http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/Debian_10/ /"
unset VERSION
apt update
apt install -t buster-backports -y libseccomp-dev
apt install -y cri-o cri-o-runc
systemctl enable --now cri-o
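
Optionally, verify the runtime is answering on its socket (crictl ships in the cri-tools package if you don’t already have it):

crictl --runtime-endpoint unix:///var/run/crio/crio.sock version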

Add the necessary kubelet flags for CRI-O to work

cat <<EOF > /etc/default/kubelet
KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m
EOF

Install the latest Kubernetes. Version-lock it with apt-mark, since Kubernetes upgrades are actually pretty involved and we don’t want one happening every time we update the system.

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
apt-add-repository "deb [arch=amd64] http://apt.kubernetes.io/ kubernetes-xenial main"
apt update
# List versions with `apt list -a kubelet | head`
apt install -y kubelet=1.20.1-00 kubeadm=1.20.1-00 kubectl=1.20.1-00
apt-mark hold kubelet kubeadm kubectl

Configure iptables to correctly see bridged traffic (see Network Plugin Requirements).

modprobe br_netfilter
# load br_netfilter at boot too, so this survives a reboot
echo br_netfilter > /etc/modules-load.d/k8s.conf
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system

Configuration

Bootstrap

Local Network

Make sure your subnet isn’t one of the defaults like 10.0.0.0/24 or 192.168.0.0/24. Use a random /16 in the 10.0.0.0/8 block, like 10.123.0.0/16. Otherwise you’ll have issues VPNing into other networks that use the same subnet, since local IPs will conflict with remote IPs.

Cluster Bootstrap

Set up HAProxy

The important part is to forward TCP rather than HTTP, so TLS passes through untouched to the apiservers. A stats page is enabled here as well for those running a vanilla HAProxy.

listen stats
  bind    *:9000
  mode    http
  stats   enable
  stats   hide-version
  stats   uri       /stats
  stats   refresh   30s
  stats   realm     Haproxy\ Statistics
  stats   auth      Admin:Password

############## Configure HAProxy Secure Frontend #############
frontend k8s-api-https-proxy
    bind :443
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if { req.ssl_hello_type 1 }
    default_backend k8s-api-https

############## Configure HAProxy SecureBackend #############
backend k8s-api-https
    balance roundrobin
    mode tcp
    option tcplog
    option tcp-check
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server k8s-api-1 192.168.1.101:6443 check
    server k8s-api-2 192.168.1.102:6443 check
    server k8s-api-3 192.168.1.103:6443 check

Bootstrap with Kubeadm

On the first master node, bootstrap the cluster using the HAProxy domain name we just created as the control plane endpoint. Note that kubeadm assumes port 6443 when none is given, so with HAProxy listening on 443 as above, append :443 to the endpoint.

Depending on the CNI plugin you use, you may need to set the pod network CIDR to a certain value (make sure it doesn’t conflict with your local subnet). The one below works for Cilium, but check the CNI install docs for everything you need to install another one.

kubeadm init --control-plane-endpoint=kube.i.foltz.io --pod-network-cidr=10.217.0.0/16 --upload-certs

Once the first master is set up, it will spit out a bunch of useful info you will need.

The admin.conf it generates stores the certificates and credentials kubectl needs to talk to the cluster. Copy it to your local workstation and you can drive the cluster entirely remotely, without SSHing into a master to run kubectl commands manually.
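
For example (hypothetical master hostname; the script at the end does the same):

scp root@master-1:/etc/kubernetes/admin.conf ~/.kube/config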

The kubeadm join command it generates for masters (the one with --control-plane and --certificate-key) contains the key to the secret that lets kubeadm automatically distribute the generated CA certificates to new nodes. The uploaded certs are automatically removed after 2 hours, so past that you’ll need to re-upload them before joining another node (see below). Run this command to join every other master node to the cluster.
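
If the 2-hour window has passed, re-upload the certs and join with the fresh key (token, hash, and key below are placeholders):

kubeadm init phase upload-certs --upload-certs
kubeadm join kube.i.foltz.io:443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>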

Once this is done, untaint the master role to allow pods to be scheduled on masters, which you’ll need if your cluster is made of only masters. Technically in production you’re discouraged from running pods on masters at all, let alone having your entire working cluster made of masters, but ehhh we’re not google.

kubectl taint nodes --all node-role.kubernetes.io/master-

Cilium

Once the cluster is set up, we can get to installing the CNI, the black magic networking glue between nodes and pods.

Install Cilium and run the connectivity test

kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-install.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml

Delete the connectivity test afterwards

kubectl delete -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml

If external-fqdn

MetalLB

Make sure the address pool you advertise over BGP isn’t part of your local routable subnet

Deploy MetalLB

kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

Create the MetalLB config, metallb.yml

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.16.0.1
      peer-asn: 64500
      my-asn: 64501
    address-pools:
    - name: default
      avoid-buggy-ips: true
      protocol: bgp
      addresses:
      - 10.18.0.0/16

Apply config

kubectl apply -f metallb.yml
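
To sanity-check the pool, a throwaway LoadBalancer service should pick up an address from 10.18.0.0/16 (the image is just an example; the script at the end uses a similar hello-metallb.yaml):

kubectl create deployment hello --image=nginxdemos/hello
kubectl expose deployment hello --type=LoadBalancer --port=80
kubectl get svc hello
kubectl delete deployment,service hello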

Storage

Rook

Ceph

Using Cilium, you’ll need at least kernel 5.8 to use Ceph due to this issue: rook/rook#4752
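
Quick check on each node:

uname -r   # should report 5.8 or newer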

Deploy the ceph operator

git clone --single-branch --branch release-1.5 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

Downgrade Ceph from 15.2.8 to 15.2.7 if you get errors about insufficient LVM VG sizes. See https://tracker.ceph.com/issues/47758.

Deploy the cluster. This takes around 5 minutes to fully complete, depending on your setup.

kubectl create -f rook-ceph-cluster.yaml

Once everything is done you can launch up the dashboard by forwarding the ClusterIP over SSH. Log in with the username admin and the generated password obtained from the secret:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
ssh -NL 8443:10.110.18.103:8443 root@pylon
xdg-open https://localhost:8443

Create the storage classes ceph-block and ceph-shared, which are meant for single pod ReadWriteOnce volumes and multi-pod shared ReadWriteMany volumes respectively.

kubectl create -f rook-ceph-block.yaml
kubectl create -f rook-ceph-shared.yaml

Optionally, create the test pods to make sure storage properly provisions and mounts.

kubectl create -f rook-ceph-block-test.yaml
kubectl create -f rook-ceph-shared-test.yaml
kubectl get pods
kubectl delete -f rook-ceph-block-test.yaml
kubectl delete -f rook-ceph-shared-test.yaml

Teardown instructions at https://rook.io/docs/rook/v1.5/ceph-teardown.html

EdgeFS (Deprecated)

git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/edgefs
kubectl create -f operator.yaml
kubectl create -f cluster.yaml

Enter the toolbox

kubectl exec -it -n rook-edgefs rook-edgefs-target-0 -c daemon -- env COLUMNS=$COLUMNS LINES=$LINES TERM=linux toolbox
efscli system init
efscli cluster create cluster
efscli tenant create cluster/foltik
efscli bucket create cluster/foltik/files
efscli service create nfs nfs
efscli service serve nfs cluster/foltik/files
efscli bucket create cluster/foltik/block
efscli service create iscsi iscsi
efscli service serve iscsi cluster/foltik/block
kubectl create -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csidriver.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csinodeinfo.yaml
kubectl create -f nfs.yaml
kubectl create secret generic edgefs-nfs-csi-driver-config --from-file=./edgefs-nfs-csi-driver-config.yaml
kubectl create -f edgefs-nfs-csi-driver.yaml
kubectl create -f storage-class.yaml

kubectl create secret generic edgefs-iscsi-csi-driver-config --from-file=./edgefs-iscsi-csi-driver-config.yaml
kubectl create -f edgefs-iscsi-csi-driver.yaml
kubectl create -f test/nginx.yaml

PVC access modes are explained at https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

Monitoring

Prometheus

We will set up prometheus-operator along with kube-prometheus to simplify setting up cluster-wide monitoring.

To build the kube-prometheus configs, we need jsonnet-bundler and gojsontoyaml, which you can install with go’s package manager. Make sure you have $GOPATH/bin (defaults to $HOME/go) in your $PATH.

go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
go get github.com/brancz/gojsontoyaml

First, create a namespace for all this stuff to sit in.

kubectl create namespace monitoring
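
From here, the usual kube-prometheus workflow (sketched from the upstream README; the release branch is an assumption) is to vendor the jsonnet and compile the manifests:

mkdir kube-prometheus-config && cd kube-prometheus-config
jb init
jb install github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus@release-0.7
# then compile with the repo's example.jsonnet + build.sh, which uses gojsontoyaml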

Service Mesh

Istio

TLS

Install cert-manager https://cert-manager.io/docs/installation/kubernetes/

kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.1/cert-manager.yaml

Create a cluster issuer. Start with staging so you can test issuing, but later make a new one called letsencrypt and swap out the URL for the live URL.

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
# ClusterIssuers are cluster-scoped, so no namespace is needed
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: [email protected]
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret that will be created to store the letsencrypt account private key
      name: letsencrypt-key
    solvers:
    - dns01:
        cloudflare:
          # Cloudflare email
          email: [email protected]
          apiTokenSecretRef:
            # Secret storing cloudflare API token
            name: cloudflare-key
            key: apikey
      selector:
        dnsZones:
        # Your domain. This will match example.com and *.example.com
        - 'example.com'

Create the secrets. Make sure to use a scoped API token, not the global API key.

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-key
  namespace: cert-manager
type: Opaque
data:
  apikey: # Your API token, base64 encoded
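
To produce the base64 value for the data field:

echo -n 'your-cloudflare-api-token' | base64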

See configs/examples/tls for examples:

Gateway level TLS with SDS

In order for the ingressgateway to pick up the secret via Secret Discovery Service, the certificate needs to be in the istio-system namespace. It looks like this is going to change, hopefully.

Modify and deploy cert-system.yaml, and wait for the ACME challenge to complete. You can see the status with:

kubectl get certificate test-nginx-cert -n istio-system

Once it’s successfully issued, deploy app-http.yaml. Using your own domain and LoadBalancerIP, you should be able to get a response with:

curl -v -k -HHost:test.foltz.io --resolve test.foltz.io:443:10.17.0.1 https://test.foltz.io

Deployment level TLS with Mounts

You can also just mount the certificate secret directly into a Deployment with a secret file mount, and use TLS passthrough on the gateway to talk directly to a TLS secured backend. Note that in this case, the secret has to be in the same namespace as the deployment.

See cert.yaml and app.yaml for an example of how to do this.

Kiali

Create the login secret (note: the read '?prompt' syntax below is zsh; use read -p under bash).

cat <<EOF > kiali_secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: kiali
  namespace: istio-system
  labels:
    app: kiali
type: Opaque
data:
  username: $(read '?Kiali Username: ' uval && echo -n $uval | base64)
  passphrase: $(read -s '?Kiali Passphrase: ' pval && echo -n $pval | base64)
EOF
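
Then apply it:

kubectl apply -f kiali_secret.yml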

Multiple Ingress Gateways

istio/istio#19263

Note: you need to manually add all custom TCP ports on an ingress gateway to the IngressGateway CRD

DNS

While it’s possible to automatically create/delete external DNS records with external-dns, including istio ingress gateways as a source, it picks up the LoadBalancerIP. Since we don’t have a ton of public IPv4s to hand out and we use BGP peered RFC1918 addresses instead, this makes it kind of useless.

Dashboards

Grafana

Ceph

Hubble

TODO: The below is irrelevant for Cilium >= 1.8, where Hubble comes included in the quick-install manifests

git clone https://github.com/cilium/hubble.git
cd hubble/install/kubernetes

helm template hubble \
    --namespace kube-system \
    --set metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
    --set ui.enabled=true \
> hubble.yml
kubectl apply -f hubble.yml

Identity

Keycloak

kubectl create -f https://raw.githubusercontent.com/keycloak/keycloak-quickstarts/latest/kubernetes-examples/keycloak.yaml

ORY Stack

Hydra

helm install hydra ory/hydra -f hydra-config.yaml

Kratos

helm install

X-Treme Automagic Configuration

AKA foltz’s unportable installation shell script

Install

kubeadm init --control-plane-endpoint=kube.i.foltz.io --pod-network-cidr=10.217.0.0/16 --upload-certs
kubeadm join ...
scp root@obelisk:/etc/kubernetes/admin.conf ~/.kube/config
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-install.yaml
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml
kubectl delete -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/connectivity-check/connectivity-check.yaml
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
kubectl apply -f metallb.yaml
kubectl apply -f hello-metallb.yaml
kubectl delete -f hello-metallb.yaml
istioctl manifest apply -f istio.yaml
kubectl label namespace default istio-injection=enabled
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.yaml
kubectl apply -f cloudflare.yaml -n cert-manager
kubectl apply -f letsencrypt-staging.yaml
kubectl apply -f letsencrypt.yaml

Reset

kubeadm reset
rm -rf /etc/cni/net.d/* ~/.kube/ /etc/kubernetes/ /var/lib/cni/ /opt/cni/ /var/lib/etcd
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
systemctl daemon-reload
systemctl restart kubelet

One-liners

Find what’s holding up a namespace stuck in Terminating

kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n $NAMESPACE
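
If something turns up stuck on finalizers, clearing them usually unblocks deletion (kind/name are placeholders; use with care):

kubectl patch <kind>/<name> -n $NAMESPACE --type=merge -p '{"metadata":{"finalizers":[]}}'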

Open Rook Ceph Toolbox

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

Deep Dive

Networking

https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/
https://itnext.io/kubernetes-network-deep-dive-7492341e0ab5

Notes

OPNsense

Unbound

Local Zone Type

If you don’t want unknown hostnames to resolve to the router’s IP, change the Local Zone Type from transient to static. For some reason, transient caused issues for me (foltik) resolving external FQDNs from within pods with Cilium.

Troubleshooting

Good luck lol