Cluster Architecture Installation & Configuration

Summary

1. Manage role based access control (RBAC)

RBAC

Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within your organization.

The user needs a certificate during the authentication process to access the Kubernetes API.

A user is essentially just a set of client certificates; Kubernetes has no User API object.

RBAC API Objects

  • Permissions are additive, there are no deny rules.

If a permission is not explicitly granted by an RBAC object, the action cannot be performed.

  • A Kubernetes object is always either namespaced or cluster-scoped.

Role

  • Configures permissions within a specific namespace (whenever you create a Role, specify its namespace).

Example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

ClusterRole

  • Configures a set of permissions without being attached to a specific namespace.

Because they are cluster-scoped, you can also use them to grant access to:

  • cluster-scoped resources (like nodes)
  • non-resource endpoints (like /healthz)
  • namespaced resources across all namespaces (like Pods)

Example:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # "namespace" omitted since ClusterRoles are not namespaced
  name: secret-reader
rules:
- apiGroups: [""]
  #
  # at the HTTP level, the name of the resource for accessing Secret
  # objects is "secrets"
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]

RoleBinding

  • Grants the permissions defined in a Role to a user. Holds a list of subjects and a reference to the Role.
  • A RoleBinding may reference a Role in the same namespace.
  • A RoleBinding can also reference a ClusterRole.

Example:

apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read pods in the "default" namespace.
# You need to already have a Role named "pod-reader" in that namespace.
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# You can specify more than one "subject"
- kind: User
  name: jane # "name" is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role #this must be Role or ClusterRole
  name: pod-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io

  • Because a RoleBinding can reference a ClusterRole, you can define a set of common roles once for the whole cluster and reuse them in multiple namespaces. The RoleBinding works like a filter: it turns the cluster-wide permission set into permissions for a subject within a single namespace.

Example:

apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "dave" to read secrets in the "development" namespace.
# You need to already have a ClusterRole named "secret-reader".
kind: RoleBinding
metadata:
  name: read-secrets
  #
  # The namespace of the RoleBinding determines where the permissions are granted.
  # This only grants permissions within the "development" namespace.
  namespace: development
subjects:
- kind: User
  name: dave # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
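
The imperative equivalent, binding the existing ClusterRole inside the development namespace:

kubectl create rolebinding read-secrets --clusterrole=secret-reader --user=dave --namespace=development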

ClusterRoleBinding

  • Grants permissions cluster-wide.
  • It grants its subjects (users, groups, or service accounts) permission to perform the listed actions in any namespace of the cluster.

Example:

apiVersion: rbac.authorization.k8s.io/v1
# This cluster role binding allows anyone in the "manager" group to read secrets in any namespace.
kind: ClusterRoleBinding
metadata:
  name: read-secrets-global
subjects:
- kind: Group
  name: manager # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
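
The imperative equivalent:

kubectl create clusterrolebinding read-secrets-global --clusterrole=secret-reader --group=manager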

After creating a binding, you cannot change the Role or ClusterRole that it refers to.

kubectl auth reconcile # creates or updates RBAC objects from a manifest file, and handles deleting and recreating binding objects if required to change the role they refer to
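
For example, assuming your RBAC objects live in a manifest named my-rbac-rules.yaml (a hypothetical file name), you can preview and then apply the reconciliation:

# preview the changes without applying them
kubectl auth reconcile -f my-rbac-rules.yaml --dry-run=client

# apply the changes, also removing extra permissions and subjects if present
kubectl auth reconcile -f my-rbac-rules.yaml --remove-extra-permissions --remove-extra-subjects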

Referring to Resources

RBAC refers to resources using exactly the same name that appears in the URL for the relevant API endpoint.

  • RBAC can refer to resources and also to subresources, for example the log subresource of a Pod:

GET /api/v1/namespaces/{namespace}/pods/{name}/log

To represent this in an RBAC role, use a slash ( / ) to delimit the resource and subresource. Example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-and-pod-logs-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"] # this is how you refer to a subresource
  verbs: ["get", "list"]

Requests can be restricted to individual instances of a resource using the resourceNames list. Example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: configmap-updater
rules:
- apiGroups: [""]
  # restricts its subject to only get or update a ConfigMap named my-configmap
  # at the HTTP level, the name of the resource for accessing ConfigMap objects is "configmaps"
  resources: ["configmaps"]
  resourceNames: ["my-configmap"]
  verbs: ["update", "get"]

RBAC Verbs List

  • get, list, watch (read-only)

  • create, update, patch, delete, deletecollection (read-write)
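
To verify what a subject can actually do once the roles and bindings are in place, kubectl auth can-i is useful (using the example users from above):

# can jane read pods in the default namespace?
kubectl auth can-i get pods --namespace=default --as=jane

# dave was only granted access to secrets in "development", so this should return "no"
kubectl auth can-i get secrets --namespace=default --as=dave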

 

Wrap-up

  • If you want to define a role within a namespace, use a Role.
  • If you want to define a role cluster-wide, use a ClusterRole.
  • If you want to grant a Role or ClusterRole to specific subjects within a namespace, use a RoleBinding.
  • If you want to bind a ClusterRole across all namespaces in your cluster, use a ClusterRoleBinding.

In the YAML File

  • Role and ClusterRole create a role and bind it to the rules defined in the file.

  • RoleBinding and ClusterRoleBinding bind a role to subjects: groups, users, and ServiceAccounts.

Specs

| API Resource        | API Version                  | Namespaced | Kind               |
|---------------------|------------------------------|------------|--------------------|
| roles               | rbac.authorization.k8s.io/v1 | true       | Role               |
| rolebindings        | rbac.authorization.k8s.io/v1 | true       | RoleBinding        |
| clusterroles        | rbac.authorization.k8s.io/v1 | false      | ClusterRole        |
| clusterrolebindings | rbac.authorization.k8s.io/v1 | false      | ClusterRoleBinding |
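
These details can be confirmed on a live cluster by listing the resources of the RBAC API group:

kubectl api-resources --api-group=rbac.authorization.k8s.io -o wide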

Reference:

Using RBAC Authorization

2. Use Kubeadm to install a basic cluster

Kubeadm can be used to deploy a minimum viable K8s cluster.

Steps

  • Install a single control-plane Kubernetes cluster
  • Run kubeadm init on the control node
# --control-plane-endpoint sets a shared endpoint for all control plane nodes, which allows upgrading to an HA cluster later
sudo kubeadm init --control-plane-endpoint=cluster-endpoint
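
Depending on the Pod network add-on you plan to install, you may also need to pass a Pod network CIDR at init time; a sketch assuming the commonly used 10.244.0.0/16 range:

sudo kubeadm init --control-plane-endpoint=cluster-endpoint --pod-network-cidr=10.244.0.0/16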

The output of the init command walks you through the remaining cluster configuration steps; save the "kubeadm join" command it prints for setting up worker nodes.

Setup k8s config file and user permissions

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Install a Pod network on the cluster so that your Pods can talk to each other

# weave net add-on 
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

Join worker nodes

kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>

Check that the worker nodes have joined the cluster

kubectl get nodes

Check cluster configuration

kubectl config view
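
As an additional sanity check, verify that the control plane components and the network add-on Pods are running:

kubectl get pods -n kube-system -o wide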

References:

Creating a cluster with kubeadm.

3. Manage a High Availability Kubernetes cluster

Options for High Availability topology

The Kubernetes documentation mentions two topologies:

  • With stacked control plane nodes, where etcd nodes are colocated with control plane nodes
  • With external etcd nodes, where etcd runs on separate nodes from the control plane

Stacked etcd topology

etcd is stacked on top of the cluster formed by the nodes managed by kubeadm that run control plane components.

Each control plane node runs an instance of the kube-apiserver, kube-scheduler, and kube-controller-manager. The kube-apiserver is exposed to worker nodes using a load balancer.

External etcd topology

An HA cluster with external etcd is a topology where the distributed data storage cluster provided by etcd is external to the cluster formed by the nodes that run control plane components.

Like the stacked etcd topology, each control plane node in an external etcd topology runs an instance of the kube-apiserver, kube-scheduler, and kube-controller-manager. And the kube-apiserver is exposed to worker nodes using a load balancer. However, etcd members run on separate hosts, and each etcd host communicates with the kube-apiserver of each control plane node.

This topology requires twice the number of hosts as the stacked HA topology. A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.

Steps for Stacked etcd topology

Create a load balancer for the kube-apiserver, initialize the first control plane node against it, then join the remaining control plane nodes and the workers.
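
A minimal sketch of the kubeadm commands involved, assuming a load balancer is already reachable at the placeholder address LOAD_BALANCER_DNS:LOAD_BALANCER_PORT:

# on the first control plane node
sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT" --upload-certs

# on each additional control plane node, using the values printed by the init command
sudo kubeadm join LOAD_BALANCER_DNS:LOAD_BALANCER_PORT --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>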

References:

Options for Highly Available topology

4. Provision underlying infrastructure to deploy a Kubernetes cluster

Every Kubernetes cluster solution has its own specification of the minimum recommended infrastructure for deploying a cluster; for the sake of simplicity, the kubeadm minimal requirements are considered here.

 

What hardware infrastructure is recommended to deploy a Kubernetes Cluster using kubeadm?

  • Linux Host
  • Network Connectivity between all machines in the cluster
  • Unique hostname, MAC address and product_uuid for every node
  • Required Ports opened:
| Protocol | Direction | Port Range | Purpose                 | Used by              |
|----------|-----------|------------|-------------------------|----------------------|
| TCP      | Inbound   | 6443*      | Kubernetes API server   | All                  |
| TCP      | Inbound   | 2379-2380  | etcd server client API  | kube-apiserver, etcd |
| TCP      | Inbound   | 10250      | kubelet API             | Self, Control plane  |
| TCP      | Inbound   | 10251      | kube-scheduler          | Self                 |
| TCP      | Inbound   | 10252      | kube-controller-manager | Self                 |
  • Swap disabled
  • Recommended Hardware setup:
| Role    | Minimal Recommended Memory | Minimal Recommended CPU (cores) |
|---------|----------------------------|---------------------------------|
| control | 2 GB                       | 2                               |
| worker  | 2 GB                       | 2                               |

These recommended specifications are based on a simple cluster deployment; adding other specific solutions may change the hardware required to run a stable cluster.

Steps to be performed on each machine

  1. Set up the minimum recommended hardware configuration (when working with VMs or similar)
  2. Set up IP addresses for each node
  3. Set up a specific hostname for each node
  4. Check network adapters
  5. Configure iptables to see bridged traffic (see the sketch after this list)
  6. Install Docker
  7. Install kubeadm
  8. Install kubelet
  9. Install kubectl
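
A sketch of step 5, following the Install kubeadm documentation: load the br_netfilter module and let iptables see bridged traffic.

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sudo modprobe br_netfilter
sudo sysctl --system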

References:

Install kubeadm

5. Perform a version upgrade on a Kubernetes cluster using Kubeadm

At a high level, the steps you perform are:

  • Upgrade the control plane
  • If there is more than one control plane, upgrade those too
  • Upgrade the worker nodes in your cluster
  • Upgrade clients such as kubectl, kubelet
  • Adjust manifests and other resources based on the API changes that accompany the new Kubernetes version

kubeadm Cluster Upgrade

kubeadm upgrade is a user-friendly command that wraps complex upgrade logic behind one command, with support for both planning an upgrade and actually performing it.

Note: It is not recommended to skip minor versions while upgrading Kubernetes clusters. If your cluster is on version 1.18.x and you need to reach version 1.21.x, you must upgrade to 1.19.x and then 1.20.x before upgrading to 1.21.x.

Performing the cluster upgrade

Determine which version to upgrade to:

apt update
apt-cache madison kubeadm

You can check what would change in the existing static Pod manifests for the new version:

kubeadm upgrade diff

Upgrading control plane nodes

After choosing the version, start the upgrade on the first control plane node; all changes to the cluster begin there.

For the first control plane node

Upgrade kubeadm:

# replace x in 1.21.x-00 with the latest patch version
apt-mark unhold kubeadm && \
apt-get update && apt-get install -y kubeadm=1.21.x-00 && \
apt-mark hold kubeadm
-
# since apt-get version 1.1 you can also use the following method
apt-get update && \
apt-get install -y --allow-change-held-packages kubeadm=1.21.x-00

Verify that the download works and has the expected version:

kubeadm version

Verify the upgrade plan:

kubeadm upgrade plan

Choose a version to upgrade to, and run the appropriate command (the plan command outputs the apply command). For example:

# replace x with the patch version you picked for this upgrade
sudo kubeadm upgrade apply v1.21.x

Manually upgrade your CNI provider plugin (if necessary).

For the other control plane nodes

Run the upgrade command on each remaining control plane node (same as on the first control plane node, but use "kubeadm upgrade node" instead of "kubeadm upgrade apply"):

sudo kubeadm upgrade node

Drain the node to prepare it for maintenance, marking it unschedulable and evicting the workloads:

kubectl drain <node-to-drain> --ignore-daemonsets

Upgrade the kubelet and kubectl:

# replace x in 1.21.x-00 with the latest patch version
apt-mark unhold kubelet kubectl && \
apt-get update && apt-get install -y kubelet=1.21.x-00 kubectl=1.21.x-00 && \
apt-mark hold kubelet kubectl
-
# since apt-get version 1.1 you can also use the following method
apt-get update && \
apt-get install -y --allow-change-held-packages kubelet=1.21.x-00 kubectl=1.21.x-00

Restart the kubelet:

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Uncordon the node to bring it back online, marking it schedulable again:

# replace <node-to-drain> with the name of your node
kubectl uncordon <node-to-drain>

Upgrade worker nodes

Upgrade kubeadm:

# replace x in 1.21.x-00 with the latest patch version
apt-mark unhold kubeadm && \
apt-get update && apt-get install -y kubeadm=1.21.x-00 && \
apt-mark hold kubeadm
-
# since apt-get version 1.1 you can also use the following method
apt-get update && \
apt-get install -y --allow-change-held-packages kubeadm=1.21.x-00

Call "kubeadm upgrade", for worker nodes this upgrades the local kubelet configuration:

sudo kubeadm upgrade node

Drain the node to prepare it for maintenance, marking it unschedulable and evicting the workloads:

# replace <node-to-drain> with the name of the node you are draining
kubectl drain <node-to-drain> --ignore-daemonsets

Upgrade kubelet and kubectl:

# replace x in 1.21.x-00 with the latest patch version
apt-mark unhold kubelet kubectl && \
apt-get update && apt-get install -y kubelet=1.21.x-00 kubectl=1.21.x-00 && \
apt-mark hold kubelet kubectl
-
# since apt-get version 1.1 you can also use the following method
apt-get update && \
apt-get install -y --allow-change-held-packages kubelet=1.21.x-00 kubectl=1.21.x-00

Restart the kubelet:

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Uncordon the node to bring it back online, marking it schedulable again:

# replace <node-to-drain> with the name of your node
kubectl uncordon <node-to-drain>

Verify the status of the cluster: after the kubelet has been upgraded on all nodes, verify that all nodes are available again.

kubectl get nodes

Recovering from a failure state

To recover from a bad state:

kubeadm upgrade apply --force

During upgrade kubeadm writes the following backup folders under /etc/kubernetes/tmp:

  • kubeadm-backup-etcd-[date]-[time] contains a backup of the local etcd member data for this control plane.

  • kubeadm-backup-manifests-[date]-[time] contains a backup of the static Pod manifest files for this control plane node.
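
If an upgrade leaves the control plane broken, one manual recovery option (a sketch, assuming the default kubeadm layout) is to copy the backed-up static Pod manifests back into place and let the kubelet recreate the control plane Pods:

# the exact folder name contains the date and time of the upgrade attempt
sudo cp /etc/kubernetes/tmp/kubeadm-backup-manifests-[date]-[time]/*.yaml /etc/kubernetes/manifests/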

References:

How to install specific version of Kubernetes

Kubernetes upgrade: The definitive guide to do it yourself

Upgrading kubeadm clusters

6. Implement etcd backup and restore

etcd Backup

  • etcd is a key-value store that is used in the cluster to store all cluster data
  • It should be backed up on a regular basis

To back up etcd, several pieces of information are needed:

  1. The etcdctl command, which may not be installed by default
  2. The endpoint that is used to connect to the etcd
  3. cacert, which points to the CA certificate of the etcd service, and cert, which points to the etcd server certificate
  4. key, which points to the etcd server key

The endpoint is the address and port on which etcd listens (on a kubeadm control plane node, typically https://127.0.0.1:2379).

ETCDCTL_API=3 etcdctl snapshot save -h

sudo ETCDCTL_API=3 etcdctl snapshot save snapshot.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt endpoint status -w table
ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db

etcd Restore

Make sure a snapshot file is present. It can either be a snapshot file from a previous backup operation, or from a remaining data directory.

ETCDCTL_API=3 etcdctl --endpoints 10.2.0.9:2379 snapshot restore snapshotdb

If any API servers are running in your cluster, you should not attempt to restore instances of etcd. Instead, follow these steps to restore etcd:

  • stop all API server instances
  • restore state in all etcd instances
  • restart all API server instances
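
On a kubeadm cluster, a common approach (a sketch, assuming etcd runs as a static Pod with its data under /var/lib/etcd) is to restore the snapshot into a new data directory and point the etcd static Pod manifest at it:

# restore the snapshot into a fresh data directory
sudo ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-from-backup

# then edit /etc/kubernetes/manifests/etcd.yaml and change the etcd-data hostPath
# volume from /var/lib/etcd to /var/lib/etcd-from-backup; the kubelet recreates
# the etcd Pod with the restored data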

References:

Operating etcd clusters for Kubernetes