- Kubernetes Infra Offload Usage Guide
- For Infra Agent and Infra Manager to work, kernel 5.4 or greater is required.
- The recommended OS is Fedora 33.
- The recommended Kubernetes version is 1.25.0.
- TAP interfaces should be created and available on the host system before Infra Agent is deployed. The default prefix for TAP interface names, as required by Infra Agent, is "P4TAP_".
- The number of TAP interfaces created must be a power of 2 and greater than 4, for example 8, 16, 32, and so on (see the example after this list).
- The P4 data plane program (k8s_dp.p4) and the configuration file (k8s_dp.conf) must not be modified, as the k8s control plane software is tightly coupled with the pipeline. The P4 compiler generated artifacts are available in the container and must be used as is.
- The firewall, if enabled in the host OS, should either be disabled or configured to allow the required traffic through.
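As a minimal illustration (the count of 8 below is an example value; the create_interfaces.sh script described later in this guide automates this step), the TAP interfaces could be created manually as follows:
for i in $(seq 0 7); do
    ip tuntap add dev P4TAP_${i} mode tap
    ip link set dev P4TAP_${i} up
done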
If Kubernetes and other required components are already installed on the machine, proceed to IPDK OVS Install and Run.
Kubernetes is known to not work well with Linux swap, so it should be turned off.
# swapoff -a
For Fedora 33, swapoff doesn't completely turn off the swap after a reboot. Remove the package below.
# dnf remove zram-generator-defaults
Check if the swap is off
# swapon --show
There should be no zram device
# lsblk
Also remove any swap specific entries from /etc/fstab.
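The swap entries can also be commented out non-interactively (a minimal sketch, assuming the entries contain the word "swap"):
# sed -i '/\sswap\s/ s/^/#/' /etc/fstab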
Load the kernel modules below and add them to modules-load so they are loaded automatically on reboot.
# modprobe overlay
# modprobe br_netfilter
# cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
Enable IPv4 forwarding and allow iptables/ip6tables to see bridged traffic
# sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
# sysctl --system
Set SELinux to permissive mode and verify
# sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# getenforce
Install Docker, configure the associated settings, and start it. Docker is required when using older versions of Kubernetes (< v1.23), as dockerd talks to containerd to pull K8s images. Docker is not required if the CRI runtime is containerd.
# dnf update -y
# dnf install -y dnf-plugins-core
# dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
# dnf install docker-ce docker-ce-cli containerd.io
If your setup is behind a proxy, create or edit /etc/systemd/system/docker.service.d/http-proxy.conf with the proxy settings. It should have contents as below
[Service]
Environment="HTTPS_PROXY=<proxy-url>"
Environment="HTTP_PROXY=<proxy-url>"
Environment="NO_PROXY=localhost,127.0.0.1,::1"
Set the storage driver to overlay2 and use the systemd cgroup driver.
# mkdir -p /etc/docker
Create or edit /etc/docker/daemon.json to have the contents below. The registry-mirrors setting in this file is needed when the user is limited by Docker pull rate limits on free user accounts.
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "50m"
    },
    "data-root": "/mnt/docker-data",
    "storage-driver": "overlay2",
    "registry-mirrors": ["https://mirror.gcr.io"]
}
Create ~/.docker/config.json with the following contents:
{
    "proxies": {
        "default": {
            "httpProxy": "<proxy-url>",
            "httpsProxy": "<proxy-url>",
            "noProxy": "localhost,127.0.0.1"
        }
    }
}
Start the docker daemon
# systemctl start docker
Create a local registry and verify that it is running. Note that this requires docker login credentials to set up the authentication token on the local node.
# docker login
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
# docker run -d -p 5000:5000 --restart=always --name registry registry:2
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
99d9b2ede2ea registry:2 "/entrypoint.sh /etc…" 36 seconds ago Up 35 seconds 0.0.0.0:5000->5000/tcp registry
Create /etc/crictl.yaml with the following contents.
# cat /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: true
Enable the containerd service, configure the default settings and proxies, and start it.
# systemctl enable containerd.service
# mkdir -p /etc/containerd
# containerd config default | tee /etc/containerd/config.toml
In the /etc/containerd/config.toml file, set SystemdCgroup to true under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]. The file should look as below.
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    runtime_type = "io.containerd.runc.v2"
    runtime_engine = ""
    runtime_root = ""
    privileged_without_host_devices = false
    base_runtime_spec = ""
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true
...
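The same change can be made non-interactively (a minimal sketch, assuming the default config generated above contains a single SystemdCgroup entry):
# sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml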
Create the proxy file and configure the proxy settings (http_proxy, https_proxy, and no_proxy) for containerd. Include the host IP address, pod subnet, and service subnet in the no_proxy setting.
# mkdir -p /usr/lib/systemd/system/containerd.service.d
The /usr/lib/systemd/system/containerd.service.d/proxy.conf should look as below. In this example, the pod network is 10.244.0.0/16, the service network is 10.96.0.0/16, and the API server, local API endpoint, and control plane endpoint are at 192.168.110.5.
# cat /usr/lib/systemd/system/containerd.service.d/proxy.conf
[Service]
Environment="HTTP_PROXY=<proxy-url>"
Environment="HTTPS_PROXY=<proxy-url>"
Environment="NO_PROXY=localhost,127.0.0.1,::1,10.244.0.0/16,10.96.0.0/16,192.168.0.0/16,<node-ip>"
Configure the following environment variables for the proxy settings
export no_proxy=127.0.0.1,localhost,192.168.0.0/16,<pod-cidr>,<service-cidr>,<host-ip>
export https_proxy=<proxy-url>
export http_proxy=<proxy-url>
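Reload systemd and restart containerd so the proxy drop-in and the updated config.toml take effect:
# systemctl daemon-reload
# systemctl restart containerd.service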
Check the containerd service status; it should show as running, similar to below
# systemctl status containerd.service
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/containerd.service.d
└─proxy.conf
Active: active (running) since Thu 2022-07-14 13:29:25 IST; 9min ago
Docs: https://containerd.io
Process: 100768 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 100769 (containerd)
Tasks: 37
Memory: 19.8M
CPU: 663ms
CGroup: /system.slice/containerd.service
└─100769 /usr/bin/containerd
<...> level=info msg="Start subscribing containerd event"
<...> level=info msg="Start recovering state"
<...> level=info msg=serving... address=/run/containerd/container>
<...> level=info msg=serving... address=/run/containerd/container>
<...> level=info msg="containerd successfully booted in 0.039112s"
<...> systemd[1]: Started containerd container runtime.
<...> level=info msg="Start event monitor"
<...> level=info msg="Start snapshots syncer"
<...> level=info msg="Start cni network conf syncer"
<...> level=info msg="Start streaming server"
Set up the Kubernetes repo manager
# cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
# sudo dnf install -y kubelet kubeadm kubectl containernetworking-plugins cri-tools --disableexcludes=kubernetes
# dnf list installed | grep kube
cri-tools.x86_64 1.24.2-0 @kubernetes
kubeadm.x86_64 1.25.0-0 @kubernetes
kubectl.x86_64 1.25.0-0 @kubernetes
kubelet.x86_64 1.25.0-0 @kubernetes
Pull the Kubernetes images and verify the downloaded images
# kubeadm config images pull
# crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/calico/cni v3.24.1 67fd9ab484510 87.4MB
docker.io/calico/kube-controllers v3.24.1 f9c3c1813269c 31.1MB
docker.io/calico/node v3.24.1 75392e3500e36 80.2MB
localhost:5000/infraagent latest 0c33a598fd923 32.2MB
localhost:5000/inframanager latest 56f21c0cc91a2 11MB
quay.io/tigera/operator v1.28.1 52468087127eb 18.8MB
registry.k8s.io/coredns/coredns v1.9.3 5185b96f0becf 14.8MB
registry.k8s.io/etcd 3.5.4-0 a8a176a5d5d69 102MB
registry.k8s.io/kube-apiserver v1.25.0 4d2edfd10d3e3 34.2MB
registry.k8s.io/kube-controller-manager v1.25.0 1a54c86c03a67 31.3MB
registry.k8s.io/kube-proxy v1.25.0 58a9a0c6d96f2 20.3MB
registry.k8s.io/kube-scheduler v1.25.0 bef2cf3115095 15.8MB
registry.k8s.io/pause 3.8 4873874c08efc 311kB
# kubeadm config images list
registry.k8s.io/kube-apiserver:v1.25.0
registry.k8s.io/kube-controller-manager:v1.25.0
registry.k8s.io/kube-scheduler:v1.25.0
registry.k8s.io/kube-proxy:v1.25.0
registry.k8s.io/pause:3.8
registry.k8s.io/etcd:3.5.4-0
registry.k8s.io/coredns/coredns:v1.9.3
Enable the kubelet service
# systemctl enable kubelet.service
This Kubernetes deployment depends on IPDK P4-OVS running in the background. Once P4-OVS is running, Kubernetes can load its P4 pipeline and offload various functionalities onto it, i.e., onto the P4 data plane. Note that IPDK P4-OVS needs to be installed and run natively on the host. To install P4-OVS and P4-SDE (components as per the IPDK 22.07 release) individually, follow the instructions listed below. Note that P4C is not required, as this software includes the P4C-generated artifacts.
To install P4-SDE, follow the instructions at https://github.com/p4lang/p4-dpdk-target. Make sure to check out using tag v22.07 (for the 22.07 release) and build for TDI. Users can also refer to the script https://github.com/ipdk-io/ipdk/blob/main/build/networking/scripts/build_p4sde.sh. The main steps can be summarized as:
git clone https://github.com/p4lang/p4-dpdk-target
cd p4-dpdk-target
git checkout v22.07
git submodule update --init --recursive
mkdir ./install
pushd ./tools/setup
source p4sde_env_setup.sh <path to p4-dpdk-target>
popd
./autogen.sh
./configure --prefix=$SDE_INSTALL --with-generic-flags=yes
make -j
make install
To install P4-OVS, follow the instructions in the script https://github.com/ipdk-io/ipdk/blob/main/build/networking/scripts/get_p4ovs_repo.sh to download the code. Note that the P4-OVS code is checked out using tag v22.07 (for the 22.07 release). In addition, set the OVS_INSTALL environment variable to point to the P4-OVS base directory. The main steps can be summarized as:
git clone https://github.com/ipdk-io/ovs.git -b ovs-with-p4 P4-OVS
cd P4-OVS
git checkout v22.07
git submodule update --init --recursive
./install_dep_packages.sh $PWD
source p4ovs_env_setup.sh $SDE_INSTALL
./build-p4ovs.sh $SDE_INSTALL
export OVS_INSTALL=$PWD
Install the Go package (go version go1.18.6 linux/amd64), following the instructions at https://go.dev/doc/install
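A minimal sketch of those steps (the tarball name below assumes the go1.18.6 linux/amd64 release):
# wget https://go.dev/dl/go1.18.6.linux-amd64.tar.gz
# rm -rf /usr/local/go && tar -C /usr/local -xzf go1.18.6.linux-amd64.tar.gz
# export PATH=$PATH:/usr/local/go/bin
# go version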
Pull P4-K8s and other software
git clone https://github.com/ipdk-io/k8s-infra-offload.git p4-k8s
cd p4-k8s
go get -u google.golang.org/protobuf
Build p4-k8s images and other binaries as below
# make build
# make docker-build
Push the InfraManager and InfraAgent images into the docker private repo, either manually or through the make command.
Either
# make docker-push
Or
# docker push localhost:5000/infraagent:latest
# docker push localhost:5000/inframanager:latest
The docker images should now be listed in the local repository as below.
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost:5000/inframanager latest 7605ed47e042 5 minutes ago 22.1MB
<none> <none> 485d7bc6ec38 5 minutes ago 1.38GB
localhost:5000/infraagent latest 500075b89922 6 minutes ago 68.7MB
<none> <none> dc519d06de56 6 minutes ago 1.68GB
golang 1.18 f81bafb819d5 6 days ago 965MB
...
Pull images for use by Kubernetes CRI
# crictl pull localhost:5000/inframanager:latest
# crictl pull localhost:5000/infraagent:latest
Run the create_interfaces.sh script, which, in addition to creating the specified number of TAP interfaces, sets up the huge pages and starts P4-OVS.
# ./p4-k8s/scripts/create_interfaces.sh <8/16/32/...> [OVS_DEP_INSTALL_PATH]
After running the above script, verify that P4-OVS is running in the background.
# ps -ef | grep ovs
root 25050 1 0 07:15 ? 00:00:00 ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
root 25054 1 99 07:15 ? 00:09:03 ovs-vswitchd --pidfile --detach --no-chdir --mlockall --log-file=/tmp/logs/ovs-vswitchd.log
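The TAP interfaces created by the script can also be checked; they should appear with the P4TAP_ prefix:
# ip link show | grep P4TAP_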
Rename the first TAP interface and run the ARP proxy on it.
ip link set P4TAP_0 name TAP_0
ip link set TAP_0 up
ip addr add 169.254.1.1/32 dev TAP_0
ARP_PROXY_IF=TAP_0 ./bin/arp_proxy &
Start the containerd services
# systemctl start containerd.service
Initialize and start the core k8s components as below
# kubeadm init --pod-network-cidr=<pod-cidr> --service-cidr=<service-cidr>
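For example, using the pod and service subnets from the containerd proxy example earlier in this guide:
# kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/16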
Once the K8s control plane initialization has completed successfully, then (as a non-root user)
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
Or (as root user)
# export KUBECONFIG=/etc/kubernetes/admin.conf
Start the deployments. Note that for a single-node deployment, the node must be untainted to allow worker pods to share the node with the control plane. This can be done as below.
# kubectl taint node <node-name> node-role.kubernetes.io/control-plane-
# make deploy
# make deploy-calico
Check deployment using the command below.
# kubectl get pods -A -o wide
To run a simple ping test from one pod to another, create two test pods. Note that the yaml file in the package creates a single test pod, so copy and modify it to create a pod with a different name. For example, copy it as test-pod2.yaml and change the metadata name and container name to test-pod2. The yaml file for test-pod2 should look as below.
apiVersion: v1
kind: Pod
metadata:
  name: test-pod2
spec:
  containers:
  - name: test-pod2
    image: quay.io/quay/busybox:latest
...
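A minimal way to generate test-pod2.yaml from the packaged file (assuming the original file is example/test_pod.yaml, as used in the commands below):
# cp example/test_pod.yaml example/test_pod2.yaml
# sed -i 's/test-pod/test-pod2/g' example/test_pod2.yaml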
Now, create both the test pods
# kubectl create -f example/test_pod.yaml
# kubectl create -f example/test_pod2.yaml
Check that the two test pods are ready and running.
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-pod 1/1 Running 0 10m 10.244.0.6 ins21 <none> <none>
test-pod2 1/1 Running 0 9m33s 10.244.0.7 ins21 <none> <none>
Get the IP address assigned to one of the pods using ifconfig. Then, ping that address from the other pod.
# kubectl exec test-pod2 -- ifconfig eth0
# kubectl exec test-pod -- ping 10.244.0.6
PING 10.244.0.6 (10.244.0.6): 56 data bytes
64 bytes from 10.244.0.6: seq=0 ttl=64 time=0.112 ms
64 bytes from 10.244.0.6: seq=1 ttl=64 time=0.098 ms
64 bytes from 10.244.0.6: seq=2 ttl=64 time=0.102 ms
64 bytes from 10.244.0.6: seq=3 ttl=64 time=0.112 ms
...
To test a simple service deployment, users can use the iperf-based server available in the https://github.com/Pharb/kubernetes-iperf3.git repository. Clone this repository and deploy the service as below:
# git clone https://github.com/Pharb/kubernetes-iperf3.git
# cd kubernetes-iperf3
# ./steps/setup.sh
# kubectl get svc -A -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
iperf3-server ClusterIP 10.111.123.3 <none> 5201/TCP 6m56s app=iperf3-server
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 15m <none>
# kubectl get ep -A -o wide
NAMESPACE NAME ENDPOINTS AGE
default iperf3-server 10.244.0.5:5201,10.244.0.6:5201 5h22m
default kubernetes 10.233.134.119:6443 5h35m
kube-system kube-dns 10.244.0.3:53,10.244.0.4:53,10.244.0.3:53 + 3 more... 5h35m
# kubectl get pods -A -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
iperf3-clients-8gkv7 1/1 Running 0 18m 10.244.0.9 ins21 <none> <none>
iperf3-server-deployment-59bf4754f9-4hp4c 1/1 Running 0 18m 10.244.0.8 ins21 <none> <none>
test-pod 1/1 Running 0 25m 10.244.0.6 ins21 <none> <none>
test-pod2 1/1 Running 0 25m 10.244.0.7 ins21 <none> <none>
To test service traffic, the iperf3 client can be started as below:
# cd kubernetes-iperf3
# ./steps/run.sh
The iperf3 client can also be executed manually inside the iperf client pod
# kubectl exec --stdin --tty <iperf3-clients-xxx> -- /bin/bash
# iperf3 -c iperf3-server
Connecting to host iperf3-server, port 5201
[ 5] local 10.244.0.7 port 37728 connected to 10.96.186.247 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 107 KBytes 880 Kbits/sec 2 1.41 KBytes
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
Reset Kubernetes, which stops and removes all pods. Then remove all K8s runtime configuration and other files.
# make undeploy
# make undeploy-calico
# kubeadm reset -f
# rm -rf /etc/cni /etc/kubernetes
# rm -rf /var/lib/etcd /var/lib/kubelet /var/lib/cni
# rm -rf /var/run/kubernetes
# rm -rf $HOME/.kube
Check if there are pods that are still running. If so, stop and remove them
# crictl ps -a
# crictl pods ls -a
# crictl stopp <pod_id>
# crictl rmp <pod_id>
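All remaining pods can also be stopped and removed in one pass (a minimal sketch):
# crictl pods -q | xargs -r -n1 crictl stopp
# crictl pods -q | xargs -r -n1 crictl rmp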
Stop the system services
# systemctl stop kubelet
# systemctl stop containerd
Stop the ARP proxy and OVS processes running in the background. This also removes all the TAP interfaces that were configured earlier, including the renamed one.
# pkill arp_proxy
# pkill ovsdb-server
# pkill ovs-vswitchd