
kube-apiserver occasionally terminates on cluster boot and doesn't get restarted #7241

Closed
hoo29 opened this issue Nov 7, 2024 · 4 comments

hoo29 commented Nov 7, 2024

Environmental Info:
RKE2 Version:
rke2 version v1.31.1+rke2r1 (909d20d)
go version go1.22.6 X:boringcrypto

Node(s) CPU architecture, OS, and Version:
Rocky 9
Linux 5.14.0-427.24.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 8 17:47:19 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
Running in AWS with 3 servers and 7 agents. Using Cilium. Servers are 2 cores, 4 GB RAM.

Describe the bug:
Our cluster is shut down overnight for cost saving and started back up each morning. Twice in the last month, a server node has failed to start properly, and this has prevented some agent nodes from joining, despite the other 2 servers being marked as healthy. We were previously on v1.29.3+rke2r1 and never had this issue after running it for several months with the same shutdown behaviour.

Looking through the broken server (server0), kube-apiserver gets terminated during rke2-server startup, which causes numerous other issues. ps aux | grep kube-apiserver returns nothing on server0.
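For reference, roughly the checks behind that, assuming a default RKE2 install (RKE2 ships its own crictl and crictl.yaml under /var/lib/rancher/rke2; adjust paths if yours differ):

# Look for a running kube-apiserver process (the [k] keeps grep from matching itself)
ps aux | grep '[k]ube-apiserver'

# Cross-check what the container runtime thinks, using RKE2's bundled crictl
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl ps --name kube-apiserver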

Steps To Reproduce:

Expected behavior:
The cluster starts up successfully.

The cluster is not severely degraded by 1 out of 3 server nodes having issues.

Actual behavior:
The cluster does not always start up successfully.

The cluster is severely degraded by 1 out of 3 server nodes having issues.

Additional context / logs:
etcd appears healthy. On server0, with 2456 being the PID of etcd:

$ nsenter --target 2456 --mount  --net --pid --uts -- etcdctl \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt endpoint status  --cluster --write-out=table;

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.128.23.141:2379 | 12659cee018f1be2 |  3.5.13 |   52 MB |      true |      false |       248 |   71361783 |           71361783 |        |
|  https://10.128.23.66:2379 | 857d2c9e2293e976 |  3.5.13 |   52 MB |     false |      false |       248 |   71361783 |           71361783 |        |
| https://10.128.21.171:2379 | f699167f1021d729 |  3.5.13 |   52 MB |     false |      false |       248 |   71361783 |           71361783 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
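A complementary check (same nsenter target and certificates as above) is etcdctl endpoint health, which reports per-member health rather than raft state:

nsenter --target 2456 --mount --net --pid --uts -- etcdctl \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  endpoint health --cluster --write-out=table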

The kube-apiserver logs on server1 & server2 are filled with authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, service account token has been invalidated]".
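A rough way to gauge how frequent these are, assuming server1/server2 use the same static pod log layout as server0 (shown further down):

# Count bearer-token authentication failures in the kube-apiserver static pod logs
grep -c 'Unable to authenticate the request' \
  /var/log/pods/kube-system_kube-apiserver-*/kube-apiserver/*.log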

Cilium is failing to start on server0, but with kube-apiserver down, I assume nothing will.

I have found this comment around load balancer health checks, #5557 (comment), which we are not doing (but will implement), but I don't think it is the issue as etcd appears healthy.
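For the health checks, a minimal sketch of what the load balancer could probe instead of a bare TCP check (ports and endpoints assumed from a default RKE2 setup; /readyz may return 401/403 depending on anonymous-auth settings, and the supervisor /ping on port 9345 is what joining agents talk to):

# kube-apiserver readiness on a server node
curl -sk https://10.128.23.141:6443/readyz

# RKE2 supervisor / agent join endpoint
curl -sk https://10.128.23.141:9345/ping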

Looking at kube-apiserver on server0

/var/log/containers/kube-apiserver-k8s-server0.DOMAIN_kube-system_kube-apiserver-a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171.log -> /var/log/pods/kube-system_kube-apiserver-k8s-server0.DOMAIN_80bcfddfbe1c3d0b4d90fe8d28b7a51c/kube-apiserver/22.log (attached as k8s-server0-not-ready-kube-apiserver-pod.log)

The kube-apiserver on server0 is container a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171. In /var/log/messages I can see

Nov  7 06:02:02 k8s-server0 systemd[1]: Started libcontainer container a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171.
Nov  7 06:02:26 k8s-server0 systemd[1]: run-containerd-runc-k8s.io-a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171-runc.choiNb.mount: Deactivated successfully.
Nov  7 06:02:31 k8s-server0 systemd[1]: run-containerd-runc-k8s.io-a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171-runc.BKojHd.mount: Deactivated successfully.
Nov  7 06:02:42 k8s-server0 systemd[1]: cri-containerd-a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171.scope: Deactivated successfully.
Nov  7 06:02:42 k8s-server0 systemd[1]: cri-containerd-a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171.scope: Consumed 9.180s CPU time.
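The same lifecycle events can be pulled from the journal by filtering on the container ID (a prefix is enough for grep):

journalctl --since "2024-11-07 06:00" | grep a4d23bea00f6796f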

Weirdly, crictl reports it as running on server0

$ crictl ps --name  kube-apiserver
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD
a4d23bea00f67       1b44d478ec444       9 hours ago         Running             kube-apiserver      22                  093e8959f7b4d       kube-apiserver-k8s-server0.DOMAIN

$ crictl inspect a4d23bea00f67 | jq .info.pid
2372

$ ps aux | grep 2372
root      200950  0.0  0.0   6408  2176 pts/0    S+   15:22   0:00 grep --color=auto 2372
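The transient systemd scope from /var/log/messages above can be queried directly to confirm the container's cgroup really is gone even though crictl still says Running:

systemctl status cri-containerd-a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171.scope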

Containerd logs on server0 say it has been killed, but then something is trying to exec into it

time="2024-11-07T06:02:41.609408926Z" level=info msg="Kill container \"a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171\""
time="2024-11-07T06:02:41.987115383Z" level=error msg="ExecSync for \"a4d23bea00f6796f0ae87e8779331cd3abb4557a6208ad6ba52666e2ae544171\" failed" error="failed to exec in container: failed to start exec \"36daae2d0ca0aa2b27f83e18036018231cccee0bdfb824c5d11e9d9ed4020f3c\": OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown"
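These lines are from RKE2's embedded containerd, which logs to a file rather than the journal (default path assumed):

grep a4d23bea /var/lib/rancher/rke2/agent/containerd/containerd.log | grep -E 'Kill container|ExecSync'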

I cannot stop the zombie container

$ crictl stop a4d23bea00f67
E1107 16:15:02.506585  219370 remote_runtime.go:366] "StopContainer from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="a4d23bea00f67"
FATA[0002] stopping the container "a4d23bea00f67": rpc error: code = DeadlineExceeded desc = context deadline exceeded
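Untested here, but for anyone hitting the same state, force-removing the stuck container record may be worth trying before restarting the whole service:

crictl rm --force a4d23bea00f67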

Running systemctl restart rke2-server on server0 fixes it and all nodes immediately join, but we would like to understand why this is happening.

Log attachments
k8s-agent0-not-ready-rke2-agent.log
k8s-server0-not-ready-containerd.log
k8s-server0-not-ready-rke2-server.log
k8s-server1-rke2-server.log
k8s-server2-rke2-server.log
k8s-agent6-rke2-agent.log
k8s-server0-not-ready-kube-apiserver-pod.log


brandond commented Nov 7, 2024

OCI runtime exec failed: exec failed: cannot exec in a stopped container: unknown

This sounds like a containerd bug.

We've updated to containerd v1.7.22 in RKE2 v1.31.2+rke2r1, please try that release.
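After upgrading, the embedded containerd version can be confirmed with (default RKE2 binary path assumed):

/var/lib/rancher/rke2/bin/containerd --version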


hoo29 commented Nov 7, 2024

Thanks for the quick response. We will give that a go.

Do you know why the agents are failing to join when one control-plane node is down? Is that the load balancer health check issue?


brandond commented Nov 7, 2024

One problem per issue please!


hoo29 commented Nov 7, 2024

Thanks - I will close this and reopen if we still experience the same issue with v1.31.2+rke2r1.

hoo29 closed this as completed Nov 7, 2024