-
I can't say that we do a lot of random failure testing, such as taking down interfaces to see what happens. I don't feel like this is something that the node is expected to be resilient to. I will say that much of Kubernetes networking, in particular service networking managed by kube-proxy, relies on there being an interface with a default route, in order to get traffic to service ClusterIPs routed properly - even if it will stay within the cluster or node.
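As an aside (my own addition, not part of the reply above): if you want to check which interface currently carries the IPv4 default route that service routing depends on, a minimal sketch like the following could help. It reads /proc/net/route directly; this is not how kube-proxy itself does it, just a quick way to see what disappears when you take an interface down.

```go
// Minimal sketch: report which interface holds the IPv4 default route.
// Reads /proc/net/route directly; illustrative only, not kube-proxy's own logic.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/net/route")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read routing table:", err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Scan() // skip the header line
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 8 {
			continue
		}
		// Destination and Mask are hex-encoded; "00000000" for both means
		// 0.0.0.0/0, i.e. the default route.
		if fields[1] == "00000000" && fields[7] == "00000000" {
			fmt.Printf("default route via interface %s (gateway %s)\n", fields[0], fields[2])
			return
		}
	}
	fmt.Println("no IPv4 default route found")
}
```

Running it before and after taking an interface down makes it easy to see whether the node has lost its default route entirely.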
-
I have a 5-node HA cluster running k3s 1.27.4, as mentioned in #9854. While observing the cluster behavior by taking down network interfaces, I noticed the following:
When I take one interface down on a node, that node's k3s log shows several timeouts, as seen below:
Apr 08 09:27:41 node-3 k3s[541236]: E0408 09:27:41.392816 541236 leaderelection.go:327] error retrieving resource lock kube-system/kube-controller-manager: Get "https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
There are also timeout messages while updating the node status to 127.0.0.1:6443.
Since this is localhost communication, why is it affected by changes in the network? Does setting an interface down cause the server thread to get stuck so that it cannot handle the request?
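For what it's worth, a small probe like the sketch below (my own addition, using the port and 5s timeout from the log line above; the /healthz path is illustrative) can help distinguish whether the loopback listener is actually unresponsive or only slow. Any HTTP status code, even 401, means the loopback connection itself worked; a client timeout reproduces the error in the log.

```go
// Hypothetical probe: check whether the local endpoint on 127.0.0.1:6444
// (from the log above) answers within the same 5s timeout the failing
// components use. Reachability only; authentication is not the point here.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			// The endpoint serves a self-signed certificate; skip verification
			// because we only care about whether the loopback path responds.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}

	url := "https://127.0.0.1:6444/healthz" // path is an assumption
	start := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		fmt.Printf("request failed after %v: %v\n", time.Since(start), err)
		return
	}
	defer resp.Body.Close()
	fmt.Printf("got HTTP %d in %v: loopback path is responding\n", resp.StatusCode, time.Since(start))
}
```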
In my setup the 5 nodes are connected to each other via two leaf switches; each node has 4 interfaces, two of which are connected to a single leaf switch. When I take down the second interface connected to the same leaf switch as the first, I occasionally get "NodeNotReady" events, and sometimes k3s also gets restarted.
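To correlate those "NodeNotReady" events with the moment an interface goes down, something like the sketch below could be run from another node. This is my own addition, not from the thread; it assumes client-go is available and that the kubeconfig path (the k3s default admin kubeconfig) is readable.

```go
// Sketch: periodically print each node's Ready condition so transitions can be
// timestamped against interface changes. Paths and interval are assumptions.
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed path: k3s writes its admin kubeconfig here by default.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	for {
		nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			fmt.Println("list nodes failed:", err)
		} else {
			for _, node := range nodes.Items {
				for _, cond := range node.Status.Conditions {
					if cond.Type == corev1.NodeReady {
						fmt.Printf("%s %s Ready=%s (%s)\n",
							time.Now().Format(time.RFC3339), node.Name, cond.Status, cond.Reason)
					}
				}
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```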