-
I'm not really following your setup here. I will note that the apiservers do not communicate directly with each other; everything is handled through leases in etcd. The question you should be trying to answer is whether or not your etcd cluster still has quorum in any given scenario. Are 3 out of 5 of your server nodes able to reach each other?
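If you want to check that directly, the sketch below queries the embedded etcd from one of the server nodes; the certificate paths are the usual k3s defaults and may differ on your install:

```shell
# Ask every etcd member for its health; with 5 members the cluster
# keeps quorum as long as 3 of them are healthy and mutually reachable.
ETCDCTL_API=3 etcdctl \
  --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  --endpoints https://127.0.0.1:2379 \
  endpoint health --cluster

# The same flags with `endpoint status --cluster -w table` also show
# which member is currently the raft leader, which helps spot whether
# node-1 and node-5 have diverging views of the cluster.
```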
-
I have a 5-node cluster, with each node running in server mode. Each node has 4 Ethernet ports and is connected to two leaf switches via two interfaces each, so every node can reach every other node through either leaf switch.

Now suppose we disconnect two cables on each of two nodes, such that each node loses its links to a different leaf switch: on node-1 we pull the two cables to leaf-1, leaving node-1 connected only to leaf-2, and on node-5 we pull the two cables to leaf-2, leaving node-5 connected only to leaf-1. Since the remaining nodes are not configured to route between node-1 and node-5, those two nodes are now disconnected from each other. From node-1's point of view the cluster consists of nodes 1 through 4; from node-5's point of view it consists of nodes 2 through 5; nodes 2 through 4 can still reach everything.

In this situation some lease heartbeats can go missing, for example when the apiserver is on node-1 and a kubelet is on node-5, or vice versa. I don't have an external load balancer; the Kubernetes API service is DNATed to the k3s servers with random probability (see the iptables sketch below). Is this a condition Kubernetes should handle on its own? Occasionally I end up with an unhealthy cluster as reported by `etcdctl endpoint health`, and some Deployments with a replica count of 1 get rescheduled onto other nodes while the pod on the cabled-out node is still running and never receives a SIGTERM/SIGKILL.
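To make the DNAT setup concrete, it is roughly the following iptables arrangement (all addresses are placeholders; the probability cascade spreads new connections evenly across the five servers):

```shell
# Each new connection to the API VIP 10.0.0.10:6443 is DNATed to one of
# the five k3s servers (10.0.0.1-10.0.0.5) with equal probability.
# The probabilities cascade: 1/5 of all traffic, then 1/4 of the
# remainder, and so on, with the last rule catching everything left.
iptables -t nat -A PREROUTING -d 10.0.0.10 -p tcp --dport 6443 \
  -m statistic --mode random --probability 0.2 -j DNAT --to-destination 10.0.0.1:6443
iptables -t nat -A PREROUTING -d 10.0.0.10 -p tcp --dport 6443 \
  -m statistic --mode random --probability 0.25 -j DNAT --to-destination 10.0.0.2:6443
iptables -t nat -A PREROUTING -d 10.0.0.10 -p tcp --dport 6443 \
  -m statistic --mode random --probability 0.33333 -j DNAT --to-destination 10.0.0.3:6443
iptables -t nat -A PREROUTING -d 10.0.0.10 -p tcp --dport 6443 \
  -m statistic --mode random --probability 0.5 -j DNAT --to-destination 10.0.0.4:6443
iptables -t nat -A PREROUTING -d 10.0.0.10 -p tcp --dport 6443 \
  -j DNAT --to-destination 10.0.0.5:6443
```

The consequence is that a kubelet on node-5 can be NATed to the apiserver on node-1 even while the two cannot reach each other.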