Issue with Cluster when one of the cluster node is down #5327

Closed
nagraj321 opened this issue Jan 27, 2024 · 1 comment

Environmental Info:
RKE2 Version:
rke2 -v
rke2 version v1.24.10+rke2r1 (1ccdce2)
go version go1.19.5 X:boringcrypto

Node(s) CPU architecture, OS, and Version:

Linux testserver1 6.1.67 #1 SMP PREEMPT_DYNAMIC Tue Dec 19 11:25:42 PST 2023 x86_64 GNU/Linux

Cluster Configuration:

2 Servers and No Agents

Describe the bug:

We followed the steps in https://docs.rke2.io/install/ha, and the cluster works fine while both servers are up:
/var/lib/rancher/rke2/bin/kubectl get nodes --kubeconfig /etc/rancher/rke2/rke2.yaml
NAME          STATUS   ROLES                       AGE   VERSION
testserver1   Ready    control-plane,etcd,master   44m   v1.24.10+rke2r1
testserver2   Ready    control-plane,etcd,master   43m   v1.24.10+rke2r1

When testserver2 goes down, the cluster stops responding:

/var/lib/rancher/rke2/bin/kubectl get nodes --kubeconfig /etc/rancher/rke2/rke2.yaml
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)

Steps To Reproduce:

  • Installed RKE2:

Expected behavior:

We assumed that if one cluster member goes down, the cluster should keep working.

Actual behavior:

If one of the cluster nodes goes down, we can't access the cluster.

Additional context / logs:

brandond (Member) commented Jan 27, 2024

A two-node etcd cluster has zero fault tolerance: etcd needs a majority of its members to stay available, and a majority of two is two. This is why the RKE2 HA docs note that you need at least 3 server nodes.

See also https://etcd.io/docs/v3.5/faq/#what-is-failure-tolerance
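The arithmetic behind that answer can be sketched in a few lines of Python. This is only an illustration of the majority rule described in the etcd FAQ; the function names `quorum` and `failure_tolerance` are made up for this example and are not part of etcd or RKE2.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print the quorum and tolerated failures for small cluster sizes.
    for n in range(1, 6):
        print(f"members={n} quorum={quorum(n)} tolerated_failures={failure_tolerance(n)}")
```

With two servers the quorum is 2, so losing either one leaves etcd without a majority and the API server times out, as in the report above. With three servers the quorum is still 2, so one server can be down.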
