Unrecoverable error when joining node attempts to retrieve etcd member list from itself #5804

Closed
brandond opened this issue Apr 18, 2024 · 1 comment
@brandond
Member

brandond commented Apr 18, 2024

See original issue and k3s-io/k3s#9661 for details

The fact that the "Failed to get etcd MemberList for 4.246.140.77:59850" error is printed on this node suggests that it is attempting to get the member list from ITSELF instead of from an existing cluster member. I see that this node is configured to join using https://sfdev5277747-cluster.infra-sf-ea.infra.uipath-dev.com:9345 as the server address. Is this perhaps an external load-balancer that includes this server in its backend pool? If you're using an external load-balancer as the fixed registration endpoint, you MUST ensure that the load-balancer does not send requests to a pool member until that member is healthy. Otherwise you'll end up with cases like this, where the node tries to join itself and gets stuck.

Originally posted by @brandond in #5557 (comment)
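
To check whether a node is hitting this case, one option is to compare the address in the warning against the node's own addresses. The following is only a rough sketch, not part of rke2: the function names member_list_peer_ip and self_join_suspected are made up for illustration, it assumes the warning keeps the "Failed to get etcd MemberList for IP:PORT" shape quoted above, it handles IPv4 only, and you have to pass in the node's addresses yourself (including any public or NAT address that is not bound to a local interface).

```shell
# Hypothetical helpers (not part of rke2): spot "Failed to get etcd MemberList"
# warnings whose peer address is one of this node's own IPv4 addresses.

# Read log lines on stdin; print the IP from each
# "Failed to get etcd MemberList for IP:PORT" message, one per line.
member_list_peer_ip() {
  sed -n 's/.*Failed to get etcd MemberList for \([0-9.]*\):[0-9]*.*/\1/p'
}

# Read log lines on stdin; arguments are the addresses this node is known by.
# Returns 0 if any warning names one of those addresses, 1 otherwise.
self_join_suspected() {
  _peers=$(member_list_peer_ip)
  for _peer in $_peers; do
    for _ip in "$@"; do
      [ "$_peer" = "$_ip" ] && return 0
    done
  done
  return 1
}
```

For example, on the node from the original report, `sudo journalctl -u rke2-server | self_join_suspected 4.246.140.77 && echo "node may be trying to join itself"` would print the message if the warning names that address. A match only hints at the load-balancer problem described above; it does not confirm it.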

@mdrahman-suse
Contributor

Validation on master with commit 95e13dc

Followed the steps mentioned in #5806 (comment); details are in that comment.

Replication

$ rke2 -v
rke2 version v1.29.3+rke2r1 (1c82f7ed292c4ac172692bb82b13d20733909804)
go version go1.21.8 X:boringcrypto

$ sudo journalctl -u rke2-server | grep "Failed to get etcd"
Apr 18 22:00:19  rke2[68659]: time="2024-04-18T22:00:19Z" level=warning msg="Failed to get etcd MemberList for 3.138.85.155:32864: context deadline exceeded"
  • Server2 unable to join the cluster

Validation

$ rke2 -v
rke2 version v1.29.3+dev.95e13dc6 (95e13dc62fdbda33de2c709f1149b0c361d920b9)
go version go1.21.8 X:boringcrypto

$ sudo journalctl -u rke2-server | grep "Failed to get etcd"
$
  • Server2 joined the cluster successfully
