Unrecoverable error when joining node attempts to retrieve etcd member list from itself #5804

brandond · 2024-04-18T20:06:19Z

See original issue and k3s-io/k3s#9661 for details

The fact that the `Failed to get etcd MemberList for 4.246.140.77:59850` error is printed on this node suggests that it is attempting to get the member list from itself instead of from an existing cluster member. I see that this node is configured to join using https://sfdev5277747-cluster.infra-sf-ea.infra.uipath-dev.com:9345 as the server address. Is this perhaps an external load balancer that includes this server in its backend pool? If you're using an external load balancer as the fixed registration endpoint, you MUST ensure that it does not send requests to a pool member until that member is healthy. Otherwise you'll end up with cases like this one, where the node tries to join itself and gets stuck.

Originally posted by @brandond in #5557 (comment)
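
To illustrate the load-balancer requirement quoted above, here is a minimal sketch of the selection rule: only pool members that already pass a health check should receive registration traffic. The `eligible_backends` helper, the two-column `address status` input format, and the 10.0.0.x addresses are all hypothetical and exist only for illustration; a real load balancer would derive the status from its own health checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of rke2): read "address status" pairs on
# stdin and print only the addresses whose status is "healthy".
# The input format is an assumption made for this sketch.
eligible_backends() {
    awk '$2 == "healthy" { print $1 }'
}

# Example pool: 10.0.0.12 stands in for the server that is still joining,
# so it must not receive requests sent to the registration endpoint.
printf '%s\n' \
    '10.0.0.10:9345 healthy' \
    '10.0.0.11:9345 healthy' \
    '10.0.0.12:9345 starting' | eligible_backends
```

With the joining server filtered out of the pool, its join request can only reach an existing cluster member, which is the condition described in the comment above.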


mdrahman-suse · 2024-04-18T22:11:03Z

Validation on master with commit `95e13dc`

Followed the steps described in #5806 (comment); details are in that comment.

Replication

$ rke2 -v
rke2 version v1.29.3+rke2r1 (1c82f7ed292c4ac172692bb82b13d20733909804)
go version go1.21.8 X:boringcrypto

$ sudo journalctl -u rke2-server | grep "Failed to get etcd"
Apr 18 22:00:19  rke2[68659]: time="2024-04-18T22:00:19Z" level=warning msg="Failed to get etcd MemberList for 3.138.85.155:32864: context deadline exceeded"

Server2 unable to join the cluster

Validation

$ rke2 -v
rke2 version v1.29.3+dev.95e13dc6 (95e13dc62fdbda33de2c709f1149b0c361d920b9)
go version go1.21.8 X:boringcrypto

$ sudo journalctl -u rke2-server | grep "Failed to get etcd"
$

Server2 joined the cluster successfully
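
When repeating the check above on a busier node, it can help to pull out just the address from each warning. The sketch below assumes the log line format shown in the Replication output; the `memberlist_failures` function name is hypothetical and not part of rke2.

```shell
#!/bin/sh
# Hypothetical helper: print the address from every
# "Failed to get etcd MemberList for <addr>:" warning read on stdin.
# The pattern assumes the IPv4 host:port form seen in the log line above.
memberlist_failures() {
    sed -n 's/.*Failed to get etcd MemberList for \([0-9.]*:[0-9]*\):.*/\1/p'
}

# Demo using the warning captured during replication.
printf '%s\n' 'Apr 18 22:00:19  rke2[68659]: time="2024-04-18T22:00:19Z" level=warning msg="Failed to get etcd MemberList for 3.138.85.155:32864: context deadline exceeded"' \
    | memberlist_failures
```

On a live server the same function could be fed from `sudo journalctl -u rke2-server`; no output means no such warnings were logged, matching the Validation result.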

This was referenced Apr 18, 2024

[Release-1.28] - Unrecoverable error when joining node attempts to retrieve etcd member list from itself #5805

Closed

[Release-1.27] - Unrecoverable error when joining node attempts to retrieve etcd member list from itself #5806

Closed

brandond added this to the v1.29.4+rke2r1 milestone Apr 18, 2024

brandond self-assigned this Apr 18, 2024

brandond mentioned this issue Apr 18, 2024

Bump K3s version for 2024-04 release cycle #5714

Merged

brandond assigned mdrahman-suse Apr 18, 2024

mdrahman-suse closed this as completed Apr 18, 2024

mdrahman-suse mentioned this issue Apr 18, 2024

Unrecoverable error when joining node attempts to retrieve etcd member list from itself k3s-io/k3s#9661

Closed

cloudcafetech mentioned this issue Sep 26, 2024

Failed to get MemberList from server #6872

Closed
