Followed the steps mentioned here: #5806 (comment). Details are in that comment.
Replication
$ rke2 -v
rke2 version v1.29.3+rke2r1 (1c82f7ed292c4ac172692bb82b13d20733909804)
go version go1.21.8 X:boringcrypto
$ sudo journalctl -u rke2-server | grep "Failed to get etcd"
Apr 18 22:00:19 rke2[68659]: time="2024-04-18T22:00:19Z" level=warning msg="Failed to get etcd MemberList for 3.138.85.155:32864: context deadline exceeded"
Server2 is unable to join the cluster.
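For context, the joining server in this kind of setup is normally pointed at the fixed registration endpoint through /etc/rancher/rke2/config.yaml. The sketch below is illustrative only: it reuses the registration address quoted at the end of this issue, and the token value is a placeholder, since the actual configuration is not included in this report.

# /etc/rancher/rke2/config.yaml on the joining server (illustrative sketch)
server: https://sfdev5277747-cluster.infra-sf-ea.infra.uipath-dev.com:9345
token: <cluster join token>   # placeholder, not from this report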
Validation
$ rke2 -v
rke2 version v1.29.3+dev.95e13dc6 (95e13dc62fdbda33de2c709f1149b0c361d920b9)
go version go1.21.8 X:boringcrypto
$ sudo journalctl -u rke2-server | grep "Failed to get etcd"
$
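The empty grep above shows the error is no longer logged. As an additional sanity check (not part of the original validation notes), the cluster can be inspected from the first server using the kubeconfig and kubectl binary that RKE2 installs; once the join succeeds, both servers should be listed as Ready.

$ export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
$ /var/lib/rancher/rke2/bin/kubectl get nodes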
See the original issue and k3s-io/k3s#9661 for details.
The fact that the Failed to get etcd MemberList for 4.246.140.77:59850 error is printed on this node suggests that it is attempting to get the member list from ITSELF, instead of from an existing cluster member. I see that this node is configured to join using https://sfdev5277747-cluster.infra-sf-ea.infra.uipath-dev.com:9345 as the server address. Is this perhaps an external load-balancer that includes this server in the backend pool? If you're using an external load-balancer as the fixed registration endpoint, you MUST ensure that the load-balancer does not send requests to pool members until the member is healthy. Otherwise you'll end up with cases like this, where it is trying to join itself, and gets stuck.

Originally posted by @brandond in #5557 (comment)
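Since the report does not include the load-balancer configuration, the following is only a sketch of what a fixed registration endpoint with the required behaviour could look like in HAProxy. Hostnames, IPs, and timings are placeholders, and checking TCP reachability of the node's kube-apiserver on 6443 is just one simple heuristic for "member is healthy"; any check strict enough to keep a still-joining server out of rotation achieves the same thing.

# haproxy.cfg (illustrative sketch; IPs and names are placeholders, not from this report)
frontend rke2-supervisor
    bind *:9345
    mode tcp
    default_backend rke2-servers

backend rke2-servers
    mode tcp
    balance roundrobin
    option tcp-check
    default-server inter 5s fall 3 rise 2
    # Route traffic to 9345, but only treat a server as healthy once its own
    # kube-apiserver accepts connections on 6443, which on an RKE2 server only
    # happens after it has successfully joined. A bare TCP check on 9345 is not
    # enough: the supervisor port can be listening while the node is still
    # joining, which is exactly the self-join loop described above.
    server server1 10.0.0.11:9345 check port 6443
    server server2 10.0.0.12:9345 check port 6443

The same idea applies to any other load balancer: a joining server must not be an active member of the backend pool until it has actually joined and is healthy.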