rke2-server creates improper etcd member name #5482
Comments
Please don't paste giant log chunks inline. If you're going to send more than a small handful of lines, attach a file to your comment.
Can you provide the output of Can you also show the output of If you have any clusters that have not yet been upgraded, I would be curious to see the same info on similarly configured but not yet upgraded nodes. Do your nodes have the correct name within the etcd cluster prior to upgrading?
The attached file was generated on the rke2 cluster that was created directly with v1.26:
...and this rke2 cluster was originally created with v1.24 and has since been upgraded to v1.26. No nodes have been replaced yet:
OK, so just to be clear: it does not rename the existing nodes, but when you join new nodes to the cluster, they have the wrong name? Are you upgrading your clusters by adding new nodes on 1.26, waiting for them to finish joining, and then deleting the 1.24 nodes? Note that we don't support skipping minor versions when upgrading; you should be going 1.24 -> 1.25 -> 1.26. Reference:
Yes. In detail:
The update strategy is:
Thanks for this hint! Indeed, with the rke2 upgrade I skipped 1.25.
In addition to the etcd/name file I requested up above, can you also grab the
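For illustration, the one-minor-version-at-a-time rule mentioned above (1.24 -> 1.25 -> 1.26) can be sketched as a small helper. This is only a sketch: `upgrade_path` is a hypothetical name, not part of rke2 or Rancher, and it only looks at the major and minor components of a version string.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """Return every minor version to step through, without skipping any.

    Hypothetical helper for illustration; it ignores patch levels and
    build suffixes such as "+rke2r1".
    """
    cur_major, cur_minor = (int(p) for p in current.lstrip("v").split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.lstrip("v").split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not covered by this sketch")
    # Walk the minor versions one by one so that none is skipped.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor, tgt_minor + 1)]


if __name__ == "__main__":
    print(" -> ".join(upgrade_path("v1.24", "v1.26")))
```

Any path with more than two entries means at least one intermediate upgrade is needed before reaching the target version.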
Ah sorry, forgot that one. Here comes the requested info from ds-cen-kma004:
It looks like for some reason rancher is running a snapshot list command at the same time it installs and starts rke2. The snapshot list command races with the main
I'm not sure why Rancher is trying to list snapshots before rke2 is even installed and started, that doesn't seem right... but we should fix the issue that is causing the empty name file to be created and used by RKE2.
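To make the failure mode concrete, here is a minimal sketch of a guard against an empty name file. It is not the actual RKE2 fix. It assumes the member name has the form hostname, a dash, and a random suffix (consistent with the reported names that consist only of a suffix beginning with `-`). The function name, the suffix length, and the write-then-rename step are all assumptions made for illustration.

```python
import os
import secrets
import tempfile


def read_or_create_member_name(name_file: str, hostname: str) -> str:
    """Return the stored member name, regenerating it if missing or empty.

    Hypothetical sketch: the file layout and suffix format are assumptions.
    """
    try:
        with open(name_file, encoding="utf-8") as f:
            stored = f.read().strip()
    except FileNotFoundError:
        stored = ""
    if stored:
        # An existing, non-empty name is kept untouched.
        return stored
    if not hostname.strip():
        # Refuse to build a name like "-1a2b3c4d" from an empty hostname.
        raise ValueError("hostname is empty; refusing to generate a member name")
    name = f"{hostname}-{secrets.token_hex(4)}"
    # Write to a temporary file and rename it into place, so a concurrent
    # reader sees either no file or the complete name, never an empty file.
    directory = os.path.dirname(name_file) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(name)
    os.replace(tmp_path, name_file)
    return name
```

The two design points are that an empty file is treated the same as a missing one, and that the name is never derived from an empty hostname.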
Thanks for the report! This should be fixed for the March releases. It won't change the name on existing nodes, but it will set the node name properly on new nodes, and handle the weird name on nodes that are missing the hostname.
This is difficult to reproduce, especially standalone, even while spamming etcd commands during node startup. Even removing the hostname local variable from the cloud environments didn't surface this race condition. Going to try to reproduce with Rancher provisioning.
## Environment Details
I also tried reproducing this on k3s but wasn't able to do so.
RANCHER_VERSIONS
Reproduced using VERSION=v1.25.16+rke2r1 deployed from the latest Rancher 2.8-head.
Validated using the same Rancher instance with a new KDM metadata config. Technicality: v1.29 on the main branch is an unsupported version, so I configured the cluster as if it were v1.28 with the Cilium CNI and then edited it as YAML to deploy the unsupported version for testing purposes.
Infrastructure
Node(s) CPU architecture, OS, and version:
Linux 5.4.0-1041-aws x86_64 GNU/Linux
Cluster Configuration:
$ kgn -o wide
Config.yaml:
Pretty default with Cilium CNI
Steps
Results:
$ get_etcd
$ rke2 -v
Environmental Info:
RKE2 Version:
rke2 version v1.26.13+rke2r1
go version go1.20.13 X:boringcrypto
Node(s) CPU architecture, OS, and Version:
Linux kma004.hiddendomain.tld 5.14.0-362.18.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jan 3 15:54:45 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
Describe the bug:
After creating the new rke2 cluster, the etcd member names consist only of the suffix beginning with -
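As a rough illustration of the symptom, the check below flags member names that are empty or begin with `-`, which is what a name looks like when the hostname part is missing. `find_improper_member_names` is a hypothetical helper, and the sample names are invented for the example, not taken from the affected cluster.

```python
def find_improper_member_names(names: list[str]) -> list[str]:
    """Return the names that are empty or start with '-'.

    Hypothetical check: a leading '-' suggests that only the suffix was
    written and the hostname part is missing.
    """
    return [name for name in names if not name or name.startswith("-")]


if __name__ == "__main__":
    # Invented sample names, for illustration only.
    sample = ["node-a-1a2b3c4d", "-5e6f7a8b", "node-b-9c0d1e2f"]
    for bad in find_improper_member_names(sample):
        print(f"improper etcd member name: {bad!r}")
```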
Steps To Reproduce:
Expected behavior:
Actual behavior:
Additional context / logs:
---- rke2-server Log start ----
Please find rke2-server Logfile in comment below.
---- rke2-server Log end ----