After a reboot, the agent node cannot rejoin rke2 cluster unless previous boot's agent secret is removed #7154
Comments
No, a deterministically generated "hash" would not be a very good password. The behavior you are observing here is specifically covered in the documentation: https://docs.rke2.io/advanced
To resolve this, you will need to do one of the following:
@brandond - my understanding is that the /etc/rancher/ directory is maintained across reboots. /etc and /var are allowed state, iirc, which is part of my confusion. I could be wrong, so I can update my script to report whether it finds a node password before it starts the agent. Either way, it seems I can work around this by pre-generating the contents of this file in a uniform way, although I'm having a bit of trouble understanding how that would impact token rotation. Thank you for your input and for answering my bug.
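A minimal sketch of the check mentioned above, assuming the password lives at the path quoted in this issue. The function name `node_password_status` is hypothetical and is not part of RKE2; the function only reports what it finds and changes nothing.

```shell
#!/bin/sh
# Sketch only: report whether a non-empty node password file exists.
# node_password_status is a hypothetical helper name, not an RKE2 command.
# The default path is the one quoted in this issue.
node_password_status() {
    password_file="${1:-/etc/rancher/node/password}"
    if [ -s "$password_file" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Possible use in the boot script, before the agent is started
# (commented out so that sourcing this snippet has no side effects):
# echo "node password before start: $(node_password_status)"
# /bin/systemctl start --no-block rke2-agent.service
```

Logging this on every boot should show whether the file really disappears between reboots or is being replaced by something in the script itself.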
Ah, I see my understanding of /etc is actually incorrect. I need to take ostree semantics into account. Thank you.
Environmental Info:
RKE2 Version:
rke2 version v1.30.5+rke2r1 (0c83bc8)
go version go1.22.6 X:boringcrypto
Node(s) CPU architecture, OS, and Version:
Fedora CoreOS 40.20241006.3.0
Linux farmbot93.yyy.zzz 6.10.12-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 30 21:38:25 UTC 2024 x86_64 GNU/Linux
Cluster Configuration:
3 servers, 4 agents
Describe the bug:
After an agent node restarts, the agent cannot join the cluster with the secret stored in /etc/rancher/node/password
Steps To Reproduce:
I have a system unit for the agent which runs a script
The intention of the script is to start the rke2-agent and to ensure that the current agent configuration is on the host. I do this because CoreOS does not keep the effect of
/bin/systemctl enable rke2-agent.service
between reboots. The CoreOS Ignition file injects this script to run on boot. The password is, for example,
6ea01d43a4b573d91e182de8f10bce2f
On reboot, the password becomes
a93405ee4af52472a547eb95e1a62301
When the secret is removed, the new secret can be added and the node joins the cluster.
Installed RKE2 via tarball, using a one-off system unit which downloads the runtime and then executes the tarball-based install.
Expected behavior:
I expected the node password not to be reset by my script on boot.
I expected that the config agent-token would generate some password hash deterministically across reboots.
I expect that neither
/bin/systemctl start --no-block --now rke2-agent.service
nor /bin/systemctl enable rke2-agent.service
would generate a new password.
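One way to keep the password stable, sketched under the assumption that some location on the host does survive reboots: save a copy of the password file there, and put it back before the agent starts if the original is gone. The function name and both paths in the usage comment are hypothetical; nothing here is an RKE2 feature.

```shell
#!/bin/sh
# Sketch only: restore a saved node password before the agent starts.
# restore_node_password is a hypothetical helper; the backup location
# is an assumption and must be somewhere that persists on your host.
restore_node_password() {
    backup_file="$1"
    target_file="$2"
    # Do nothing if the agent already has a password, or if no backup exists.
    if [ -s "$target_file" ] || [ ! -s "$backup_file" ]; then
        return 0
    fi
    mkdir -p "$(dirname "$target_file")"
    cp -p "$backup_file" "$target_file"
    chmod 600 "$target_file"
}

# Possible use in the boot script (hypothetical backup path):
# restore_node_password /var/lib/node-password.bak /etc/rancher/node/password
# /bin/systemctl start --no-block rke2-agent.service
```

The function never overwrites an existing password, so it is safe to run on every boot.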
Actual behavior:
Each reboot generates a new node password, and the new password cannot be used to rejoin the cluster. When the node-password secret for the node is removed, the node rejoins.
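For reference, a sketch of the manual removal described above. It assumes the secret lives in the `kube-system` namespace and is named `<node-name>.node-password.rke2`, which is how I read the RKE2 documentation; confirm the actual name with `kubectl -n kube-system get secrets` before deleting anything. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch only: build the assumed name of a node's password secret.
# node_password_secret_name is a hypothetical helper; the
# "<node-name>.node-password.rke2" pattern is an assumption to verify.
node_password_secret_name() {
    printf '%s.node-password.rke2\n' "$1"
}

# Possible use from a machine with cluster access (commented out so that
# sourcing this snippet deletes nothing; replace <node-name> yourself):
# kubectl -n kube-system get secrets
# kubectl -n kube-system delete secret "$(node_password_secret_name '<node-name>')"
```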
Additional context / logs:
The log complains about a pre-existing node with the same name in the cluster.
** Please indicate if I'm misunderstanding something or using this incorrectly **
Thank you.