wrangler.cattle.io/cisnetworkpolicy-node finalizer left behind on newer RKE2 versions when deleting RKE2 nodes #5855
Comments
Right: previously the controller was running when it shouldn't have been and added the finalizer; now it no longer runs, but the finalizer is still there. This is a fun issue with conditionally enabled controllers that add finalizers: they're hard to clean up after. I guess we should run a quick startup check to remove node finalizers in an … (lines 118 to 121 in d30ec2a).
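The startup check suggested above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual RKE2 fix: nodes are modeled as plain dicts shaped like the Kubernetes Node JSON, and the names strip_stale_finalizer, startup_cleanup and the controller_enabled flag are invented for this example. Only the finalizer string comes from this issue. A real controller would write each updated node back through the API server.

```python
import copy

# Finalizer named in this issue; all function names below are hypothetical.
CIS_FINALIZER = "wrangler.cattle.io/cisnetworkpolicy-node"


def strip_stale_finalizer(node: dict, controller_enabled: bool,
                          finalizer: str = CIS_FINALIZER) -> tuple[dict, bool]:
    """Return (node, changed).

    If the controller that owns `finalizer` is disabled, return a copy of
    the node without that finalizer so deletion is no longer blocked."""
    finalizers = node.get("metadata", {}).get("finalizers") or []
    if controller_enabled or finalizer not in finalizers:
        # Either the running controller will handle its own finalizer,
        # or there is nothing to clean up on this node.
        return node, False
    cleaned = copy.deepcopy(node)  # never mutate the caller's object
    cleaned["metadata"]["finalizers"] = [f for f in finalizers if f != finalizer]
    return cleaned, True


def startup_cleanup(nodes: list[dict], controller_enabled: bool) -> list[dict]:
    """Run once at startup; return only the nodes that need to be updated."""
    updates = []
    for node in nodes:
        cleaned, changed = strip_stale_finalizer(node, controller_enabled)
        if changed:
            updates.append(cleaned)
    return updates


if __name__ == "__main__":
    nodes = [
        {"metadata": {"name": "agent-1", "finalizers": [CIS_FINALIZER]}},
        {"metadata": {"name": "agent-2"}},
    ]
    updated = startup_cleanup(nodes, controller_enabled=False)
    print([n["metadata"]["name"] for n in updated])  # ['agent-1']
```

Returning only changed nodes keeps the startup pass cheap: nodes that never received the finalizer generate no API writes.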
On the Rancher side we traditionally either … The issue here is that it is not exactly clear to a user why their node deletion is hanging, and this is also causing issues with provisioning, as the … Regardless, it's a regression and not something I think should be handled on the provisioning side, which is why I filed this issue.
A PR to remove the finalizer has been opened and should land in the May release cycle. Unfortunately, 1.26 has been EOL for a couple of months, so the fix will only be available for v1.27+.
We're moving this out to June due to 1.30 delays and a tight code freeze window. Please let us know if that conflicts with any plans you had, @Oats87 |
Not sure if this is related, but I've noticed that after upgrading from … the node events show that the CCM is continuously trying to remove the node:
I have tried to remove the node manually with …
When I remove this finalizer, the node is removed automatically. CNI: cilium. This looks similar to #1895.
Validated on version:
$ rke2 version
v1.30.1+dev.3aaa16c9 (3aaa16c9b17da45e9f3475ba5011ed90a49a2e42)
Environment Details
Infrastructure
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Steps to validate the fix
Reproduction Issue:
Validation Results:
Environmental Info:
RKE2 Version:
v1.26.15+rke2r1
Node(s) CPU architecture, OS, and Version:
Not Applicable
Cluster Configuration:
1 Server, 2 Agents
Describe the bug:
When using RKE2 with cni: none and profile: cis-1.23 as options (and bringing your own CNI), after upgrading past v1.26.14+rke2r1 it is no longer possible to foreground delete nodes from the cluster.
Steps To Reproduce:
On the server node:
On the agent nodes:
Observe that the nodes go Ready and that you can delete a node at this point, i.e. kubectl delete node <my-node> works.
Next, upgrade to v1.26.15+rke2r1, i.e. …, and after the cluster is back and all nodes are Ready from a node/kubelet perspective, attempt to delete a node and watch that it never deletes due to an orphaned finalizer, i.e. wrangler.cattle.io/cisnetworkpolicy-node.
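To spot nodes affected by the steps above, one option is to scan the output of kubectl get nodes -o json for nodes that already have a deletionTimestamp but still carry the orphaned finalizer. The function below is a hypothetical diagnostic sketch (stuck_nodes is an invented name); it assumes the standard List shape with an "items" array and only reads data, it does not modify the cluster.

```python
import json

# Finalizer named in this issue; the function below is hypothetical.
CIS_FINALIZER = "wrangler.cattle.io/cisnetworkpolicy-node"


def stuck_nodes(node_list_json: str, finalizer: str = CIS_FINALIZER) -> list[str]:
    """Return names of nodes that are being deleted but still carry `finalizer`.

    Expects the JSON printed by `kubectl get nodes -o json`."""
    node_list = json.loads(node_list_json)
    stuck = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        # The API server sets deletionTimestamp once deletion is requested;
        # the object stays until every finalizer has been removed.
        deleting = "deletionTimestamp" in meta
        if deleting and finalizer in (meta.get("finalizers") or []):
            stuck.append(meta.get("name", "<unnamed>"))
    return stuck
```

For example, save the kubectl output to a file and pass its contents to stuck_nodes; any name returned is a node hanging in deletion for the reason described in this issue.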
Expected behavior:
My node gets deleted.
Actual behavior:
Node hangs in deletion
Additional context / logs:
Looks like this regression was introduced by this PR: #5461