[BUG] machine_labels are 'sticky' to the position of the machine_pool in the list of machine_pools #1254
Comments
@a-blender @kkaempf a gentle nudge on this issue: is everything clear, or are more information or reproduction details needed?
Reassigning for better visibility.
Is something blocking this? @snasovich @Jono-SUSE-Rancher
@snasovich We hit this again in production. It is such a subtle thing to get wrong, and it causes machines to be redeployed with the wrong labels. Is there something we can do to help fix this?
Hello @hansbogert, we have been monitoring this issue for quite some time because we also face it in our Rancher-provisioned on-premise clusters. We looked through the code to understand where the problem comes from and may have found a workaround. We have been using this workaround for quite some time now, and it seems to have eliminated the problem of "inherited" node-pool labels (and taints, which have the same problem) for us.

In our opinion, the problem arises from the following checks in the flattenClusterV2RKEConfigMachinePools function:
terraform-provider-rancher2/rancher2/structure_cluster_v2_rke_config_machine_pool.go Line 47 in e9c7b70
terraform-provider-rancher2/rancher2/structure_cluster_v2_rke_config_machine_pool.go Line 72 in e9c7b70
terraform-provider-rancher2/rancher2/structure_cluster_v2_rke_config_machine_pool.go Line 82 in e9c7b70

As a result, when the labels map and the taints slice are empty for the new pool, the list returned by this function does not include the corresponding "labels" or "taints" key for the newly created pool, so Terraform does not register a diff for these fields. The old taints and labels of the pool that previously occupied this position in the full list of machine_pools are therefore not removed.

Our workaround consists of removing the two length checks mentioned above. This is unproblematic: the flattenTaintsV2 function that gets called already includes a len > 0 check itself, and toMapInterface returns an empty map, which is also fine and does not break anything. The only downside of removing the checks is that it suggests that removing taints on existing pools is actually possible (since Terraform will now report a diff when you remove all taints from an existing pool), while in reality adding or removing taints is only possible at node registration and has no effect on the existing nodes of a pool.

With this workaround we circumvent the issue by maintaining our own build of the Terraform provider binary and using it with Terraform's developer overrides functionality. This is of course a little cumbersome and an upstream fix would be preferable, but because of the taints problem mentioned above, we have not suggested it to the maintainers (yet).

We hope this helps you with your issue as well. Maybe the maintainers can give some input on whether this could be properly incorporated into the provider at some point.
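To make the mechanism in the comment above easier to follow, here is a minimal, self-contained Go sketch. It is not the provider's code: the `pool` type and both `flatten...` functions are hypothetical stand-ins that only imitate the "write the key only when non-empty" pattern described above, on the assumption that a missing key is what keeps Terraform from registering a diff.

```go
package main

import "fmt"

// pool is a hypothetical, simplified stand-in for a machine pool.
// It is not the provider's actual type.
type pool struct {
	Name   string
	Labels map[string]string
}

// flattenWithCheck imitates the pattern described above: the "labels" key
// is only written when the pool has at least one label.
func flattenWithCheck(p pool) map[string]interface{} {
	out := map[string]interface{}{"name": p.Name}
	if len(p.Labels) > 0 {
		out["labels"] = p.Labels
	}
	return out
}

// flattenWithoutCheck imitates the workaround: the "labels" key is always
// written, using an empty map when the pool has no labels.
func flattenWithoutCheck(p pool) map[string]interface{} {
	labels := p.Labels
	if labels == nil {
		labels = map[string]string{}
	}
	return map[string]interface{}{"name": p.Name, "labels": labels}
}

func main() {
	unlabelled := pool{Name: "pool2"} // hypothetical pool without labels

	_, present := flattenWithCheck(unlabelled)["labels"]
	fmt.Println("with length check, labels key present:", present)

	_, present = flattenWithoutCheck(unlabelled)["labels"]
	fmt.Println("without length check, labels key present:", present)
}
```

In this sketch the unlabelled pool only produces an explicit, empty "labels" entry once the length check is gone, which is the state the workaround relies on.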
By the way, the same issue (with a different root cause) also exists for the machineDeploymentAnnotations used by the cluster-autoscaler. These always have the form "cluster.provisioning.cattle.io/...", for example the max pool-size annotation "cluster.provisioning.cattle.io/autoscaler-max-size". They are also inherited by newly created pools, because this check
But this is probably a topic for a separate issue.
Rancher Server Setup
Information about the Cluster
Provider Information
Describe the bug
Having machine labels in machine_pools causes unnecessary recreation of machine pools when other machine_pools are deleted. Worse, the machines of these newly recreated machine_pools can now have the wrong machine_labels on them.
The situation in which this actually occurs is explained below.
To Reproduce
Have a cluster with the following machine_pools (in pseudo-config)
Remove pool1 using Terraform
Actual Result
In pseudo config:
Expected Result
In pseudo config:
Additional context
There are multiple issues at play here:
machine_labels is optional, but computed in the Terraform schema. Why is this computed?
This is an edge case in the Terraform SDK, similar to this one
If machine_labels does not need to be computed, then this issue is easily solved by removing the computed attribute. I've verified that the behavior is then correct.
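The "sticky" behaviour can be pictured with a small model. The Go sketch below does not use the Terraform SDK. It assumes, as a simplification, that an optional and computed attribute left unset in the configuration keeps whatever prior state holds at the same list index. The function name `planLabels`, the pool layout, and the label `role=db` are all hypothetical.

```go
package main

import "fmt"

// planLabels is a deliberately simplified model of an optional+computed
// attribute inside a list: if the configuration leaves the labels unset (nil)
// for a list position, the planned value is carried over from prior state at
// that same position. This is an illustration, not Terraform SDK code.
func planLabels(prior, config []map[string]string) []map[string]string {
	planned := make([]map[string]string, len(config))
	for i, cfg := range config {
		switch {
		case cfg != nil:
			// Labels are set explicitly in the configuration: use them.
			planned[i] = cfg
		case i < len(prior):
			// Labels are unset: fall back to prior state at this index.
			planned[i] = prior[i]
		}
	}
	return planned
}

func main() {
	// Hypothetical prior state: pool1 (index 0) has a label, pool2 (index 1) has none.
	prior := []map[string]string{{"role": "db"}, nil}

	// pool1 is removed from the configuration, so pool2 moves to index 0.
	config := []map[string]string{nil}

	planned := planLabels(prior, config)
	fmt.Println("planned labels for pool2:", planned[0])
}
```

In this model, pool2 ends up planned with the label that pool1 carried, purely because it now occupies pool1's former position in the list.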