Summary

We currently have various scenarios where an invalid `Workflow` will be allowed into the cluster, causing bugs to be found much later instead of upfront. The CLI and API handle (some) validation, but submitting directly to k8s will miss some of this.

Use Cases

The lack of any validation when submitting directly to k8s can cause some unexpected scenarios like #6781, #12693, and #10630. In all those issues and some others, I've mentioned the concept of a Validating Admission Webhook to prevent these scenarios and give clearer error messages: #12693 (comment), #10630 (comment), #6781 (comment)
Note that even if we do have a validating admission webhook, it may be optional, it may not handle resources that were created before it existed, and it may not be able to validate all scenarios, so the Controller and Informers would still have to be "tolerant" (the word the codebase uses) of invalid resources. This is primarily a feature to improve UX.
Implementation Details
There are 2 primary ways of doing Validating Admission in k8s these days:

1. Admission Webhooks

This is simple in terms of logic, in that we'd just re-use the existing validation logic, either within the Controller or the Server.
Typically admission is a Controller responsibility, but hosting this behavior in the Controller would require that it respond to API requests. I have also seen other tools ship this as a separate component entirely, e.g. enabled in a Helm Chart via `argo-workflows.validatingAdmissionWebhook.enabled: true` or similar.
Problematically, k8s webhooks require TLS certs that the k8s control plane trusts, which complicates deployment quite a bit, as it requires `cert-manager` etc.
See also "Cert manager suddenly a dependency???" #9737, which occurred after `cert-manager` was unintentionally required in the manifests for external-facing TLS certs (not internal ones).
There is also a question of failing open vs. failing closed. I think we can fail closed, and latency-sensitive use-cases could just not deploy / not enable the webhook entirely.
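For illustration, a minimal sketch of what the webhook registration could look like, assuming a hypothetical `argo-workflows-webhook` Service and `cert-manager`'s CA injector for the control-plane trust mentioned above (all names and the path here are illustrative, not an existing implementation):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: argo-workflows-validation  # hypothetical name
  annotations:
    # cert-manager's CA injector would populate clientConfig.caBundle,
    # which is the TLS trust requirement discussed above
    cert-manager.io/inject-ca-from: argo/argo-workflows-webhook-cert
webhooks:
  - name: workflows.argoproj.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail  # fail closed, per the discussion above
    rules:
      - apiGroups: ["argoproj.io"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["workflows"]
    clientConfig:
      service:
        namespace: argo
        name: argo-workflows-webhook  # hypothetical Service
        path: /validate
```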
2. Admission Policies (stable as of k8s 1.30)

a. These can be part of the CRDs themselves. Problematically, our CRDs are already too large (see also "Full CRD installation fails" #11266 and "Large CRDs go over size limits (e.g. those with embedded podspecs)" kubernetes/kubernetes#82292), which is why we have "minimal" CRDs that have no schema; otherwise the JSON schema itself would already do some form of schema validation. The lack of this means that we have little to no up-front validation of direct-to-k8s submissions (the Controller will still log errors and the `status` may still display the resource as invalid, but those are post-hoc signals that a user has to notice).
b. Or they can be separate resources. Given the problems with CRD size, this might be the best possible solution.
There are some more dynamic validation scenarios that would be hard to cover via CEL, but CEL could probably still cover a decent bit.
I would probably suggest an optional version of 1 for consistency, and then 2.b. as something deployed with all manifests as a baseline. We could even start making simple policies already and add them to the manifests and then gradually build them up -- #6781 is a very simple length check, for instance (see the sketch below).
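As a sketch of what such a 2.b. policy could look like, here's a #6781-style name length check as a `ValidatingAdmissionPolicy` plus binding; the limit of 63 and the resource names are illustrative, not a worked-out proposal:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: argo-workflows-name-length  # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["argoproj.io"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["workflows"]
  validations:
    # has() guards the generateName case, where metadata.name may not be
    # populated yet at admission time; the limit itself is illustrative
    - expression: "!has(object.metadata.name) || object.metadata.name.size() <= 63"
      message: "Workflow name is too long; names derived from it (e.g. Pod names) would exceed k8s limits"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: argo-workflows-name-length  # hypothetical name
spec:
  policyName: argo-workflows-name-length
  validationActions: ["Deny"]
```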
Message from the maintainers:
Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.
I would like this to happen, especially Admission Webhooks.
There are certain classes of error that Admission Policies couldn't be crafted to help with, such as invalid fields in YAML dictionaries, which are a common class of error: either just misspellings, or using a field that isn't valid in that location.
That one could actually be detected, as CEL macros include `has()` and `exists()`. But detecting all invalid schemas is likely non-trivial.
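For example, a `validations` entry in a policy like the one sketched above could catch a specific known misspelling; the field name here is purely illustrative, and a real policy would have to enumerate the misspellings worth catching:

```yaml
validations:
  # reject a Workflow that sets the (hypothetical) misspelled field
  # 'spec.tempates'; has() returns false when the field is absent
  - expression: "!has(object.spec.tempates)"
    message: "unknown field 'spec.tempates'; did you mean 'spec.templates'?"
```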
More dynamic things, though, like detecting certain kinds of invalid `expr` expressions, are not possible.
Technically, even some simple invalid expressions (like #12899) could be detected with plain string parsing, but it would make for some convoluted/hacky code.
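To illustrate the hackiness, a CEL rule like this could flag unbalanced template delimiters in top-level parameter values; this assumes the Kubernetes extended string library's `split()` is available in the policy environment, and the fields checked are illustrative:

```yaml
validations:
  # crude delimiter-balance check: the count of '{{' substrings must equal
  # the count of '}}' substrings in each top-level parameter value
  - expression: >-
      !has(object.spec.arguments) || !has(object.spec.arguments.parameters) ||
      object.spec.arguments.parameters.all(p, !has(p.value) ||
        p.value.split('{{').size() == p.value.split('}}').size())
    message: "a parameter value appears to contain an unbalanced '{{' / '}}' template expression"
```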