Summary

We currently have various scenarios where an invalid `Workflow` will be allowed into the cluster, causing bugs to be found much later instead of upfront. The CLI and API handle (some) validation, but submitting directly to k8s will miss some of this.

Use Cases

The lack of any validation when submitting directly to k8s can cause some unexpected scenarios like #6781, #12693, and #10630. In all those issues and some others, I've mentioned the concept of a Validating Admission Webhook to prevent these scenarios and give clearer error messages: #12693 (comment), #10630 (comment), #6781 (comment)
Note that even if we do have a validating admission webhook, it may be optional, it may not handle resources that were created before it existed, and it may not be able to validate all scenarios, so the Controller and Informers would still have to be "tolerant" (the word the codebase uses) of invalid resources. This is primarily a feature to improve UX.
Implementation Details
There are 2 primary ways of doing Validating Admission in k8s these days:

1. Admission Webhooks

This is simple in terms of logic, in that we'd just re-use the existing validation logic, either within the Controller or the Server.
Typically admission is a Controller responsibility, but hosting this behavior in the Controller would require that it respond to API requests. I have also seen other tools ship this as a separate component entirely, e.g. enabled in a Helm Chart via `argo-workflows.validatingAdmissionWebhook.enabled: true` or similar.
Problematically, k8s webhooks require TLS certs that the k8s control plane trusts, which complicates deployment quite a bit, as it requires `cert-manager` etc.
See also "Cert manager suddenly a dependency???" #9737, which occurred after `cert-manager` was unintentionally required in the manifests for external-facing TLS certs (not internal ones).
There is also a question of failing open vs. failing closed. I think we can fail closed, and latency-sensitive use-cases could just not deploy / not enable the webhook entirely.
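For illustration, a minimal sketch of what the webhook registration could look like, assuming a hypothetical `argo-workflows-webhook` Service and `cert-manager`'s CA injector for the control-plane trust mentioned above (all names and the path here are illustrative, not an existing implementation):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: argo-workflows-validation  # hypothetical name
  annotations:
    # cert-manager's CA injector would populate clientConfig.caBundle,
    # which is the TLS trust requirement discussed above
    cert-manager.io/inject-ca-from: argo/argo-workflows-webhook-cert
webhooks:
  - name: workflows.argoproj.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail  # fail closed, per the discussion above
    rules:
      - apiGroups: ["argoproj.io"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["workflows"]
    clientConfig:
      service:
        namespace: argo
        name: argo-workflows-webhook  # hypothetical Service
        path: /validate
```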
2. Admission Policies (stable as of k8s 1.30)

a. These can be part of the CRDs themselves. Problematically, our CRDs are already too large (see also "Full CRD installation fails" #11266 and "Large CRDs go over size limits (e.g. those with embedded podspecs)" kubernetes/kubernetes#82292), which is why we have "minimal" CRDs that have no schema; otherwise the JSON schema itself would already do some form of schema validation. The lack of this means that we have little to no up-front validation of direct-to-k8s submissions (the Controller will still log errors and the `status` may still display the resource as invalid, but those are post-hoc signals that a user has to notice).
b. Or they can be separate resources. Given the problems with CRD size, this might be the best possible solution.
There are some more dynamic validation scenarios that would be hard to cover via CEL, but CEL could probably still cover a decent bit.
I would probably suggest an optional version of 1 for consistency, and then 2.b. as something deployed with all manifests as a baseline. We could even start making simple policies already and add them to the manifests and then gradually build them up -- #6781 is a very simple length check, for instance (see the sketch below).
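As a sketch of what such a 2.b. policy could look like, here's a #6781-style name length check as a `ValidatingAdmissionPolicy` plus binding; the limit of 63 and the resource names are illustrative, not a worked-out proposal:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: argo-workflows-name-length  # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["argoproj.io"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["workflows"]
  validations:
    # has() guards the generateName case, where metadata.name may not be
    # populated yet at admission time; the limit itself is illustrative
    - expression: "!has(object.metadata.name) || object.metadata.name.size() <= 63"
      message: "Workflow name is too long; names derived from it (e.g. Pod names) would exceed k8s limits"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: argo-workflows-name-length  # hypothetical name
spec:
  policyName: argo-workflows-name-length
  validationActions: ["Deny"]
```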
Message from the maintainers:
Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.
I would like this to happen, especially Admission Webhooks.
There are certain classes of error that Admission Policies couldn't be crafted to help with, such as invalid fields in YAML dictionaries, which are a common class of error: either just misspellings, or using a field that isn't valid in that location.
That one could actually be detected, as CEL macros include `has()` and `exists()`. But detecting all invalid schemas is likely non-trivial.
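For example, a `validations` entry in a policy like the one sketched above could catch a specific known misspelling; the field name here is purely illustrative, and a real policy would have to enumerate the misspellings worth catching:

```yaml
validations:
  # reject a Workflow that sets the (hypothetical) misspelled field
  # 'spec.tempates'; has() returns false when the field is absent
  - expression: "!has(object.spec.tempates)"
    message: "unknown field 'spec.tempates'; did you mean 'spec.templates'?"
```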
More dynamic things, though, like detecting certain kinds of invalid `expr` expressions, are not possible.
Technically, even some simple invalid expressions (like #12899) could be detected with plain string parsing, but it would make for some convoluted/hacky code.
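To illustrate the hackiness, a CEL rule like this could flag unbalanced template delimiters in top-level parameter values; this assumes the Kubernetes extended string library's `split()` is available in the policy environment, and the fields checked are illustrative:

```yaml
validations:
  # crude delimiter-balance check: the count of '{{' substrings must equal
  # the count of '}}' substrings in each top-level parameter value
  - expression: >-
      !has(object.spec.arguments) || !has(object.spec.arguments.parameters) ||
      object.spec.arguments.parameters.all(p, !has(p.value) ||
        p.value.split('{{').size() == p.value.split('}}').size())
    message: "a parameter value appears to contain an unbalanced '{{' / '}}' template expression"
```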