diff --git a/keps/prod-readiness/sig-node/5027.yaml b/keps/prod-readiness/sig-node/5027.yaml new file mode 100644 index 000000000000..47cdd47b8f32 --- /dev/null +++ b/keps/prod-readiness/sig-node/5027.yaml @@ -0,0 +1,6 @@ +# The KEP must have an approver from the +# "prod-readiness-approvers" group +# of http://git.k8s.io/enhancements/OWNERS_ALIASES +kep-number: 5027 +alpha: + approver: "@johnbelamaric" diff --git a/keps/sig-node/5027-dra-admin-controlled-device-attributes/README.md b/keps/sig-node/5027-dra-admin-controlled-device-attributes/README.md new file mode 100644 index 000000000000..2df4284a77db --- /dev/null +++ b/keps/sig-node/5027-dra-admin-controlled-device-attributes/README.md @@ -0,0 +1,807 @@ + +# [KEP-5027](https://github.com/kubernetes/enhancements/issues/5027): DRA: admin-controlled device attributes (device health, maintenance, priority) + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [Notes/Constraints/Caveats](#notesconstraintscaveats) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [API](#api) + - [Merging ResourceSliceOverrides and ResourceSlices](#merging-resourcesliceoverrides-and-resourceslices) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Beta](#beta) + - [GA](#ga) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - 
[Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website 
+
+## Summary
+
+
+
+With Dynamic Resource Allocation (DRA), DRA drivers publish information about
+the devices that they manage in ResourceSlices. This information is used by the
+scheduler when selecting devices for user requests in ResourceClaims.
+
+This KEP adds a Kubernetes API that privileged users, typically cluster
+administrators, can use to override or extend that information. This can be
+permanent, as part of the installation of a DRA driver to adapt the driver to
+the cluster, or temporary, as part of cluster maintenance.
+
+This generic mechanism can be used to mark devices as offline, regardless of
+whether that is because of device health as observed by some component other
+than the driver itself or because the cluster administrator is doing manual
+maintenance work.
+
+The other usage is to influence which devices are picked when there are
+multiple viable alternatives. This is a first step towards providing a more
+comprehensive [scoring](https://github.com/kubernetes/enhancements/issues/4970)
+solution.
+
+For both usages, this KEP standardizes device attributes that the scheduler
+must check before using a device for a ResourceClaim.
+
+## Motivation
+
+### Goals
+
+- Enable [admin-controlled](../5055-device-taints-and-tolerations) device taints.
+
+- Enable updating how devices are seen in the cluster without having to use
+  driver-specific APIs which influence what a driver puts into ResourceSlices.
+
+### Non-Goals
+
+## Proposal
+
+The intent to override device attributes must be recorded persistently so that
+it is preserved even when a ResourceSlice gets removed or updated. To achieve
+this, a new cluster-scoped ResourceSliceOverride type gets added. A single
+ResourceSliceOverride object specifies device attributes that apply to all
+devices matching a CEL expression, i.e. the same way as users select devices in
+a ResourceClaim.
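For illustration, such an object might look like the following sketch. The field names follow the Go API proposed in this KEP; the driver name, attribute name, and rank value are made up for the example:

```yaml
# Sketch only: marks every device published by the hypothetical
# driver "gpu.example.com" as offline for scheduling purposes.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSliceOverride
metadata:
  name: gpu-maintenance
spec:
  devices:
    selectors:
      - cel:
          expression: device.driver == "gpu.example.com"
    rank: 100
    attributes:
      resource.kubernetes.io/offline:   # hypothetical standardized attribute
        bool: true
```

The CEL expression uses the same `device.*` variables as device selectors in a ResourceClaim, so admins can reuse expressions they already know from writing claims.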
+
+The scheduler must merge these additional attributes with the ones provided by
+the DRA drivers on the fly while it gathers information about available
+devices.
+
+### Notes/Constraints/Caveats
+
+Users who look at ResourceSlices to figure out which devices are available also
+need to consider ResourceSliceOverrides to get the full picture. Copying from
+the ResourceSliceOverride spec into the ResourceSlice status could help here,
+but would not be instantaneous and could cause write amplification (one
+ResourceSliceOverride affecting many different devices); it is therefore not
+part of this proposal.
+
+Perhaps `kubectl describe resourceslices` can be extended to include the
+additional information. For now this is out of scope.
+
+Creating a ResourceSliceOverride races with ongoing scheduling attempts,
+which is unavoidable.
+
+### Risks and Mitigations
+
+
+
+## Design Details
+
+### API
+
+The ResourceSliceOverride is a cluster-scoped type in the `resource.k8s.io` API
+group, initially in `v1alpha3` (the alpha version in Kubernetes 1.32). Because
+it may be useful to clean up after disabling the feature and because the
+device taint feature also uses this type, it gets served unconditionally as long as
+the `v1alpha3` version is enabled. Fields related specifically to this KEP
+are feature-gated.
+
+```Go
+type ResourceSliceOverride struct {
+	metav1.TypeMeta
+	// Standard object metadata
+	// +optional
+	metav1.ObjectMeta
+
+	// Changing the spec automatically increments the metadata.generation number.
+	Spec ResourceSliceOverrideSpec
+}
+
+type ResourceSliceOverrideSpec struct {
+	// Devices defines how to override device attributes and taints.
+	Devices DeviceOverride
+}
+
+type DeviceOverride struct {
+	// Selectors define criteria which must be satisfied by a specific
+	// device, otherwise the override is ignored for it.
+	// All selectors must be satisfied.
+	//
+	// +required
+	// +listType=atomic
+	Selectors []DeviceSelector
+
+	// If a ResourceSlice and a DeviceOverride define the same attribute or
+	// capacity, the value of the DeviceOverride is used. If multiple
+	// different DeviceOverrides match the same device, then the one with
+	// the highest rank wins. If the ranks are the same, it is non-deterministic
+	// which override is used.
+	Rank int
+
+	// Attributes defines the set of attributes to override for matching devices.
+	// The name of each attribute must be unique in that set and
+	// include the domain prefix.
+	//
+	// The maximum number of attributes and capacities combined is 32.
+	// This is an alpha field and requires enabling the DRAAdminControlledDeviceAttributes
+	// feature gate.
+	//
+	// +optional
+	// +featureGate=DRAAdminControlledDeviceAttributes
+	Attributes map[FullyQualifiedName]DeviceAttribute
+
+	// Capacity defines the set of capacities to override for matching devices.
+	// The name of each capacity must be unique in that set and
+	// include the domain prefix.
+	//
+	// The maximum number of attributes and capacities combined is 32.
+	// This is an alpha field and requires enabling the DRAAdminControlledDeviceAttributes
+	// feature gate.
+	//
+	// +optional
+	// +featureGate=DRAAdminControlledDeviceAttributes
+	Capacity map[FullyQualifiedName]DeviceCapacity
+}
+```
+
+### Merging ResourceSliceOverrides and ResourceSlices
+
+Helper code which keeps an up-to-date list of devices with all overrides
+applied will be provided as part of `k8s.io/dynamic-resource-allocation`. It
+will be based on informers such that evaluating the CEL selectors is only
+necessary when ResourceSlices or ResourceSliceOverrides change.
+
+If a CEL expression fails for a device, the override does not apply and an
+event will be generated for the ResourceSliceOverride with the faulty CEL
+expression.
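As a sketch of how the rank-based precedence plays out (all driver, object, and attribute names here are illustrative), consider two overrides matching the same device whose driver publishes `gpu.example.com/healthy: true`:

```yaml
# A monitoring component marks the device unhealthy.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSliceOverride
metadata:
  name: monitoring-marks-unhealthy
spec:
  devices:
    selectors:
      - cel:
          expression: device.driver == "gpu.example.com"
    rank: 10
    attributes:
      gpu.example.com/healthy:
        bool: false
---
# An admin decides to force the device back into service.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSliceOverride
metadata:
  name: admin-forces-healthy
spec:
  devices:
    selectors:
      - cel:
          expression: device.driver == "gpu.example.com"
    rank: 20
    attributes:
      gpu.example.com/healthy:
        bool: true
```

Both overrides take precedence over the driver-published value; between them, `admin-forces-healthy` wins because of its higher rank, so the scheduler would see `healthy: true`.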
+ +### Test Plan + +[X] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + +None. + +##### Unit tests + + + + + +v1.32.0: + +- `k8s.io/dynamic-resource-allocation/structured`: 91.3% +- `k8s.io/kubernetes/pkg/apis/resource/validation`: 98.6% + +##### Integration tests + + + + + +Additional scenarios will be added to `test/integration/scheduler_perf`, not +just for correctness but also to evaluate a potential performance impact. + +- : + +##### e2e tests + + + +One E2E test scenario is to mark all devices as offline and then verify that +pods don't get scheduled. Another is to set different priorities and check that +the scheduler picks the device with the highest one. Some care will be needed +to avoid flakes because the creation of a ResourceSliceOverride will not be +immediately visible to the scheduler. + +- : + +### Graduation Criteria + +#### Alpha + +- Feature implemented behind a feature flag +- Initial e2e tests completed and enabled + +#### Beta + +- Gather feedback from developers and surveys +- Additional tests are in Testgrid and linked in KEP + +#### GA + +- 3 examples of real-world usage +- Allowing time for feedback +- [Conformance tests] + +[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md + +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? 
+ - Will enabling / disabling the feature require downtime or reprovisioning + of a node? + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? 
+
+
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+
+
+### Troubleshooting
+
+
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+###### What are other known failure modes?
+
+
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+## Implementation History
+
+
+
+## Drawbacks
+
+
+
+## Alternatives
+
+Instead of ResourceSliceOverride as a separate type, new fields in the
+ResourceSlice status could be modified by an admin. That has the problem that
+the ResourceSlice object might get deleted while doing cluster maintenance like
+a driver update, in which case the admin intent would get lost.
diff --git a/keps/sig-node/5027-dra-admin-controlled-device-attributes/kep.yaml b/keps/sig-node/5027-dra-admin-controlled-device-attributes/kep.yaml
new file mode 100644
index 000000000000..f0ef6b62dcb8
--- /dev/null
+++ b/keps/sig-node/5027-dra-admin-controlled-device-attributes/kep.yaml
@@ -0,0 +1,40 @@
+title: "DRA: admin-controlled device attributes"
+kep-number: 5027
+authors:
+  - "@pohly"
+owning-sig: sig-node
+participating-sigs:
+  - sig-scheduling
+status: implementable
+creation-date: 2025-01-10
+reviewers:
+  - TBD
+approvers:
+  - TBD
+
+see-also:
+  - "/keps/sig-node/4381-dra-structured-parameters"
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: alpha
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v1.33"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v1.33"
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: DRAAdminControlledDeviceAttributes
+    components:
+      - kube-apiserver
+      - kube-scheduler
+disable-supported: true
+
+# The following PRR answers are required at beta release
+metrics:
diff --git a/keps/sig-node/5055-device-taints-and-tolerations/README.md b/keps/sig-node/5055-device-taints-and-tolerations/README.md
new file mode 100644
index 000000000000..ccdb76a88cbf
--- /dev/null
+++ b/keps/sig-node/5055-device-taints-and-tolerations/README.md
@@ -0,0 +1,971 @@
+
+# KEP-5055: DRA: device taints and tolerations
+
+
+
+
+
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [Degraded Devices](#degraded-devices)
+  - [External Health Monitoring](#external-health-monitoring)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [API](#api)
+  - [Test Plan](#test-plan)
+    - [Prerequisite testing updates](#prerequisite-testing-updates)
+    - [Unit tests](#unit-tests)
+    - [Integration tests](#integration-tests)
+    - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback
Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: 
https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+
+
+With Dynamic Resource Allocation (DRA), DRA drivers publish information about
+the devices that they manage in ResourceSlices. This information is used by the
+scheduler when selecting devices for user requests in ResourceClaims.
+
+With this KEP, DRA drivers can mark devices as tainted such that they won't be
+used for scheduling new pods. In addition, pods already running with access to
+a tainted device can be stopped automatically. Cluster administrators can do
+the same by creating a
+[ResourceSliceOverride](../5027-dra-admin-controlled-device-attributes) with a
+taint.
+
+Users can decide to ignore specific taints by adding tolerations to their
+ResourceClaim.
+
+## Motivation
+
+### Goals
+
+- Enable taking devices offline for maintenance while still allowing test pods
+  to request and use those devices.
+
+- Enable users to decide whether they want to keep running a workload in a
+  degraded mode while a device is unhealthy or prefer to get pods rescheduled.
+
+### Non-Goals
+
+## Proposal
+
+### Degraded Devices
+
+A driver itself can detect problems which may or may not be tolerable for
+workloads, like degraded performance due to overheating. Removing such devices
+from the ResourceSlice would unconditionally prevent using them for new
+pods. Instead, publishing them with a taint informs users about this degradation and
+leaves them the choice whether the device is still usable enough to run pods.
+It also automates stopping pods which don't tolerate such a degradation.
+
+### External Health Monitoring
+
+As a cluster admin, I am deploying a vendor-provided DRA driver together with a
+separate monitoring component for hardware aspects that are not available or
+not supported by that DRA driver.
When that component detects problems, it can
+check its policy configuration and decide to take devices offline by creating
+a ResourceSliceOverride with a taint for affected devices.
+
+### Risks and Mitigations
+
+
+
+## Design Details
+
+The feature follows the approach and APIs taken for node taints and
+applies them to devices. A new controller watches tainted devices and deletes
+pods using them unless they tolerate the device taint, similar to the
+[taint-eviction-controller](https://github.com/kubernetes/kubernetes/blob/32130691a4cb8a1034b999341c40e48d197f5465/pkg/controller/tainteviction/taint_eviction.go#L81-L83).
+
+Taints are cumulative as long as the key and effect pairs are different:
+- Taints defined by an admin in a ResourceSliceOverride get added to the
+  set of taints defined by the DRA driver in a ResourceSlice.
+- Taints with the same key and effect get overwritten, using the same
+  precedence as for attributes.
+
+This merging will be implemented by the same code that also
+overrides device attributes.
+
+To ensure consistency among all pods sharing a ResourceClaim, the toleration
+for taints gets added to the request in a ResourceClaim, not the pod. This also
+avoids conflicts like one pod tolerating a taint for scheduling while another
+pod does not.
+
+### API
+
+The ResourceSlice content gets extended:
+
+```Go
+// BasicDevice defines one device instance.
+type BasicDevice struct {
+	...
+
+	// If specified, the device's taints.
+	//
+	// This is an alpha field and requires enabling the DRADeviceTaints
+	// feature gate.
+	//
+	// +optional
+	// +listType=atomic
+	// +featureGate=DRADeviceTaints
+	Taints []Taint
+}
+
+// The device this Taint is attached to has the "effect" on
+// any claim and, through the claim, to pods that do not tolerate
+// the Taint.
+type Taint struct {
+	// The taint key to be applied to a device.
+	// Must be a label name.
+ // + // +required + Key string + + // The taint value corresponding to the taint key. + // Must be a label value. + // + // +optional + Value string + + // The effect of the taint on claims and pods that do not tolerate the taint. + // Valid effects are NoSchedule and NoExecute. PreferNoSchedule as used for + // nodes is not valid here. + // + // +required + Effect TaintEffect + + // ^^^^ + // + // Implementing PreferNoSchedule would depend on a scoring solution for DRA. + // It might get added as part of that. + + // TimeAdded represents the time at which the taint was added. + // It is only written for NoExecute taints. + // + // +optional + TimeAdded *metav1.Time +} +``` + +Taint has the exact same fields as a v1.Taint, but the description is a bit +different. In particular, PreferNoSchedule is not valid. + +As with node taints, the key is typically a short string. The meaning of the value +depends on the key. It is allowed to be longer. The core v1 API does not impose +length limitations for these fields. The `resource.k8s.io` API does. + +Tolerations get added to a DeviceRequest: + +```Go +type DeviceRequest struct { + ... + + // If specified, the request's tolerations. + // + // Tolerations for NoSchedule are required to allocate a + // device which has a taint with that effect. The same applies + // to NoExecute. + // + // In addition, should any of the allocated devices get tainted + // with NoExecute after allocation and that effect is not tolerated, + // then all pods consuming the ResourceClaim get deleted to evict + // them. The scheduler will not let new pods reserve the claim while + // it has these tainted devices. Once all pods are evicted, the + // claim will get deallocated. 
+	//
+	// +optional
+	// +listType=atomic
+	Tolerations []Toleration
+}
+
+// The ResourceClaim this Toleration is attached to tolerates any taint that matches
+// the triple <key,value,effect> using the matching operator <operator>.
+type Toleration struct {
+	// Key is the taint key that the toleration applies to. Empty means match all taint keys.
+	// If the key is empty, operator must be Exists; this combination means to match all values and all keys.
+	// Must be a label name.
+	//
+	// +optional
+	Key string
+
+	// Operator represents a key's relationship to the value.
+	// Valid operators are Exists and Equal. Defaults to Equal.
+	// Exists is equivalent to wildcard for value, so that a ResourceClaim can
+	// tolerate all taints of a particular category.
+	//
+	// +optional
+	Operator TolerationOperator
+
+	// Value is the taint value the toleration matches to.
+	// If the operator is Exists, the value should be empty, otherwise just a regular string.
+	// Must be a label value.
+	//
+	// +optional
+	Value string
+
+	// Effect indicates the taint effect to match. Empty means match all taint effects.
+	// When specified, allowed values are NoSchedule and NoExecute.
+	//
+	// +optional
+	Effect TaintEffect
+
+	// TolerationSeconds represents the period of time the toleration (which must be
+	// of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
+	// it is not set, which means tolerate the taint forever (do not evict). Zero and
+	// negative values will be treated as 0 (evict immediately) by the system.
+	//
+	// +optional
+	TolerationSeconds *int64
+}
+
+// A toleration operator is the set of operators that can be used in a toleration.
+//
+// +enum
+type TolerationOperator string
+
+const (
+	TolerationOpExists TolerationOperator = "Exists"
+	TolerationOpEqual  TolerationOperator = "Equal"
+)
+```
+
+As with Taint, these structs get duplicated to enable documenting DRA-specific
+behavior and to ensure that future extensions do not get inherited
+accidentally.
+
+Generated conversion code might make it possible to reuse existing helper
+code. Alternatively, that code can be copied.
+
+The DeviceOverride also gets extended. It is possible to use
+admin-controlled taints without enabling attribute overrides by enabling the
+`v1alpha3` API and only the `DRADeviceTaints` feature, while leaving
+`DRAAdminControlledDeviceAttributes` disabled.
+
+```Go
+type DeviceOverride struct {
+	...
+
+	// If specified, the device's taints. Taints with unique key and effect
+	// get added to the set of taints of the device. When key and effect
+	// are used in multiple places, the same precedence rules as for attributes apply
+	// (see the rank field).
+	//
+	// This is an alpha field and requires enabling the DRADeviceTaints
+	// feature gate.
+	//
+	// +optional
+	// +listType=atomic
+	// +featureGate=DRADeviceTaints
+	Taints []Taint
+}
+```
+
+### Test Plan
+
+
+
+[ ] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
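A minimal pairing of a device taint and a matching toleration that these tests might exercise could look like the following fragments. This is a sketch only; the driver domain, taint key, and values are made up:

```yaml
# Taint published for a device in a ResourceSlice (fragment):
taints:
  - key: example.com/degraded
    value: overheating
    effect: NoExecute
---
# Toleration in a DeviceRequest of a ResourceClaim (fragment):
tolerations:
  - key: example.com/degraded
    operator: Equal
    value: overheating
    effect: NoExecute
    tolerationSeconds: 300
```

With this pairing, pods consuming the claim may keep running for up to 300 seconds after the taint appears before being evicted; without the toleration, eviction would start immediately and new pods could not be scheduled onto the device.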
+ +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +v1.32.0: + +- `k8s.io/dynamic-resource-allocation/structured`: 91.3% +- `k8s.io/kubernetes/pkg/apis/resource/validation`: 98.6% +- `k8s.io/kubernetes/pkg/controller/tainteviction`: 81.8% + +##### Integration tests + + + + + +- : + +##### e2e tests + + + +- : + +### Graduation Criteria + + + +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? 
+ + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + +The existing taint-eviction-controller could be extended to cover device +taints. 
Cloning it lowers the risk of breaking existing stable functionality. + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-node/5055-device-taints-and-tolerations/kep.yaml b/keps/sig-node/5055-device-taints-and-tolerations/kep.yaml new file mode 100644 index 000000000000..a6b714da881b --- /dev/null +++ b/keps/sig-node/5055-device-taints-and-tolerations/kep.yaml @@ -0,0 +1,41 @@ +title: "DRA: device taints and tolerations" +kep-number: 5055 +authors: + - "@pohly" +owning-sig: sig-node +participating-sigs: + - sig-scheduling +status: implementable +creation-date: 2025-01-20 +reviewers: + - TBD +approvers: + - TBD + +see-also: + - "/keps/sig-node/5027-dra-admin-controlled-device-attributes" + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.33" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.33" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: DRADeviceTaints + components: + - kube-apiserver + - kube-scheduler + - kube-controller-manager +disable-supported: true + +# The following PRR answers are required at beta release +metrics: