Commit: [99][sig-policy] Policy placement strategy

ref: https://issues.redhat.com/browse/ACM-6523
Signed-off-by: Dale Haiducek <[email protected]>

2 changed files with 277 additions and 0 deletions

enhancements/sig-policy/99-policy-placement-strategy/README.md (267 additions)

# Policy placement strategy

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in
  [website](https://github.com/open-cluster-management-io/open-cluster-management-io.github.io/)

## Summary

Following from the [`DecisionStrategy`](../../sig-architecture/64-placementStrategy/README.md) field in the `Placement`
API, policies can leverage this new logic to have a configurable and systematic way to roll out policy updates to
clusters.

## Motivation

Currently, policies and their subsequent updates are pushed out to clusters en masse based on the placement to which
they are bound. The new `DecisionStrategy` field in the `Placement` API creates cluster groupings that controllers can
leverage to have segmented, configurable rollouts of resources. This will aid use cases where high availability is a
priority or where a set of test clusters should receive and verify updates before the remaining clusters are updated.

### Goals

- Make `Placement` the primary API for placement (currently the governance propagator is somewhat
  `PlacementRule`-centric)
- Leverage the `Placement` helper library to retrieve placement decisions
- Reflect rollout status per cluster in the root policy for discoverability (whether a cluster is up-to-date or not)
- (Stretch) Implement the `RolloutStrategy` struct for policies, including:
  - `RolloutStrategy`
    - "All": all clusters at once
    - "Progressive": one cluster at a time
    - "ProgressivePerGroup": one group of clusters at a time
  - `Timeout`: maximum amount of time to wait for success before continuing the rollout
  - `MandatoryDecisionGroups`: groups that should be handled before other groups
  - `MaxConcurrency`: concurrency during a progressive rollout
- (Stretch) Add an aggregated rollout status to the root policy status.

### Non-Goals | ||
|
||
Any specialized behaviors outside of those provided by the `Placement` library and, by extension, the `DecisionStrategy` | ||
enhancement, that might require additional code other than that already provided/required by the `Placement` library. | ||
|
||
## Proposal

### User Stories

#### Story 1 / Phase 1

As a system administrator, I want to know the status of a rollout for a policy and which clusters have been updated.

- **Summary**

  - Add an annotation to the replicated policy indicating whether it is up-to-date.

- **Details** (see the sketch after this list)

  - Update the `governance-policy-propagator` controller to add an `open-cluster-management.io/root-policy-generation`
    annotation to the replicated policy for each cluster.
  - Update the annotation with the root policy's generation at the time the policy is replicated.
  - (Optional; a requirement for Phase 3) Update the `governance-policy-propagator` controller to add an
    `open-cluster-management.io/root-policy-timestamp` annotation to the replicated policy for each cluster indicating
    when the replicated policy was last applied.
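
A minimal sketch (untested) of how the propagator might set these annotations during replication. The function name
`setRolloutAnnotations` is hypothetical; the annotation keys are the ones proposed above:

```golang
package propagator

import (
	"strconv"
	"time"

	policyv1 "open-cluster-management.io/governance-policy-propagator/api/v1"
)

// setRolloutAnnotations records the root policy's current generation (and,
// optionally for Phase 3, the replication time) on the replicated policy so
// that rollout status can be derived later.
func setRolloutAnnotations(replicated, root *policyv1.Policy) {
	annotations := replicated.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}

	annotations["open-cluster-management.io/root-policy-generation"] = strconv.FormatInt(root.GetGeneration(), 10)
	// (Optional; a requirement for Phase 3) Record when this replication occurred
	annotations["open-cluster-management.io/root-policy-timestamp"] = time.Now().UTC().Format(time.RFC3339)

	replicated.SetAnnotations(annotations)
}
```
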
#### Story 2 / Phase 2

As a system administrator, I want to use a placement `DecisionStrategy` with policies.

- **Summary**

  - Use the `Placement` library to gather the list of clusters and iterate over them as is done today.

- **Details**

  - Make `Placement` the primary placement resource and parse `PlacementRule` decisions into the `Placement` struct
    instead of the `PlacementRule` struct.
  - Using the `Placement` library and implementing a basic `ClusterRolloutStatusFunc`, iterate over all clusters as
    usual using the `All` rollout strategy. The `RolloutStatus` will initially be either `ToApply` or `Succeed`.
  - (Optional cleanup) Remove the current `Placement.Decisions` from the `status` of the `Policy` CRD. It's unclear
    why `Policy` would store something that's discoverable elsewhere, and it's not certain the field is currently in
    use.
  - (Optional) Remove the `ClusterNamespace` from the `status` of the `Policy` CRD (`Placement` doesn't have this, so
    it ought to be safe to remove at this point if it's feasible and not overly complex to do so).

- **Snippet (untested)**

```golang
// common.go
package common

import (
	"context"

	"k8s.io/apimachinery/pkg/labels"
	clusterv1alpha1 "open-cluster-management.io/api/cluster/v1alpha1"
	clusterv1beta1 "open-cluster-management.io/api/cluster/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type placementDecisionGetter struct {
	c client.Client
}

func (pd placementDecisionGetter) List(selector labels.Selector, namespace string) ([]*clusterv1beta1.PlacementDecision, error) {
	pdList := &clusterv1beta1.PlacementDecisionList{}
	lopts := &client.ListOptions{
		LabelSelector: selector,
		Namespace:     namespace,
	}

	err := pd.c.List(context.TODO(), pdList, lopts)

	pdPtrList := []*clusterv1beta1.PlacementDecision{}
	for i := range pdList.Items {
		// Append the address of the slice element rather than the loop
		// variable to avoid aliasing a single reused iteration variable
		pdPtrList = append(pdPtrList, &pdList.Items[i])
	}

	return pdPtrList, err
}

func GetRolloutHandler(c client.Client, placement *clusterv1beta1.Placement) (*clusterv1alpha1.RolloutHandler, error) {
	pdTracker := clusterv1beta1.NewPlacementDecisionClustersTracker(placement, placementDecisionGetter{c}, nil, nil)

	return clusterv1alpha1.NewRolloutHandler(pdTracker)
}

var GetClusterRolloutStatus clusterv1alpha1.ClusterRolloutStatusFunc = func(clusterName string) clusterv1alpha1.ClusterRolloutStatus {
	// (Phase 3 rollout logic goes here)

	return clusterv1alpha1.ClusterRolloutStatus{
		// Initially determined by comparing the generation of the policy with
		// the particular cluster's generation, resolving to either ToApply or
		// Succeed
		Status: clusterv1alpha1.ToApply,
		// Retrieved from the last transition time in the cluster's status
		LastTransitionTime: nil,
	}
}
```

```golang
rolloutHandler, err := common.GetRolloutHandler(client, placement)
if err != nil {
	// handle the error
}

strategy, rolloutResult, err := rolloutHandler.GetRolloutCluster(clusterv1alpha1.All, common.GetClusterRolloutStatus)

// ... Use rolloutResult for propagation
```

#### Story 3 / Phase 3 (Stretch goal)

As a system administrator, I want policies to be rolled out only when the policy on previously updated clusters shows
as "Compliant", and I want to control the rollout's concurrency and priority.

- **Summary**

  - Only continue policy deployment once the previously deployed clusters show as "Compliant", subject to a
    configurable timeout.

- **Details**

  - Update the `PlacementBinding` CRD to contain the `RolloutStrategy` struct (see
    [`v1alpha1/RolloutStrategy`](https://github.com/open-cluster-management-io/api/blob/main/cluster/v1alpha1/types_rolloutstrategy.go)).
    Default to `All` if a strategy is not provided.
  - Update the `governance-policy-propagator` controller to add an `open-cluster-management.io/root-policy-timestamp`
    annotation to the replicated policy for each cluster indicating when the replicated policy was last applied.
  - Store a `lastEvaluated` timestamp in the status of the replicated policies so that the `RolloutStatus` can be
    determined inside of the `ClusterRolloutStatusFunc`. This should only be updated if the status changes (the
    current behavior), the generation changes, and/or the `lastEvaluated` timestamp is older than the root policy
    timestamp.
  - Update the `ClusterRolloutStatusFunc` (the `GetClusterRolloutStatus` function from Phase 2) to set up logic to
    work with the `Placement` library, providing the information it needs to continue only once the previous group is
    compliant (or after a configurable timeout); a sketch of this mapping follows the snippets below:

    | Status        | Description                                                                                                                                           |
    | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `ToApply`     | No generation set                                                                                                                                       |
    | `Progressing` | Current generation, and `lastEvaluated` has not been updated                                                                                            |
    | `Succeed`     | Compliant and current generation                                                                                                                        |
    | `Failed`      | NonCompliant and current generation                                                                                                                     |
    | `TimeOut`     | Current generation, `lastEvaluated` is older than `root-policy-timestamp`, and time has passed beyond the timeout specified in the `RolloutStrategy`    |
    | `Skip`        | (unused)                                                                                                                                                |

  - (Stretch) Update the `Policy` CRD to contain a `RolloutStatus`. This will be used to provide an aggregated rollout
    status for the root policy of `Progressing` (this would include `ToApply`), `Succeed`, or `Failed` (this would
    include `TimeOut`).

- **Snippet**

```golang
type PlacementBinding struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	// +kubebuilder:validation:Optional
	BindingOverrides BindingOverrides `json:"bindingOverrides,omitempty"`
	// +kubebuilder:validation:Optional
	RolloutStrategy clusterv1alpha1.RolloutStrategy `json:"rolloutStrategy,omitempty"` // <-- New field
	// This field provides the ability to select a subset of bound clusters
	// +kubebuilder:validation:Optional
	// +kubebuilder:validation:Enum=restricted
	SubFilter SubFilter `json:"subFilter,omitempty"`
	// +kubebuilder:validation:Required
	PlacementRef PlacementSubject `json:"placementRef"`
	// +kubebuilder:validation:Required
	// +kubebuilder:validation:MinItems=1
	Subjects []Subject              `json:"subjects"`
	Status   PlacementBindingStatus `json:"status,omitempty"`
}
```
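
For illustration, the new field might be populated like this (a sketch; the `RolloutStrategy` field and constant names
are assumptions based on the linked `v1alpha1/RolloutStrategy` types at the time of writing, and the "canary" decision
group is hypothetical):

```golang
package common

import (
	"k8s.io/apimachinery/pkg/util/intstr"
	clusterv1alpha1 "open-cluster-management.io/api/cluster/v1alpha1"
)

// exampleRolloutStrategy is a hypothetical configuration: roll out
// progressively, at most two clusters at a time, handling an assumed "canary"
// decision group first and waiting up to 10 minutes per cluster for success.
// NOTE: Field names are assumptions based on v1alpha1/RolloutStrategy as
// linked above and may change as the API evolves.
var exampleRolloutStrategy = clusterv1alpha1.RolloutStrategy{
	Type: clusterv1alpha1.Progressive,
	Progressive: &clusterv1alpha1.RolloutProgressive{
		Timeout: clusterv1alpha1.Timeout{Timeout: "10m"},
		MandatoryDecisionGroups: clusterv1alpha1.MandatoryDecisionGroups{
			MandatoryDecisionGroups: []clusterv1alpha1.MandatoryDecisionGroup{
				{GroupName: "canary"},
			},
		},
		MaxConcurrency: intstr.FromInt(2),
	},
}
```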

```golang
// ComplianceHistory defines compliance details history
type ComplianceHistory struct {
	// NOTE: The protobuf field number for the new field must be unique within
	// the struct; the "8" here is illustrative.
	LastEvaluated metav1.Time `json:"lastEvaluated,omitempty" protobuf:"bytes,8,opt,name=lastEvaluated"` // <-- New field
	LastTimestamp metav1.Time `json:"lastTimestamp,omitempty" protobuf:"bytes,7,opt,name=lastTimestamp"`
	Message       string      `json:"message,omitempty" protobuf:"bytes,4,opt,name=message"`
	EventName     string      `json:"eventName,omitempty"`
}

// PolicyStatus defines the observed state of Policy
type PolicyStatus struct {
	Placement []*Placement                  `json:"placement,omitempty"`
	Status    []*CompliancePerClusterStatus `json:"status,omitempty"`

	// +kubebuilder:validation:Enum=Compliant;Pending;NonCompliant
	ComplianceState ComplianceState               `json:"compliant,omitempty"`
	RolloutStatus   clusterv1alpha1.RolloutStatus `json:"rolloutStatus,omitempty"` // <-- New field
	Details         []*DetailsPerTemplate         `json:"details,omitempty"`
}
```

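A possible shape for the Phase 3 status mapping described in the table above (an untested sketch; the helper's
parameters, i.e., the root generation, root timestamp, and the newest `lastEvaluated` value, are assumed to be
gathered by the caller, and the function name is hypothetical):

```golang
package common

import (
	"strconv"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1alpha1 "open-cluster-management.io/api/cluster/v1alpha1"
	policyv1 "open-cluster-management.io/governance-policy-propagator/api/v1"
)

// getClusterRolloutStatus maps a replicated policy to a RolloutStatus
// following the status table above.
func getClusterRolloutStatus(
	replicated *policyv1.Policy,
	rootGeneration int64,
	rootTimestamp metav1.Time,
	lastEvaluated metav1.Time,
	timeout time.Duration,
) clusterv1alpha1.RolloutStatus {
	generation := replicated.GetAnnotations()["open-cluster-management.io/root-policy-generation"]

	// ToApply: no generation set (or the replicated policy is behind the root policy)
	if generation != strconv.FormatInt(rootGeneration, 10) {
		return clusterv1alpha1.ToApply
	}

	// Current generation, but not yet evaluated since the latest apply
	if lastEvaluated.Before(&rootTimestamp) {
		// TimeOut: time has passed beyond the timeout in the RolloutStrategy
		if time.Since(rootTimestamp.Time) > timeout {
			return clusterv1alpha1.TimeOut
		}

		return clusterv1alpha1.Progressing
	}

	// Evaluated at the current generation: map compliance to the rollout status
	switch replicated.Status.ComplianceState {
	case policyv1.Compliant:
		return clusterv1alpha1.Succeed
	case policyv1.NonCompliant:
		return clusterv1alpha1.Failed
	default:
		return clusterv1alpha1.Progressing
	}
}
```
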
### Implementation Details/Notes/Constraints

Using the `Placement` library requires importing at least this version of the `open-cluster-management.io/api` package
(the rollout portion of the `Placement` library is currently in `v1alpha1`):

```
open-cluster-management.io/api v0.11.1-0.20230809113658-2b2399b5f6e8
```

For testing, the `governance-policy-propagator` doesn't currently account for multiple managed clusters. As part of
this enhancement, the test flows would need to be enhanced (and/or a separate workflow created) to deploy multiple
managed clusters.

### Risks and Mitigation

- The optional proposals to remove items from the `Policy` CRD are intended to prevent bloat and slim down the CRD,
  but updating the CRD could be a breaking change for users, so further consideration may be needed; these updates
  aren't required.
- The `Placement` library is relatively new and untested outside of its repo, and this implementation leans heavily on
  its logic. While it works in theory, there could be some tweaks/adjustments as development proceeds, lengthening the
  time for implementation. The phased approach intends to address this by making partial implementation feasible.
- Having a `LastEvaluated` field in the status is going to increase load on the hub cluster Kubernetes API server,
  which previous efforts sought to reduce by not sending updates when the status hadn't changed. This could be
  remedied by only sending an updated timestamp if the status indeed changed (the current behavior), the generation
  was updated, and/or the `LastEvaluated` timestamp is older than the root policy timestamp.

## Design Details

### Open Questions

1. Should the rollout handler be instantiated with the dynamic watcher instead of the controller client?
2. Should `RolloutStatus` be reflected per-cluster in the root policy? (Removing `clusterNamespace` could essentially
   keep the CRD the same size as it currently is, alleviating concerns around bloat, but the rollout status is also
   reflected in each replicated policy.)
3. Is there opposition to removing `Placement.Decisions` from the `status`? (This could alternatively be handled
   outside of this enhancement.)
4. Should the `RolloutStrategy` be implemented in the `Policy` CRD instead of `PlacementBinding`?
5. How should a policy handle placement decision groups shifting? (This implementation leans on the `Placement`
   library to handle such shifts, only handling the clusters that it returns.)
6. (An aside) What's with the pointer in `[]*clusterv1beta1.PlacementDecision`?

### Test Plan

- Unit tests in the repo
- E2E tests in the repo (would need to add additional managed clusters for this, potentially in a separate workflow,
  though ideally alongside the existing tests)
- Policy integration tests in the framework (would need to add additional managed clusters for this, potentially in a
  separate workflow, though ideally alongside the existing tests)

## Drawbacks / Alternatives

The maturity of the `Placement` library could be brought into question. This enhancement could hold at Phase 2 until
we ensure that we have a robust test environment that can test the various configurations before moving forward.

enhancements/sig-policy/99-policy-placement-strategy/metadata.yaml (10 additions)

title: policy-placement-strategy
authors:
  - "@dhaiducek"
reviewers:
  - TBD
approvers:
  - TBD
creation-date: 2023-08-03
last-updated: 2023-08-03
status: implementable