Add the standalone hub templates design
Relates: https://issues.redhat.com/browse/ACM-15053
Signed-off-by: mprahl <[email protected]>
Showing 2 changed files with 217 additions and 0 deletions.
207 changes: 207 additions & 0 deletions
enhancements/sig-policy/134-standalone-hub-templates/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,207 @@ | ||
# Standalone Hub Templates | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in | ||
[website](https://github.com/open-cluster-management-io/open-cluster-management-io.github.io/) | ||
|
||
## Summary | ||
|
||
To support Open Cluster Management (OCM) hub templates without the policy framework, this feature provides an additional | ||
`governance-standalone-hub-templating` addon, which generates a service account per cluster on the hub to be leveraged by | ||
the Configuration Policy controller for hub template resolution. The OCM hub administrator will be responsible for | ||
granting `list` and `watch` permissions on the resources needed by the hub templates. A sample `ConfigurationPolicy` can | ||
automate this for a cluster set. | ||
|
||
## Motivation | ||
|
||
When an Open Cluster Management (OCM) policy user deploys their policies with external tools such as Argo CD, the hub | ||
templating functionality is lost. Hub templating is often critical for large-scale deployments, so its loss prevents | ||
users from choosing an external tool when they require an alternative to the OCM policy framework. | ||
|
||
### Goals | ||
|
||
1. Support hub templates for `ConfigurationPolicy` and `OperatorPolicy` when deployed through an external tool such as | ||
Argo CD. | ||
1. Hub templates should resolve when referenced objects are updated on the hub. | ||
1. Policies can still be evaluated if the managed cluster is disconnected from the hub. | ||
1. Permissions to the hub are per managed cluster. | ||
1. Provide a hub-only `ConfigurationPolicy` that grants permissions to cluster sets. | ||
1. It must work without the rest of the policy framework on the managed cluster. | ||
|
||
### Non-Goals | ||
|
||
1. `CertificatePolicy` and Gatekeeper constraints will not be directly supported. They can be wrapped in a | ||
`ConfigurationPolicy` if hub templates are needed for those. | ||
1. Updating a `ConfigurationPolicy` definition with hub templates while disconnected from the hub is not explicitly | ||
supported. This will almost always work based on the proposed design as long as there aren't new hub template API | ||
queries, but if the update happened while the hub and Configuration Policy controller were offline, it will not work. | ||
|
||
## Proposal | ||
|
||
### Design | ||
|
||
#### New governance-standalone-hub-templating Addon | ||
|
||
Currently, hub templates are resolved by the Governance Policy Propagator as part of the replicated `Policy` creation. | ||
In standalone mode, the Governance Policy Propagator and the Governance Policy Framework are not part of the equation. | ||
Because of this, the simplest and most reliable approach is to add hub template support in the Configuration Policy | ||
controller. | ||
|
||
A way to achieve this is to create an additional addon that is **disabled** by default called | ||
`governance-standalone-hub-templating`. Enabling this addon will cause the following: | ||
|
||
- The addon framework creates a service account on the hub for each managed cluster. There will be no permissions | ||
assigned to it. | ||
- The addon framework will also create a `Secret` in the `open-cluster-management-agent-addon` namespace with the | ||
service account kubeconfig that will be leveraged by the Configuration Policy controller. | ||
- The Governance Policy Addon controller should configure the `config-policy-controller` to mount the generated | ||
kubeconfig when the `governance-standalone-hub-templating` addon is enabled and pass a flag to enable the standalone | ||
hub templating mode. | ||
|
||
#### Configuration Policy changes | ||
|
||
When the Configuration Policy controller encounters a hub template while standalone hub templating mode is disabled, it | ||
should mark the policy as `NonCompliant` with a message such as | ||
`The governance-standalone-hub-templating addon must be enabled to resolve hub templates from the cluster`. The | ||
`ConfigurationPolicy` objects created on the managed cluster through the policy framework will never have hub templates | ||
since the policy framework marks such policies as `NonCompliant` when the hub templates fail to resolve on the | ||
hub. With this in mind, it is safe to assume the user's intent when a hub template is encountered. | ||
|
||
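The disabled-mode check described above can be sketched as follows. This is a minimal illustration, not the controller's actual code: `checkHubTemplates` is a hypothetical helper, the message is a shortened placeholder, and detecting a hub template by searching for the `{{hub` delimiter is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// hubTemplateStart is the opening delimiter of a hub template.
const hubTemplateStart = "{{hub"

// placeholderMsg is a shortened stand-in for the real status message.
const placeholderMsg = "standalone hub templating must be enabled to resolve hub templates"

// checkHubTemplates is a hypothetical helper. It returns "NonCompliant" and a
// message when rawPolicy contains a hub template while standalone mode is
// disabled. An empty compliance string means evaluation can continue.
func checkHubTemplates(rawPolicy string, standaloneEnabled bool) (compliance, msg string) {
	if !standaloneEnabled && strings.Contains(rawPolicy, hubTemplateStart) {
		return "NonCompliant", placeholderMsg
	}

	return "", ""
}

func main() {
	policy := `data: '{{hub fromConfigMap "default" "cfg" "key" hub}}'`

	compliance, msg := checkHubTemplates(policy, false)
	fmt.Println(compliance, "-", msg)
}
```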
When the Configuration Policy controller has standalone hub templating mode enabled, it should instantiate a separate | ||
`TemplateResolver` with the `governance-standalone-hub-templating` hub service account and resolve hub templates (i.e. | ||
`{{hub ... hub}}`) prior to managed cluster templates (i.e. `{{ ... }}`). By default, this `TemplateResolver` will have | ||
no permission on the hub cluster. The hub administrator must grant `list` and `watch` permissions on the objects | ||
accessed by hub templates. The error messages for lack of permissions are clear due to the work done in the "Expand Hub | ||
Template Access on Policies" feature. | ||
|
||
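The resolution order can be illustrated with a two-pass sketch. It uses the standard library's `text/template` with custom delimiters only to show the ordering; the real controller would use the `TemplateResolver` from go-template-utils with hub API access. The `render` and `resolve` helpers and the data maps are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes tmpl against data using the given action delimiters.
// Text that does not use those delimiters passes through unchanged.
func render(tmpl, left, right string, data map[string]string) (string, error) {
	t, err := template.New("policy").Delims(left, right).Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}

	var out bytes.Buffer
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}

	return out.String(), nil
}

// resolve runs the hub pass first and the managed cluster pass second.
func resolve(raw string, hubData, localData map[string]string) (string, error) {
	afterHub, err := render(raw, "{{hub", "hub}}", hubData)
	if err != nil {
		return "", fmt.Errorf("hub templates: %w", err)
	}

	return render(afterHub, "{{", "}}", localData)
}

func main() {
	raw := "name: {{hub .ManagedClusterName hub}}-{{ .Suffix }}"

	out, err := resolve(
		raw,
		map[string]string{"ManagedClusterName": "cluster1"}, // stand-in for hub data
		map[string]string{"Suffix": "prod"},                 // stand-in for local data
	)
	if err != nil {
		panic(err)
	}

	fmt.Println(out) // name: cluster1-prod
}
```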
If the changes stopped here, the case where the managed cluster is always connected to the hub would work well. Short | ||
disconnects would be okay since objects referenced by the hub templates are cached. Once the Configuration Policy | ||
controller restarts and thus the cache is cleared, the policy could no longer be evaluated due to hub template | ||
resolution failing. | ||
|
||
The proposed solution is to add a new annotation to the `ConfigurationPolicy` of | ||
`policy.open-cluster-management.io/last-resolved-policy`. The Configuration Policy controller will record a stripped | ||
down `ConfigurationPolicy` after hub templates are resolved. This must include at least the `generation`, `uid`, and the | ||
`spec`. In the event the Configuration Policy controller restarts and fails to resolve hub templates due to a | ||
"connection refused" error, the Configuration Policy controller will fall back to the value of the | ||
`policy.open-cluster-management.io/last-resolved-policy` annotation if the `generation` and `uid` match the | ||
current `ConfigurationPolicy`. The `uid` detects if the `ConfigurationPolicy` was recreated since it was last resolved | ||
and the `generation` detects if the `ConfigurationPolicy` was updated since it was last resolved. If either changed, the | ||
policy resolution will fail. In other words, modifying a `ConfigurationPolicy` with hub templates while disconnected | ||
from the hub will not be supported. It may work depending on what's in the `TemplateResolver` cache but this feature | ||
will not guarantee any resilience to this as noted in the "Non-Goals" section. | ||
|
||
One remaining issue is hub templates that copy secrets from the hub. Storing these in the | ||
`policy.open-cluster-management.io/last-resolved-policy` annotation is problematic since custom resources are not | ||
encrypted in etcd. The path forward is to encrypt the content as hub templates do today as part of generating the | ||
replicated policy. Since the Governance Policy Propagator is | ||
[responsible](https://github.com/stolostron/governance-policy-propagator/blob/be8cc1cd68b0b5f499bbc4c5e9b6906fde5c7d5c/controllers/propagator/propagation.go#L341-L373) | ||
for generating the AES-256 encryption key and the initialization vector (IV) when it detects encryption is used in a hub | ||
template, these won't be available in standalone mode. | ||
|
||
To not duplicate the creation and automatic rotation of the per cluster encryption key for standalone mode and the | ||
policy framework, let's modify the existing `policy-encryption-keys` controller on the Governance Policy Propagator to | ||
watch for the `config-policy-controller-hub-templating` `ManagedClusterAddon` objects and have a controller-runtime | ||
event handler map to the `policy-encryption-keys` `Secret` in the same namespace as the `ManagedClusterAddon`. | ||
Additionally, the existing `policy-encryption-keys` controller should be modified to create the `Secret` if it's missing | ||
rather than just ignore the reconcile. The rotation logic should work like it does today without any additional changes. | ||
|
||
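The mapping performed by the event handler can be sketched without controller-runtime. `objectKey` and `mapAddonToSecret` are hypothetical stand-ins for the namespaced name and map function a real handler would use; the addon and `Secret` names are copied from the paragraph above.

```go
package main

import "fmt"

// objectKey is a local stand-in for a namespaced object name.
type objectKey struct {
	Namespace string
	Name      string
}

const (
	// Names as written in the proposal text.
	addonName  = "config-policy-controller-hub-templating"
	secretName = "policy-encryption-keys"
)

// mapAddonToSecret returns the reconcile target for a ManagedClusterAddon
// event: the policy-encryption-keys Secret in the addon's namespace.
// Events for unrelated addons map to nothing.
func mapAddonToSecret(addon objectKey) []objectKey {
	if addon.Name != addonName {
		return nil
	}

	return []objectKey{{Namespace: addon.Namespace, Name: secretName}}
}

func main() {
	targets := mapAddonToSecret(objectKey{Namespace: "cluster1", Name: addonName})
	fmt.Println(targets)
}
```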
The Configuration Policy controller will be responsible for setting the | ||
`policy.open-cluster-management.io/encryption-iv` annotation on the `ConfigurationPolicy` instead of the Governance Policy | ||
Propagator when in standalone hub templates mode. Generating the IV only requires 16 random bytes, so the code | ||
duplication is small and acceptable. | ||
|
||
### User Stories | ||
|
||
#### Story 1 | ||
|
||
As an Argo CD user, I'd like to leverage `ConfigurationPolicy` to securely copy secrets from the Open Cluster Management | ||
hub to the managed cluster. | ||
|
||
#### Story 2 | ||
|
||
As a `ConfigurationPolicy` user, I'd like to leverage a centralized `ConfigMap` on the hub for dynamic configuration | ||
parameters for each managed cluster. | ||
|
||
#### Story 3 | ||
|
||
As a hub template user without the policy framework, I'd like to have an easy way to grant permissions to groups of | ||
clusters for hub templates. This will be a sample `ConfigurationPolicy` that grants a `Role` to a cluster set. | ||
|
||
#### Story 4 | ||
|
||
As a policy framework user, I'd like to disable standalone hub templates support since I want to decrease the attack | ||
surface of the Open Cluster Management hub. | ||
|
||
### Implementation Details/Notes/Constraints | ||
|
||
#### Event Driven Aspect | ||
|
||
Let's continue using the [go-template-utils](https://github.com/stolostron/go-template-utils) library which uses the | ||
[kubernetes-dependency-watches](https://github.com/stolostron/kubernetes-dependency-watches) library under the hood for | ||
resolving this style of hub templates. The difference is that the API watches are against a different cluster than the | ||
one the Configuration Policy controller is running on. In the case that the managed cluster is disconnected from the | ||
hub, the [kubernetes-dependency-watches](https://github.com/stolostron/kubernetes-dependency-watches) library's use of | ||
`RetryWatcher` causes it to continuously retry without a backoff. Not only does this spam the logs, it can keep the CPU | ||
unnecessarily busy. | ||
|
||
Proposed solution: | ||
|
||
1. Modify | ||
[watchLatest](https://github.com/stolostron/kubernetes-dependency-watches/blob/85c50006641c8c5a951bdd35bea3d48708c7e9a5/client/client.go#L725-L791) | ||
to have an exponential backoff. | ||
1. Lower the log level for connection refused errors in client-go in the | ||
[RetryWatcher doReceive](https://github.com/kubernetes/kubernetes/blob/5864a4677267e6adeae276ad85882a8714d69d9d/staging/src/k8s.io/client-go/tools/watch/retrywatcher.go#L123-L134) | ||
method. | ||
|
||
### Risks and Mitigation | ||
|
||
- Increased hub attack surface due to an additional service account. The addon is disabled by default and the user must | ||
explicitly grant permissions to the service account. | ||
|
||
### Open Questions [optional] | ||
|
||
- Instead of the `policy.open-cluster-management.io/last-resolved-policy` annotation, should this be a | ||
`ConfigurationPolicy` status field? The thought behind the annotation option is that it's displayed in a condensed way | ||
by `kubectl`, keeps the status relatively easy to read, and is similar to the concept of the | ||
`kubectl.kubernetes.io/last-applied-configuration` annotation. | ||
|
||
### Test Plan | ||
|
||
**Note:** _Section not required until targeted at a release._ | ||
|
||
### Graduation Criteria | ||
|
||
N/A | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
A downgrade would cause the `ConfigurationPolicy` to enforce unresolved hub templates as the actual values. The user | ||
must remove these policies before downgrading. | ||
|
||
### Version Skew Strategy | ||
|
||
## Implementation History | ||
|
||
N/A | ||
|
||
## Drawbacks | ||
|
||
- Requires a lot of effort by the user to configure the permissions in a large environment. A provided | ||
`ConfigurationPolicy` example should help. | ||
|
||
## Alternatives | ||
|
||
- We could consider reusing the Configuration Policy hub service account rather than a separate addon as it only has | ||
access to manage leases on the hub. This would greatly reduce the complexity of the setup, but it would violate the | ||
least privilege principle and the risk could be greater if the Configuration Policy controller requires additional | ||
permissions on the hub in the future. | ||
|
||
## Infrastructure Needed [optional] | ||
|
||
N/A |
10 changes: 10 additions & 0 deletions
enhancements/sig-policy/134-standalone-hub-templates/metadata.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
title: standalone-hub-templates | ||
authors: | ||
- "@mprahl" | ||
reviewers: | ||
- TBD | ||
approvers: | ||
- TBD | ||
creation-date: 2024-11-15 | ||
last-updated: 2024-11-15 | ||
status: implementable |