Skip to content

Commit

Permalink
Finalize SEP with latest state
Browse files Browse the repository at this point in the history
Signed-off-by: Daniel Grimm <[email protected]>
  • Loading branch information
dgn committed Nov 6, 2024
1 parent 120bb6f commit af586a9
Showing 1 changed file with 84 additions and 17 deletions.
101 changes: 84 additions & 17 deletions enhancements/SEP2-revision-tags.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,52 +18,119 @@ istioctl tag set default --revision 1-21-1
```

## Goals
* Provide revision tag support in sail-operator APIs so users don't have to use istioctl for basic revision tag operations
* Provide revision tag support in Sail Operator APIs so users don't have to use istioctl for basic revision tag operations

## Non-goals
* Compatibility with manual revision tag creation using istioctl. There might be a way to disable the operator functionality to avoid conflicts when creating revision tags manually, but that's it - you either do it yourself or let the operator do it

## Design

### User Stories
1. As a user of sail-operator's RevisionBased update strategy, I want to be able to use the `istio-injection=enabled` label on my namespaces.
1. As a user of Sail Operator's RevisionBased update strategy, I want to be able to use the `istio-injection=enabled` label on my namespaces.
1. As a platform engineer, I want my application teams to be able to use a fixed label for proxy injection without having to know which version of Istio is running in the cluster, so that I can perform upgrades in the background without the application teams having to make changes to use the new version.

### API Changes
We will add a new CRD called `IstioRevisionTag` that includes a `spec.targetRef` field. In this field, users can specify the `IstioRevision` or `Istio` resource that the `IstioRevisionTag` references.
We will add a new CRD called `IstioRevisionTag` that consistly most of a `spec.targetRef` field and a `status` subresource.

#### IstioRevisionTag resource
Here's an example YAML for the new resource:
```yaml
apiGroup: sailoperator.io/v1alpha1
kind: IstioRevisionTag
metadata:
name: default
spec:
targetRef:
kind: Istio # can also be IstioRevision
name: default
status:
observedGeneration: 1
conditions: []
state: Healthy
istiodNamespace: istio-system
istioRevision: default-v1.24.0
```
In the `spec.targetRef` field, users can specify the `IstioRevision` or `Istio` resource that the `IstioRevisionTag` references. In case of referencing a `IstioRevision` resource, the created tag will point to the exact Istio control plane revision that is represented by the `IstioRevision` resource and any update of the tag will have to be made manually by changing the `spec.targetRef` to point to another `IstioRevision`. As long as a `IstioRevisionTag` exists that references a `IstioRevision`, that `IstioRevision` will be considered "in-use" by the Sail Operator, preventing its automatic deletion during a control plane update (see details below under [InUse detection](#inuse-detection))

If the `spec.targetRef` is used to reference an `Istio` resource, the Sail Operator will automatically update the revision tag when a new `IstioRevision` is created as part of a version update of the `Istio` resource. In this case, the `IstioRevisionTag` resource behaves like a floating tag, always referencing the active `IstioRevision` of an `Istio` resource. When it comes to InUse detection, the existence of a floating tag will also cause the active `IstioRevision` of the `Istio` resource to be considered InUse. However, it will not prevent automatic deletion, because the reference is updated immediately when the active revision changes during an update.

#### IstioRevisionTag Status

The `status.state` field gives a quick hint as to whether a tag has been reconciled and is InUse ("Healthy") or if there are any problems with it.

The `status.istiodNamespace` and `status.istioRevision` fields are used by the Sail Operator controllers to store information about the Istio control plane that is referenced by this `IstioRevisionTag`. This is especially useful when it is referencing an `Istio` resource, to see which underlying `IstioRevision` is considered referenced by the operator.

Possible conditions for `status.conditions` are:

##### Reconciled
`true` when the tag's helm chart has been installed successfully. Possible error reasons are:
__RefNotFound__: the resource referenced by the `spec.targetRef` field was not found
__NameAlreadyExists__: there already is an `IstioRevision` with this name
__ReconcileError__: there was an error installing the chart

##### InUse
`true` when the `IstioRevisionTag` is referenced by a namespace or pod. Possible reasons when `false` are:
__NotReferencedByAnything__: no namespace or pod is referencing the tag
__UsageCheckFailed__: there was a problem during InUse detection

#### InUse detection

In case of referencing a `IstioRevision` resource, the created tag will point to the exact Istio control plane revision that is represented by the `IstioRevision` resource and any update of the tag will have to be made manually by changing the `spec.targetRef` to point to another `IstioRevision`. As long as a `IstioRevisionTag` exists that references a `IstioRevision`, that `IstioRevision` will be considered "in-use" by the Sail Operator, preventing its automatic deletion during a control plane update.
An `IstioRevisionTag` is considered InUse when

If the `spec.targetRef` is used to reference an `Istio` resource, the Sail Operator will automatically update the revision tag when a new `IstioRevision` is created as part of a version update of the `Istio` resource. In this case, the `IstioRevisionTag` resource behaves like a floating tag, always referencing the latest `IstioRevision` of an `Istio` resource.
* there's a pod or namespace that explicitly references the `IstioRevisionTag` in an `istio.io/rev` label
* the name of the `IstioRevisionTag` name is 'default' and there's a pod or namespace with the `istio-injection: enabled` label

When the very first `IstioRevision` is created in a cluster from a `RevisionBased` Istio resource, the Sail Operator will create a `IstioRevisionTag` with the name `default`, referencing that `IstioRevision`. This way, the standard namespace injection label `istio-injection=enabled` will work out of the box for RevisionBased deployments (see second paragraph of the [Overview](#overview)).
Note that a pod's `istio.io/rev` annotation will not be considered as that will always have the name of the referenced `IstioRevision` rather than the name of the `IstioRevisionTag`! The labels however are added by users and reflect usage intent, ie the user will use the name of the `IstioRevisionTag`.

We will also need to remove the `values.revisionTags` field from our API, which is how the upstream charts expose this feature.
Even if the referenced `IstioRevision` of an `IstioRevisionTag` is considered InUse, that does not suffice to make the `IstioRevisionTag` considered InUse by the operator! It is considered an unused alias for an InUse `IstioRevision`.

Additionally, the introduction of `IstioRevisionTag` also adds another condition to the InUse detection of `IstioRevision`: being referenced by an `IstioRevisionTag` will now always lead to an `IstioRevision` being considered InUse! For this, it does not matter if the `IstioRevisionTag` is itself considered InUse.

For completeness' sake, here's an overview of the conditions for an `IstioRevision` to be considered InUse (new condition in bold):

* there's a pod that explicitly references the `IstioRevision` in an `istio.io/rev` annotation or label
* there's a namespace that explicitly references the `IstioRevision` in an `istio.io/rev` label
* the name of the `IstioRevision` is 'default' and there's a pod or namespace with the `istio-injection: enabled` label
* __there's an `IstioRevisionTag` referencing this `IstioRevision`__

#### Changes to existing APIs
We will need to remove the `values.revisionTags` field from our API, which is how the upstream charts expose this feature.

### Architecture
We will need to update the sail-operators mechanism to detect revisions that are being used. Today, we only look at the `istio.io/rev` label's value to check which revisions are in use. But when revision tags are used, those values will be mere aliases, so we have to improve our detection mechanism. The most correct way is probably to look at the revision annotation on the pods that is set by Istio during injection. That requires inspecting every pod in the cluster, though. Another way could be to resolve the tags - the sail-operator knows which revisions the tags point to, after all - but only if tags are exclusively managed by the sail-operator.
We will need to update the mechanism to detect revisions that are being used. Today, we only look at the `istio.io/rev` annotation's value to check which revisions are in use. But when revision tags are used, those values will point to the referenced revision instead of the tags, so we have to improve our detection mechanism. The most correct way is probably to look at the revision label on the pods and namespace that is set to configure injection.

Revision tags and revision names can be used interchangably, so they must never overlap. Therefore, we'll need a `status` on the `IstioRevisionTag` resource that can show the user an error if the name they chose is already taken by a `IstioRevision`. Another case we'll need to cover is when an `IstioRevision` is being reconciled and it would be assigned the same name as an existing tag. In this case, I suggest the `IstioRevision` wins and the tag is decommissioned, also with an error in its status. As the revision names generated for `IstioRevision` are very specific and end with the Istio control plane version, it is unlikely for this to happen, but we'll need to define the behavior for when it does.
Revision tags and revision names can be used interchangably, so they must never overlap. Therefore, we'll need a `status` on the `IstioRevisionTag` resource that can show the user an error if the name they chose is already taken by a `IstioRevision`. Another case that needs to be covered is when an `IstioRevision` is being reconciled and it would be assigned the same name as an existing tag. In this case, reconciliation of the `IstioRevision` should fail, with an error message that tells the users why this happened, ie "the name is already used by an `IstioRevisionTag`".

## Alternatives Considered

### Reuse `IstioRevision`'s type field for revision tags
We could add a `type` of `Tag` to the `IstioRevision` CRD and use that to manage tags. It would have the benefit that the user could list all revisions and tags using `kubectl get istiorevisions` and name-uniqueness would be handled by Kubernetes. The disadvantage though is that revision tags share no other fields with `IstioRevision` and it would be quite confusing for users that 99% of the CRD's fields are not to be used in this case, whereas there would be one new field that is only to be used for `IstioRevisions` with type=Tag.

Note that the `type` field has since been removed with the removal of `RemoteIstio`.

### managing tags in Istio resource
In a previous iteration of this SEP, the tags that point to an Istio control plane would have been managed in the `Istio` revision itself, in a `spec.updateStrategy.revisionTags` field. That would have meant that they are always referencing a `Istio` resource while at the same time being copied to every underlying `IstioRevision` resource.

### values.revisionTags
Istio has a `values.revisionTags` field that we even currently expose in our APIs. The problem is that we copy all values from the `Istio` resource to every `IstioRevision` and that means we would be facing duplicate revision tags when we create additional revisions in the sail-operator - so, some logic would be required to work around this problem. As it is a similar amount of effort, I prefer the explicit version of adding the field to the `Istio` CRD.
Istio has a `values.revisionTags` field that we even currently expose in our APIs. The problem is that we copy all values from the `Istio` resource to every `IstioRevision` and that means we would be facing duplicate revision tags when we create additional revisions in the Sail Operator - so, some logic would be required to work around this problem. As it is a similar amount of effort, I prefer the explicit version of adding the field to the `Istio` CRD.

### Automatic creation of default revision tag
Previously, we had this paragraph in the Architecture section:
> When the very first `IstioRevision` is created in a cluster from a `RevisionBased` Istio resource, the Sail Operator will create a `IstioRevisionTag` with the name `default`, referencing that `IstioRevision`. This way, the standard namespace injection label `istio-injection=enabled` will work out of the box for RevisionBased deployments (see second paragraph of the [Overview](#overview)).

We have since dropped this from the design as we faced some problems with this approach. Most importantly, it is very hard to detect whether the user is not creating a default IstioRevisionTag or we have simply not seen its creation event yet. We have discussed multiple possible solutions to this, among them usage of a 'virtual' revision tag that is not created on the API server but only exists in-memory, leading to the creation of the respective Kubernetes resources only. This would avoid a race between the operator and the user trying to create tags with the same name simultaneously.

Due to the complexity of the task we have moved it into a separate ticket: [#439 Create default revision tag automatically](https://github.com/istio-ecosystem/sail-operator/issues/439).

## Implementation Plan
- [ ] Improve revision usage detection to support revision tags
- [ ] Create IstioRevisionTag CRD and controller
- [ ] Implement support for referencing Istio resources
- [ ] Implement support for referencing IstioRevision resources
- [ ] Add logic to prevent duplications across revisions and tags
v1alpha1
- [ ] Initial implementation & tests (https://github.com/istio-ecosystem/sail-operator/pull/413)
- [ ] Documentation
- [ ] Integration tests

v1beta1
- [ ] [#439 Create default revision tag automatically](https://github.com/istio-ecosystem/sail-operator/issues/439)
- [ ] [#471 Support revision tags in multicluster topologies](https://github.com/istio-ecosystem/sail-operator/issues/471)

## Test Plan
We should cover this functionality in our sail-operator integration tests.
Functionality will be tested in integration tests.

0 comments on commit af586a9

Please sign in to comment.