External Certificate Manager #135

Open
wants to merge 5 commits into
base: main

katheris
Contributor

This proposal aims to allow Strimzi users to use an external certificate manager, specifically cert-manager, to manage certificates.

Related to strimzi/strimzi-kafka-operator#929

Signed-off-by: Katherine Stanley <[email protected]>

The proposal makes a few assumptions:
* Strimzi will not be responsible for installing cert-manager, but we will document the versions of cert-manager that we have tested with.
* Strimzi will not be responsible for creating `Issuer` or `ClusterIssuer` custom resources.

Member

I guess this is because we want to keep our current approach for self-signed certs, rather than add another option that would mostly just add support burden?

Contributor Author

This is because there are lots of different issuers that work with cert-manager. So rather than Strimzi having to actively support all the different types, I've proposed that the user creates the Issuer or ClusterIssuer and handles supplying a Secret with the trusted certificates for the issuer they have chosen. That way Strimzi can work with any cert-manager issuer integrations.

Contributor

Will we need to provide any guidelines on conventions in the docs?

Member

I think we just need to mention it properly in the docs, as users could be confused by other projects where the CM integration works without creating any Issuer (afaiu the operator creates a self-signed Issuer when the user has not created one).


If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
Strimzi will wait during the reconciliation loop for the `Certificate` status to indicate that the certificate has been issued before continuing.
When issuing cluster certificates (e.g for Kafka etc), once the certificate has been issued, Strimzi will annotate the cert-manager provided Secret with the `strimzi.io/server-cert-hash` annotation with the value being the hash of the certificate in the Secret.

Contributor

issuing cluster certificates (e.g for Kafka etc) - I wonder if it would be useful to include the secret names for these certificates as an example.

Contributor

@tinaselenge left a comment

Thanks for the proposal Kate, it looks good to me.

Member

@scholzj left a comment

I left some nits. But the main thing missing seems to be how you plan to roll out the trust to the new CAs. There is no reference to strimzi.io/ca-key-generation or to rotation of private keys, so it is not clear how you expect to handle it.

Although it is nice that it can manage certificates, it would be beneficial if the certificates could be managed by a dedicated certificate manager, such as [cert-manager](https://cert-manager.io/).
This is a feature that is often requested, especially because many organizations have specific compliance requirements with regard to certificates, for example:
* Requiring that CA private keys are not shared.
* Requiring that self-signed certificates cannot be used.

Member

I don't think this really helps because in most cases the CA will still be bootstrapped as self-signed, as it is today, and there is not much we can do about it.

Comment on lines 83 to 84
3. Create a `Secret` containing the CAs for Strimzi to trust.
Users can optionally use [trust-manager](https://cert-manager.io/docs/trust/trust-manager/) to create this Secret, but they are responsible for installing trust-manager, creating the `Bundle` CR and annotating the resulting Secret with the Strimzi cert annotation.

Member

Maybe we should instead (or next to this) describe the expectation of what the Secret should look like? IIRC, trust-manager creates a Secret with all CAs bundled into a single file? Is that supported / expected?

Contributor

I agree that this needs some clarification. Maybe a sequence diagram with the PKI generation process including user, CO, trust-manager and cert-manager would also help.

Comment on lines 109 to 111
For cluster certificates (e.g. for Kafka etc), Strimzi will track and handle these changes using the `strimzi.io/server-cert-hash` annotation.
During the reconciliation loop, even if all cluster end-entity certificates have been issued, Strimzi will patch the certificate Secrets with the correct `strimzi.io/server-cert-hash` annotation.
The value of this annotation can then be compared with the value on the pods to determine whether the pods need to be restarted to pick up a new Secret.

Member

Not sure I follow the need to annotate the Secrets here. Normally, during the reconciliation:

  • You take the hash of the certificate
  • Use the hash to annotate the Pod in the Deployment / StrimziPodSet
  • Either Kubernetes or Strimzi takes care of rolling the pod based on the Pod annotations being different
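
As a rough illustration of the flow in the bullets above, the pod metadata might carry the certificate hash as an annotation, so that a changed hash changes the pod definition and triggers a roll. The annotation key is the one named in the proposal; the pod name and the hash value are placeholders chosen for this sketch, not taken from the proposal.

```yaml
# Sketch only: pod metadata carrying the hash of the server certificate.
# When the certificate changes, a new hash is computed, the annotation
# value differs from the one on the running pod, and the pod is rolled.
metadata:
  name: my-cluster-kafka-0                                   # hypothetical pod name
  annotations:
    strimzi.io/server-cert-hash: "<hash-of-certificate>"     # placeholder value
```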


### Issuing certificates

If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.

Contributor

I think it would be very useful to have this implementation behind an interface and support a mechanism for loading alternative implementations for other external certificate managers. This would allow users to integrate with other external certificate managers.

    certificateExpirationPolicy: <renew-certificate|replace-key>
    certificateIssuer:
      type: <internal|cert-manager.io> # (1)
      issuerRef: # (2)

Contributor

The issuerRef element looks to be cert-manager specific rather than something relevant to any external cert manager which could become an issue when supporting other external managers (as mentioned above).
Perhaps the certificateIssuer should instead have a certManager specific sub-element, and add different similar elements in the future that are specific to other certificate managers, i.e. something like:

clusterCa:
    certificateIssuer:
        certManager:
            issuerRef:
                name: <string>
                kind: <Issuer|ClusterIssuer>
                group: <string> # cert-manager.io by default
        someOtherManager: <-- future addition -->
            managerSpecificConfig:
                ...
        oneOf:
        - properties
            certManager{}
            someOtherManager{}

Or alternatively just allow a map of values to be specified, but that would be less user friendly

Contributor

@fvaleri commented Dec 2, 2024

> 1. The property certificateIssuer.issuerRef will only be used by Strimzi if certificateIssuer.type is set to cert-manager.io.

From the above phrase it looks like the intention is to make issuerRef cert-manager specific.

Contributor

@PaulRMellor left a comment

Looks good. I made a few suggested changes to wording for clarity and readability. There are also a couple of questions for further clarification.


## Compatibility

This feature will be optional and not enabled by default.

Contributor

Suggested change
This feature will be optional and not enabled by default.
This feature will be optional and disabled by default.

* Update to copy certificates from
cert-manager Secrets, not mount them.
* Handle trust rollout more directly.
* Apply wording suggestions.
* Add diagrams.

Signed-off-by: Katherine Stanley <[email protected]>
@katheris
Contributor Author

@scholzj @tinaselenge @fvaleri @PaulRMellor @ppatierno @Frawless Thanks for all your comments. I've pushed an update to the proposal, the main changes apart from wording tweaks are:

  • Update to copy certificates from cert-manager Secrets, not mount them.
  • Handle trust rollout more directly in the Strimzi operator.
  • Add diagrams.

I've left the User operator/clients CA part as TODO at the moment but I would appreciate any feedback on the cluster CA part of the proposal. I also do have a full diagram of a key replacement, but am still working on making it display in a way that's viewable. Let me know if that would be useful, or if the existing diagrams are clear enough.

@katheris
Contributor Author

@MichaelMorrisEst thanks for your comments. On the extensibility of the design, I've tried to write both the CRD and the way it's implemented such that we could add other certificate management options in future. However, I wasn't planning for it to be something a user can add at deploy time. My expectation is that we would add it directly to the codebase. The reason being that it is easier to reason about what is supported and make sure it's properly tested. Is that what you were expecting when you were asking about alternative implementations?

Contributor

@fvaleri left a comment

Thanks for the updates Kate.

The advantage of this solution is that it builds on the existing rollout logic, which is proven and well tested.

New diagrams are great. I see why you didn't put them inline, but maybe we can link them at the end of related sections.

Member

@scholzj left a comment

Thanks for the updates. I left some more comments, mainly about the formal side and some possibly missing scenarios being described.

Comment on lines +70 to +72
    publicCert: # (3)
      secretName: <string>
      certificate: <string>

Member

Should this support a list with multiple Secrets? Should it be ready to handle other forms of certificates such as ConfigMap? E.g.:

    publicCert: # (3)
      - fromSecret:
          secretName: <string>
          certificate: <string>

This would:

  • Allow to possibly load trusted certificates from more Secrets - would that be useful when migrating to a new CA/Issuer or something like that?
  • Allow to extend the type in the future to something else than Secret (we can start with Secret only today, but we will have the option)

Contributor Author

My current proposal doesn't allow for more than one certificate here. The proposed flow requires us to associate a generation id with one certificate. This field is for the latest trusted certificate; Strimzi will keep the previous certificate until it is sure everything trusts the new one. So I don't think we need to support more than one here? But could add it in case this changes in future, wdyt?

Member

> So I don't think we need to support more than one here? But could add it in case this changes in future, wdyt?

Hmm 🤔. I don't know. It obviously might be useful one day in the future ... but at the same time, if you need exactly one today, it would make it harder to validate things etc. ... and all of that for something that we might never use. So not sure what is the best approach in such a case. 🤷

@PaulRMellor self-requested a review on January 23, 2025 09:37

* `strimzi.io/cluster-ca-key-generation` initially set to 0.
* `strimzi.io/cluster-ca-cert-generation` initially set to 0.

During a reconciliation Strimzi will check the hash of the certificate stored in the user's CA public cert Secret to see if an update is needed.

Member

is the user's CA public cert Secret what you called before the cert-manager Secret?

Contributor Author

No, they are two different things. I've rewritten so hopefully it makes more sense now


If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
Strimzi will specify the required CN/SANS in the `Certificate` resource for the end-entity certificate.
Strimzi will specify the Secret for the certificate to be stored in as the existing name for the certificate Secret suffixed with `-cm`, for example `<CLUSTER_NAME>-cluster-operator-certs-cm`.
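
To make the paragraph above more concrete, here is a minimal sketch of the kind of cert-manager `Certificate` resource it describes. The `secretName` follows the `-cm` suffix convention from the text, with `my-cluster` standing in for `<CLUSTER_NAME>`. The resource name, namespace, common name and issuer name are hypothetical values chosen for illustration; the proposal does not specify them here.

```yaml
# Sketch only: a cert-manager Certificate such as Strimzi might create.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-cluster-cluster-operator-certs       # hypothetical resource name
  namespace: kafka                               # hypothetical namespace
spec:
  # Secret that cert-manager writes the issued certificate into,
  # using the existing Secret name suffixed with "-cm".
  secretName: my-cluster-cluster-operator-certs-cm
  commonName: cluster-operator                   # assumed CN, for illustration only
  issuerRef:
    name: my-issuer                              # hypothetical Issuer created by the user
    kind: Issuer                                 # or ClusterIssuer
    group: cert-manager.io
```

Once cert-manager marks the `Certificate` as ready, the issued certificate and key would be available in the Secret named by `secretName`.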

Member

I had a hard time understanding what this sentence means, sorry. Maybe just a language barrier on my side :-(
Can you elaborate here or make it clearer to me please?

Contributor Author

I've rewritten this sentence, see if it makes more sense now

@MichaelMorrisEst
Contributor

> @MichaelMorrisEst thanks for your comments. On the extensibility of the design, I've tried to write both the CRD and the way it's implemented such that we could add other certificate management options in future. However, I wasn't planning for it to be something a user can add at deploy time. My expectation is that we would add it directly to the codebase. The reason being that it is easier to reason about what is supported and make sure it's properly tested. Is that what you were expecting when you were asking about alternative implementations?

As an open source project, it will only ever be possible to add open source certificate management options directly in the Strimzi code base. I think it would be very useful to enable other options to be integrated without code changes.
The Strimzi project would only be responsible for testing that the mechanism works; it would be the responsibility of users integrating with other certificate manager tools to ensure their integration can and does fulfil the requirements of the Strimzi API.

@scholzj
Member

scholzj commented Jan 24, 2025

> > @MichaelMorrisEst thanks for your comments. On the extensibility of the design, I've tried to write both the CRD and the way it's implemented such that we could add other certificate management options in future. However, I wasn't planning for it to be something a user can add at deploy time. My expectation is that we would add it directly to the codebase. The reason being that it is easier to reason about what is supported and make sure it's properly tested. Is that what you were expecting when you were asking about alternative implementations?
>
> As an open source project, it will only ever be possible to add open source certificate management options directly in the Strimzi code base. I think it would be very useful to enable other options to be integrated without code changes. The Strimzi project would only be responsible for testing the mechanism works, it would be the responsibility of users integrating with other certificate manager tools to ensure their integration can and does fulfil the requirements of the Strimzi API

@MichaelMorrisEst I think open-source project is one aspect to consider. The other one is that it needs to have enough demand to justify the effort and maintenance. So, I think the extensibility makes sense.

The way that could be done would be to add an additional type, `custom`, where you specify the class. I think the main challenge is to create a (sufficiently) stable interface that can be used to implement the plugins. I wonder if that is easier to design now, or whether implementing this proposal would help to define such an interface, as we will then have two implementations to compare and base the pluggable interface on.

I guess there are some dirty alternatives ...

  • Integrate the other tools through Cert Manager itself which has its own pluggable interface as far as I understood. That might give a more versatile solution that is usable outside of Strimzi but obviously it would require to manage Cert Manager itself.
  • Hook a custom operator on the Cert Manager custom resources and simply provide your certificates based on the Cert Manager Certificate resource through a custom tool.

These seem a bit more hacky. But I think they would provide a stable interface.

The whole problem moves to an entirely different level when you need to change the certificate handling completely, including for example not storing them in Kubernetes Secrets (but loading them from something like Vault). That would require pluggability not only in the operator code but also in Kafka (Config Providers? Shell scripts?), Cruise Control (Shell scripts), Kafka Exporter (Shell scripts) and so on. I'm not sure we would ever be able to get there - but who knows?

@MichaelMorrisEst
Contributor

@scholzj
I think implementing for the two implementations should give a good starting point for the interface and something that could be used as a basis to experiment with other implementations. Perhaps if the interface is created as part of this proposal it could be considered preliminary (behind a feature gate maybe?) and improved as necessary until we reach a point of stability.

Of the other suggestions, the dependency on Cert Manager in the first limits its applicability. The second (a custom operator for Cert Manager custom resources) looks more workable, but it does seem a bit hacky.

@katheris
Contributor Author

@ppatierno @scholzj I've pushed some updates and added the Clients CA/User Operator section

Member

@scholzj left a comment

A few more comments, mostly nits apart from some things around the User Operator.

* Users can [install and use their own CA certificate and private keys](https://strimzi.io/docs/operators/latest/deploying#installing-your-own-ca-certificates-str), instead of using the defaults generated by the Cluster Operator.
When using this option, both the CA certificate and private key must be provided, and Strimzi still issues the end-entity (EE) certificates that are presented by the components.
* Provide a Clients CA public cert, a dummy value for the private key and issue their own user certificates out-of-band.
If using this approach, users can also use `KafkaUser` CRs with `spec.type` set to `tls-external` with the User Operator managing ACLs and quotas.

Member

Suggested change
If using this approach, users can also use `KafkaUser` CRs with `spec.type` set to `tls-external` with the User Operator managing ACLs and quotas.
If using this approach, users can also use `KafkaUser` CRs with `spec.authentication.type` set to `tls-external` with the User Operator managing ACLs and quotas.
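
For readers unfamiliar with this option, a `KafkaUser` using `tls-external` might look roughly like the sketch below. The user, cluster and topic names and the quota values are made up for illustration; the User Operator would manage the ACLs and quotas, while the certificate itself is issued out-of-band.

```yaml
# Sketch only: KafkaUser whose certificate is issued outside Strimzi.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: my-user                        # hypothetical user name
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical cluster name
spec:
  authentication:
    type: tls-external                 # no certificate is generated by the User Operator
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: my-topic               # hypothetical topic
          patternType: literal
        operations:
          - Read
          - Describe
  quotas:
    producerByteRate: 1048576          # illustrative values
    consumerByteRate: 2097152
```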

Comment on lines 152 to 162
During the next reconciliation Strimzi will:
1. Copy over the new CA public cert (keeping the old one as described previously) and update the hash and generation annotations.
2. Roll the Kafka pods to trust the new CA cert, incrementing the `strimzi.io/cluster-ca-key-generation` annotation on the pods.
3. Copy over the new Kafka certificates, updating the `strimzi.io/cluster-ca-cert-generation` annotations on the Secrets.
4. Roll the Kafka pods to use the new Kafka certificates, updating the `strimzi.io/cluster-ca-cert-generation` annotations on the Pods.
5. Then on the next reconciliation, since the Kafka pods now have correct cert and key generation, copy over the new operator certificate.
6. Since the Kafka pods now have correct cert generation, remove the old CA public cert from the `<CLUSTER_NAME>-cluster-ca-cert` Secret

> ![Renewing the cluster CA public cert](./images/087-cert-renewals.png)
>
> Fig 2: Existing and proposed workflow when the user provides a new cluster CA public cert
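
As a small sketch of where the numbered steps above end up, the annotations on a Kafka pod after the first CA replacement might look like this. It assumes both generations started at 0 and that each is incremented by one per replacement; the exact values depend on how many renewals or replacements have already happened.

```yaml
# Sketch only: Kafka pod annotations once steps 2 and 4 have completed
# for the first replacement, assuming both generations started at 0.
metadata:
  annotations:
    strimzi.io/cluster-ca-key-generation: "1"    # set when the pod was rolled to trust the new CA cert (step 2)
    strimzi.io/cluster-ca-cert-generation: "1"   # set when the pod was rolled to use the new Kafka certificates (step 4)
```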

Member

Can you cover at which stage this will be done? Will that be done somewhere at the beginning of today's CaReconciler (and thus the first reconciliation after unpausing will immediately start taking the steps to replace the CA)? Or somewhere later in the reconciliation, essentially just preparing things for the next reconciliation?

>
> Fig 1: Proposed workflow when cert-manager issues new component end-entity certificates

#### Handling CA key replacements

Member

Is there any difference between CA key replacement here and simple CA public key renewal? Given the latter seems not to be covered in any other section, I assume all public key renewals are handled in the same way as key replacements (as we do with custom CAs today)? Assuming that is the case, it would be worth clarifying it here.

Member

I think this comment matches pretty well with my concern about increasing the key generation on cert renewal when we don't really know if the key is changed or not.

Comment on lines 173 to 179
When a `KafkaUser` with `spec.authentication.type` set to `tls` is creates Strimzi will create a `Certificate` custom resource.
Strimzi will specify the required CN/SANS in the `Certificate` resource for the user certificate.
Strimzi will specify the Secret for the certificate to be stored in as the existing name for the certificate Secret suffixed with `-cm`, for example `<USER_NAME>-cm`.
Strimzi will request the certificate in both PEM and PKCS12 format.
Strimzi will wait for the usual operation timeout during the reconciliation loop for the `Certificate` status to indicate that the certificate has been issued before continuing.
Once the certificate has been issued, Strimzi will copy the certificate across from the cert-manager provided Secret into its own existing Secret.
Strimzi will also copy across the ClientsCA public certificate.

Member

Do we still want to do the copying here? Or do we want the user to get it directly from the CM (or whoever actually produces it)? For the server certificates and CAs, we depend on them so we want to copy them out. The user certificate secrets actually exist only for the users. We do not care about them much otherwise. So I wonder if keeping them twice makes sense?

But maybe we still need it to track the generations etc.

* Update the API to use strimzi.io as the default type.
* Update the user operator section to
use the cert-manager Secret directly, rather
than creating a copy.

Signed-off-by: Katherine Stanley <[email protected]>
- `STRIMZI_CM_ISSUER_KIND`
- `STRIMZI_CM_ISSUER_GROUP`

These will be set by the Cluster operator when the User operator is deployed as part of the Entity operator.

Member

Suggested change
These will be set by the Cluster operator when the User operator is deployed as part of the Entity operator.
These will be set by the Cluster operator when the User operator is deployed as part of the Entity operator and when Clients CA is set to `type: cert-manager.io`.

When a certificate is renewed cert-manager will update the related Secret.
Since the user's clients are directly using the cert-manager created Secret, Strimzi will take no action.

#### Handling CA cert renewals and key replacements

Member

So, this is a CO part? Or UO part? I think CO, but it is not completely clear from the structure / text.

Member

I agree on this as well :-/

Contributor

@fvaleri left a comment

LGTM. Thanks @katheris.

Comment on lines +78 to +82
The option `cert-manager.io` will only be valid if `generateCertificateAuthority` is set to `false`.
2. The property `certManagerIssuerRef` will only be used by Strimzi if `type` is set to `cert-manager.io`.
The `name`, `kind`, and `group` properties will be copied over into the `Certificate` custom resource Strimzi creates.
3. The property `publicCert` will only be used by Strimzi if `type` is set to `cert-manager.io`.
The `secretName` and `certificate` properties will be used to locate the CA public certificate that must be trusted by Strimzi components in order to trust the end-entity certificates that cert-manager issues.
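
Putting the three notes above together, the configuration in a `Kafka` resource might look something like the following sketch. The property names come from the notes; where exactly `type` sits relative to the other properties is an assumption here, and the issuer name, Secret name and key name are hypothetical.

```yaml
# Sketch only: cluster CA configured to use cert-manager.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                         # hypothetical cluster name
spec:
  clusterCa:
    generateCertificateAuthority: false    # must be false for cert-manager.io to be valid (1)
    type: cert-manager.io                  # placement of this property is an assumption
    certManagerIssuerRef:                  # copied into the Certificate resources (2)
      name: my-issuer                      # hypothetical Issuer created by the user
      kind: Issuer
      group: cert-manager.io
    publicCert:                            # CA public certificate to trust (3)
      secretName: my-ca-public-cert        # hypothetical Secret name
      certificate: ca.crt                  # hypothetical key within the Secret
  # kafka and other required sections omitted for brevity
```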

Contributor

I think we can leverage the new CEL validation feature to do these checks at the API level. Not sure you want to mention this implementation detail, but I wanted to leave a note.

Contributor Author

Noted, thanks :)

#### Issuing end-entity certificates

When Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
Strimzi will specify the required CN/SANS in the `Certificate` resource for the end-entity certificate.
Contributor

Suggested change
Strimzi will specify the required CN/SANS in the `Certificate` resource for the end-entity certificate.
Strimzi will specify the required CN/SANs in the `Certificate` resource for the end-entity certificate.
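
For readers unfamiliar with cert-manager, a minimal sketch of the kind of `Certificate` resource this describes is shown below, built as a plain Python dict. The field names (`secretName`, `commonName`, `dnsNames`, `issuerRef`) are from the cert-manager `cert-manager.io/v1` API; the helper `build_certificate` and every resource name passed to it are hypothetical and are not taken from the proposal.

```python
def build_certificate(name, namespace, secret_name, common_name, dns_names, issuer_ref):
    """Build a cert-manager Certificate manifest as a dict (illustrative only)."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Secret that cert-manager writes the issued key pair into.
            "secretName": secret_name,
            # CN and SANs requested for the end-entity certificate.
            "commonName": common_name,
            "dnsNames": list(dns_names),
            # name, kind and group are copied from certManagerIssuerRef.
            "issuerRef": {
                key: issuer_ref[key]
                for key in ("name", "kind", "group")
                if key in issuer_ref
            },
        },
    }


# Hypothetical values, for illustration only.
example = build_certificate(
    name="my-cluster-kafka-0",
    namespace="kafka",
    secret_name="my-cluster-kafka-0-cm",
    common_name="my-cluster-kafka",
    dns_names=["my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc"],
    issuer_ref={"name": "my-issuer", "kind": "ClusterIssuer", "group": "cert-manager.io"},
)
```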


These will be set by the Cluster operator when the User operator is deployed as part of the Entity operator.

When a `KafkaUser` with `spec.authentication.type` set to `tls` is creates Strimzi will create a `Certificate` custom resource.
Contributor

Suggested change
When a `KafkaUser` with `spec.authentication.type` set to `tls` is creates Strimzi will create a `Certificate` custom resource.
When a `KafkaUser` with `spec.authentication.type` set to `tls` is created Strimzi will create a `Certificate` custom resource.

These will be set by the Cluster operator when the User operator is deployed as part of the Entity operator.

When a `KafkaUser` with `spec.authentication.type` set to `tls` is creates Strimzi will create a `Certificate` custom resource.
Strimzi will specify the required CN/SANS in the `Certificate` resource for the user certificate.
Contributor

Suggested change
Strimzi will specify the required CN/SANS in the `Certificate` resource for the user certificate.
Strimzi will specify the required CN/SANs in the `Certificate` resource for the user certificate.

* `strimzi.io/cluster-ca-key-generation` on Kafka pods to indicate the CA generation trusted by that pod
* `strimzi.io/cluster-ca-cert-generation` on Secrets containing certs to indicate the CA generation that signed those certs

The `strimzi.io/cluster-ca-cert-generation` annotation is also used on Kafka pods to indicate the CA generation that signed the certs it is currently presenting (i.e. the CA generation currently in use).
Member

I found the descriptions of the cluster-ca-key-generation and cluster-ca-cert-generation annotations on pods slightly confusing, and possibly the wrong way around. We use the word "trusted" when describing a key and the word "signed" when describing the cert, whereas you usually "sign" an EE cert with a key and "trust" an EE cert because you can validate it with the cert. Maybe we can do better to avoid this confusion here?
So to me, the cluster-ca-key-generation is the generation of the private key that was used to sign the EE cert presented by the pod. The cluster-ca-cert-generation is the generation of the corresponding public key/cert, trusted by other components, that is used to validate the received EE cert.
Of course, these two generations always increase together, with the same value, if you renew a key (one or more times) starting from generation 0, but they can diverge (with the cert generation being higher than the key generation) once you start renewing the public key/cert (just to get a new expiration date) while keeping the same key (so the EE certs don't need to be re-signed).
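
To make the divergence described in this comment concrete, here is a small model of the two counters. It is only a sketch of the reviewer's reading, not of the operator's code: the class `CaGenerations` and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaGenerations:
    """Toy model of the key and cert generation counters (hypothetical)."""

    key_generation: int = 0
    cert_generation: int = 0

    def renew_cert(self):
        # New CA cert (for example a new expiry date) for the same private key:
        # existing EE certs stay valid, so only the cert generation moves.
        self.cert_generation += 1

    def replace_key(self):
        # A new private key always comes with a new CA cert,
        # so both generations move together.
        self.key_generation += 1
        self.cert_generation += 1


gens = CaGenerations()
gens.replace_key()  # key=1, cert=1: still in step
gens.renew_cert()   # key=1, cert=2: cert generation is now ahead
```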


The `strimzi.io/cluster-ca-cert-generation` annotation is also used on Kafka pods to indicate the CA generation that signed the certs it is currently presenting (i.e. the CA generation currently in use).

When using cert-manager for issuing certificates Strimzi will continue to use these generations in the same way.
Member

Can we summarize in 2-3 bullet points what "the same way" means here?


#### Handling Cluster CA trust rollout

When a new Kafka cluster is created Strimzi will copy the CA public cert from the Secret identified in `clusterCa.publicCert` to the `<CLUSTER_NAME>-cluster-ca-cert` Secret.
Member

I would add something like "by using the ca.crt field". Just to match what you are going to say later on line 114, where you are renaming such ca.crt field into ca-YYYY-MM-DDTHH-MM-SSZ.crt.


During a reconciliation Strimzi will check the hash of the certificate stored in `clusterCa.publicCert` Secret to see if an update is needed.
If the certificate has changed Strimzi will rename the existing cert in the `<CLUSTER_NAME>-cluster-ca-cert` Secret to `ca-YYYY-MM-DDTHH-MM-SSZ.crt` and copy over the new certificate.
It will also update the `strimzi.io/ca-cert-hash` and increment the `strimzi.io/ca-key-generation` and `strimzi.io/ca-cert-generation` annotations.
Member

Genuine question because I am not sure. Are we sure that a change in the hash of the certificate also means that a new key was used? I am thinking about the case where the cert is renewed but the key stays the same. You will get a different hash and increase the cert generation, which is OK, but you will also increase the key generation, which is wrong. Wdyt?
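
The update step under discussion can be sketched as follows, purely for illustration. Assumptions: the Secret is modelled as a dict with raw bytes in `data` (a real Kubernetes Secret holds base64 values), SHA-256 is used as the hash, the timestamp is passed in by the caller, and `roll_ca_cert` is a hypothetical helper. The annotation names and the `ca.crt` field come from the text and comments above. The sketch increments both generations exactly as the proposal text currently reads, which is the behaviour this comment questions.

```python
import hashlib

CERT_HASH = "strimzi.io/ca-cert-hash"
KEY_GEN = "strimzi.io/ca-key-generation"
CERT_GEN = "strimzi.io/ca-cert-generation"


def roll_ca_cert(secret, new_cert, timestamp):
    """Copy a changed CA cert into the Secret model; return True if it changed."""
    new_hash = hashlib.sha256(new_cert).hexdigest()  # hash algorithm is an assumption
    annotations = secret["annotations"]
    if annotations.get(CERT_HASH) == new_hash:
        return False  # certificate unchanged, nothing to do

    data = secret["data"]
    if "ca.crt" in data:
        # Keep the previous cert under a timestamped name so it stays trusted.
        data["ca-" + timestamp + ".crt"] = data["ca.crt"]
    data["ca.crt"] = new_cert

    annotations[CERT_HASH] = new_hash
    for key in (KEY_GEN, CERT_GEN):
        annotations[key] = str(int(annotations.get(key, "0")) + 1)
    return True
```

Because the only input is the certificate hash, this sketch cannot tell a renewal with the same key apart from a key replacement.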

Strimzi will wait for the usual operation timeout during the reconciliation loop for the `Certificate` status to indicate that the certificate has been issued before continuing.
When issuing cluster certificates (e.g. for each Kafka pod), once the certificate has been issued, Strimzi will copy the certificate across from the cert-manager provided Secret into its own existing Secret.
Strimzi will annotate the Secret it manages with:
* `strimzi.io/server-cert-hash` annotation with the value being the hash of the certificate in the cert-manager Secret.
Member

Why "the cert-manager Secret"? Didn't the operator copy the cert over into its own Secret for the issued EE cert? Isn't it better to refer only to Strimzi resources as much as possible?
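
The wait on the `Certificate` status mentioned in the quoted text could look roughly like the sketch below. It assumes the resource is available as a dict and that "issued" is read from the `Ready` condition that cert-manager sets in `status.conditions`; `is_ready`, `wait_until_ready` and the injected `fetch` callable are hypothetical, and the timeout value is left to the caller.

```python
import time


def is_ready(certificate):
    """True if the Certificate dict carries a Ready condition with status True."""
    conditions = certificate.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def wait_until_ready(fetch, timeout_s, poll_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll fetch() until the Certificate is ready or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready(fetch()):
            return True
        if clock() >= deadline:
            return False  # caller can retry on the next reconciliation
        sleep(poll_s)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real delays.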

>
> Fig 1: Proposed workflow when cert-manager issues new component end-entity certificates

#### Handling CA key replacements
Member

I think this comment matches pretty well with my concern about increasing the key generation on cert renewal when we don't really know if the key is changed or not.

5. Then on the next reconciliation, since the Kafka pods now have correct cert and key generation, copy over the new operator certificate.
6. Since the Kafka pods now have correct cert generation, remove the old CA public cert from the `<CLUSTER_NAME>-cluster-ca-cert` Secret

> ![Renewing the cluster CA public cert](./images/087-cert-renewals.png)
Member

Similar to the previous comment ... in the picture I see you are increasing both key and cert generation when there is a new public cert but how do we know that the key wasn't changed and it was just a renewal of the cert (new expiration date) but by using the same key? In this case cert generation has to be increased, but not the key generation. I see you have the first step "A new CA key has been generated" but how does the operator know about it?

When a certificate is renewed cert-manager will update the related Secret.
Since the user's clients are directly using the cert-manager created Secret, Strimzi will take no action.

#### Handling CA cert renewals and key replacements
Member

I agree on this as well :-/


## Compatibility

This feature will be optional and disabled by default.
Member

Does this mean we are going to put this feature behind a feature gate? If so, should we state that clearly?

Member

From my understanding it won't be behind a feature gate, but the default configuration will use Strimzi CA generation and users will need to explicitly configure cert-manager to use it.

Member

OK, I misread what "optional and disabled by default" means here. Maybe we should highlight that if nothing is set, the "type" will implicitly be treated as the strimzi.io one?

Member

@Frawless left a comment

I am not a security SME, but the feature proposal sounds good to me. +1

Contributor

@PaulRMellor left a comment

Thanks for addressing the comments, Kate.
I found the addition of the diagrams useful in showing the new flows.

## Current situation

There are two different categories of certificates that Strimzi handles:
* The term "cluster" refers to certificates that are issued for the Strimzi components:
Contributor

Suggested change
* The term "cluster" refers to certificates that are issued for the Strimzi components:
* The "cluster" category refers to certificates that are issued for the Strimzi components:

* Cluster, User and Topic operators
* Cruise Control
* Kafka Exporter
* The term "clients" refers to certificates that are issued for user applications using the User Operator, or through another external mechanism chosen by the user.
Contributor

Suggested change
* The term "clients" refers to certificates that are issued for user applications using the User Operator, or through another external mechanism chosen by the user.
* The "clients" category refers to certificates that are issued for user applications using the User Operator, or through another external mechanism chosen by the user.

In addition to Strimzi fully managing the certificates as described above, there are options for users to partially manage the certificates:
* Users can [install and use their own CA certificate and private keys](https://strimzi.io/docs/operators/latest/deploying#installing-your-own-ca-certificates-str), instead of using the defaults generated by the Cluster Operator.
When using this option, both the CA certificate and private key must be provided, and Strimzi still issues the end-entity (EE) certificates that are presented by the components.
* Provide a Clients CA public cert, a dummy value for the private key and issue their own user certificates out-of-band.
Contributor

Suggested change
* Provide a Clients CA public cert, a dummy value for the private key and issue their own user certificates out-of-band.
* Users can provide a Clients CA public certificate, a placeholder value for the private key, and issue their own user certificates independently.

Contributor

Suggestion in case "dummy" and "out-of-band" are not obvious to non-native speakers.

@im-konge im-konge removed their request for review February 18, 2025 09:00
Contributor

k-wall commented Feb 18, 2025

lgtm, nice work @katheris

* Address review comments from PaulRMellor, ppatierno
and fvaleri.

* Update Cluster CA flow so user does not need to
pause Strimzi when replacing the CA private key.
Instead use cert path validation to identify when
a key has been replaced and when new end-entity
certificates are not trusted yet.

Signed-off-by: Katherine Stanley <[email protected]>