diff --git a/docs/modules/ROOT/pages/explanations/decisions/admin-kubeconfig.adoc b/docs/modules/ROOT/pages/explanations/decisions/admin-kubeconfig.adoc
new file mode 100644
index 00000000..6f6bbceb
--- /dev/null
+++ b/docs/modules/ROOT/pages/explanations/decisions/admin-kubeconfig.adoc
@@ -0,0 +1,82 @@
+= Admin kubeconfig management
+
+== Problem
+
+We currently store the kubeconfig for the `system:admin` user, which is generated by the `openshift-install` program, in Passbolt for emergency access to clusters.
+The client certificate generated for that kubeconfig has a lifetime of 10 years.
+Unfortunately, Kubernetes doesn't support revoking client certificates (see https://github.com/kubernetes/kubernetes/issues/18982[this GitHub issue (kubernetes/kubernetes#18982)]).
+
+We would like to have another form of emergency access to OpenShift 4 clusters.
+The main reason is that credentials which are valid for 10 years and can't be revoked are a long-lived risk if they ever leak.
+
+=== Goals
+
+* Define a method to manage emergency access credentials for OpenShift 4 clusters
+* The credentials should be relatively short-lived and it must be possible to rotate them
+
+=== Non-Goals
+
+* Replace regular authentication
+
+== Proposals
+
+=== Credential type
+
+==== Issue short-lived certificates with cluster-admin privileges
+
+The first approach is to issue client certificates with cluster-admin privileges.
+This can be done either through Kubernetes' `CertificateSigningRequest` (CSR) resources, or by manually issuing certificates against a self-signed CA certificate which is installed as a client CA certificate in the cluster.
+
+One point to consider is that Kubernetes doesn't support issuing client certificates for group `system:masters` through CSRs, but group `system:cluster-admins` is allowed, and functionally equivalent on OpenShift 4.
+
+Note that the restriction that we can't revoke existing certificates still applies for certificates issued through CSRs or through a self-managed CA certificate.
+However, if we use a self-managed CA certificate, we can invalidate existing certificates by rotating the CA and issuing a new certificate against the new CA.
+
+==== Use service account tokens with cluster-admin privileges
+
+The second approach is to set up a Kubernetes service account which is granted `cluster-admin` privileges through a `ClusterRoleBinding`, and to issue service account tokens for that service account.
+
+We have two options to generate tokens for service accounts:
+
+. The https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#manually-create-an-api-token-for-a-serviceaccount[TokenRequest] API allows us to generate service account tokens which expire after a defined amount of time.
+However, tokens which are manually created through the TokenRequest API (for example with `kubectl create token`) can't be invalidated before they expire.
+
+. [Non-expiring API tokens] are created by defining a secret of type `kubernetes.io/service-account-token`.
+As the name suggests, these tokens don't expire.
+
+The only way to permanently invalidate service account tokens (both non-expiring and time-bound) is to delete the service account.
+Creating a new service account with the same name in the same namespace doesn't reactivate tokens associated with a previous service account.
+
+The proposed approach for using service account tokens is to create short-lived API tokens through the TokenRequest API, so that admin credentials expire by default.
+Additionally, we introduce a mechanism which forces the tool to recreate the service account, which invalidates any old tokens that might have leaked.
+That mechanism might be as simple as having the tool reconcile the service account and recreate it if it gets deleted.
+
+=== Credential management
+
+==== Extend Steward to manage credentials and write them to Vault
+
+We can extend https://syn.tools/steward[Steward] to manage and renew the credentials and store them in Vault.
+
+This allows us to issue relatively short-lived credentials (on the order of days), which limits the attack surface presented by engineers accessing admin credentials in emergency situations.
+
+Optionally, we can also extend Steward to render a full kubeconfig file based on the managed credentials and store that file in Vault in addition to the raw credentials.
+If we store a full kubeconfig file in Vault, we can document a single `vault` CLI command which fetches the emergency kubeconfig for a cluster.
+
+==== Create a new tool which manages the credentials and writes them to Vault
+
+Instead of extending Steward, we could also create a new tool which manages admin credentials and writes them to Vault.
+This would provide some level of separation of concerns, since managing admin credentials isn't necessarily part of the Project Syn bootstrap process.
+Additionally, having a separate tool allows us to have releases independent of the fairly complex Steward release process.
+
+==== Manage credentials by hand
+
+Another approach is to manage and renew the admin credentials by hand.
+
+== Decision
+
+== Rationale
+
+== References
+
+* https://access.redhat.com/solutions/4845381[Red Hat solution which gives some details on the admin kubeconfig]
+* https://access.redhat.com/solutions/6054981[Red Hat solution describing how to replace the CA for the initial admin kubeconfig]
diff --git a/docs/modules/ROOT/partials/nav.adoc b/docs/modules/ROOT/partials/nav.adoc
index 71c5af4d..e6f8749a 100644
--- a/docs/modules/ROOT/partials/nav.adoc
+++ b/docs/modules/ROOT/partials/nav.adoc
@@ -225,3 +225,4 @@
 ** xref:oc4:ROOT:explanations/decisions/shipping-metrics-to-centralized-instance.adoc[]
 ** xref:oc4:ROOT:explanations/decisions/scheduled-mr-merges.adoc[]
 ** xref:oc4:ROOT:explanations/decisions/subscription-tracking.adoc[]
+** xref:oc4:ROOT:explanations/decisions/admin-kubeconfig.adoc[]