Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization #460

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Nov 18, 2024

This commit removes the use of the kube-rbac-proxy image and replaces it with metrics authentication/authorization provided by controller-runtime. The kube-rbac-proxy image is deprecated and will no longer be maintained, which introduces risks to production environments. For more details, see: kubernetes-sigs/kubebuilder#3907

Motivation: operator-framework/operator-controller#1509

Local Tests

To check the metrics endpoint

To grant the required permissions for metrics access, run:

kubectl create clusterrolebinding catalogd-metrics-binding \
   --clusterrole=catalogd-metrics-reader \
   --serviceaccount=olmv1-system:catalogd-controller-manager

Generate the token for the catalogd-controller-manager service account:

TOKEN=$(kubectl create token catalogd-controller-manager -n olmv1-system)
echo $TOKEN

Run a pod with a debug container to test the metrics endpoint:

kubectl run curl-metrics --rm -it --restart=Never \
  --image=curlimages/curl:7.87.0 -n olmv1-system -- /bin/sh

Checking the metrics

curl -v -k -H "Authorization: Bearer $TOKEN" \
https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics

Result

/ $ curl -v -k -H "Authorization: Bearer $TOKEN" \
> https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
*   Trying 10.96.52.46:7443...
* Connected to catalogd-service.olmv1-system.svc.cluster.local (10.96.52.46) port 7443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: [NONE]
*  start date: Dec 12 22:03:23 2024 GMT
*  expire date: Mar 12 22:03:23 2025 GMT
*  issuer: CN=olmv1-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* h2h3 [:method: GET]
* h2h3 [:path: /metrics]
* h2h3 [:scheme: https]
* h2h3 [:authority: catalogd-service.olmv1-system.svc.cluster.local:7443]
* h2h3 [user-agent: curl/7.87.0-DEV]
* h2h3 [accept: */*]
* h2h3 [authorization: Bearer TOKEN
> authorization: Bearer TOKEN
> 
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 200 
< content-type: text/plain; version=0.0.4; charset=utf-8; escaping=values
< date: Thu, 12 Dec 2024 22:11:24 GMT
< 
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
# HELP certwatcher_read_certificate_total Total number of certificate reads
...
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-05"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.001"} 1
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.01"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.1"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="10"} 3
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="100"} 3
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1000"} 3
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="+Inf"} 3
workqueue_work_duration_seconds_sum{controller="clustercatalog",name="clustercatalog"} 9.588737754
workqueue_work_duration_seconds_count{controller="clustercatalog",name="clustercatalog"} 3
* Connection #0 to host catalogd-service.olmv1-system.svc.cluster.local left intact

To validate the usage of certs within

Create the Pod with the secret

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: curl-metrics
  namespace: olmv1-system
spec:
  serviceAccountName: catalogd-controller-manager
  containers:
  - name: curl
    image: curlimages/curl:7.87.0
    command:
    - sh
    - -c
    - sleep 3600
    volumeMounts:
    - mountPath: /tmp/cert
      name: catalogd-cert
      readOnly: true
  volumes:
  - name: catalogd-cert
    secret:
      secretName: catalogd-service-cert-v1.0.0-rc1-22-gedbc799-dirty
  restartPolicy: Never
EOF

Jump in the curl

kubectl exec -it curl-metrics -n olmv1-system -- sh

Run the curl calling the metrics

curl -v --cacert /tmp/cert/ca.crt \
     --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key \
     -H "Authorization: Bearer $TOKEN" \
     https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics

Result

$ kubectl exec -it curl-metrics -n olmv1-system -- sh
/ $ export TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
/ $ curl -v --cacert /tmp/cert/ca.crt \
>      --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key \
>      -H "Authorization: Bearer $TOKEN" \
>      https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
*   Trying 10.96.52.46:7443...
* Connected to catalogd-service.olmv1-system.svc.cluster.local (10.96.52.46) port 7443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
*  CAfile: /tmp/cert/ca.crt
*  CApath: none
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: [NONE]
*  start date: Dec 12 22:03:23 2024 GMT
*  expire date: Mar 12 22:03:23 2025 GMT
*  subjectAltName: host "catalogd-service.olmv1-system.svc.cluster.local" matched cert's "catalogd-service.olmv1-system.svc.cluster.local"
*  issuer: CN=olmv1-ca
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* h2h3 [:method: GET]
* h2h3 [:path: /metrics]
* h2h3 [:scheme: https]
* h2h3 [:authority: catalogd-service.olmv1-system.svc.cluster.local:7443]
* h2h3 [user-agent: curl/7.87.0-DEV]
* h2h3 [accept: */*]

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 0% with 27 lines in your changes missing coverage. Please review.

Project coverage is 37.56%. Comparing base (f91558f) to head (4f171d3).

Files with missing lines Patch % Lines
cmd/manager/main.go 0.00% 27 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #460      +/-   ##
==========================================
- Coverage   38.23%   37.56%   -0.68%     
==========================================
  Files          15       15              
  Lines        1224     1246      +22     
==========================================
  Hits          468      468              
- Misses        706      728      +22     
  Partials       50       50              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 3 times, most recently from befdef7 to 8ca7c1d Compare November 18, 2024 12:38
@camilamacedo86 camilamacedo86 changed the title WIP replace kube-rbac-proxy ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@camilamacedo86 camilamacedo86 marked this pull request as ready for review November 18, 2024 12:40
@camilamacedo86 camilamacedo86 requested a review from a team as a code owner November 18, 2024 12:40
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 0ee7199 to 3183153 Compare November 18, 2024 13:43
@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization WIP: ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 3183153 to 4ad7f35 Compare November 18, 2024 13:58
@camilamacedo86 camilamacedo86 changed the title WIP: ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization (HOLD - WIP) ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 4ad7f35 to 42147b6 Compare November 26, 2024 00:11
@camilamacedo86
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 26, 2024
@camilamacedo86 camilamacedo86 changed the title (HOLD - WIP) ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 26, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 42147b6 to 0d9cd88 Compare November 26, 2024 00:15
@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization WIP - ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 26, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 2 times, most recently from 365aa93 to 5c04183 Compare December 2, 2024 12:15
@tmshort

This comment was marked as resolved.

@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 4 times, most recently from 5b32541 to c5abb8b Compare December 7, 2024 04:23
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from c5abb8b to e461013 Compare December 12, 2024 22:13
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 12, 2024
@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization ✨ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Dec 12, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 2 times, most recently from abe9cea to e37c70e Compare December 12, 2024 22:54
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 3 times, most recently from 4d3281f to e461013 Compare December 13, 2024 00:38
cmd/manager/main.go Outdated Show resolved Hide resolved
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 2 times, most recently from 3ce4ce6 to 03c51cf Compare December 13, 2024 20:11
@camilamacedo86
Copy link
Contributor Author

Rebased with the tests to ensure backwords compatibility: #483

@camilamacedo86
Copy link
Contributor Author

Hi @joelanford

Can we uphold and move forward with this one?

@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 3 times, most recently from 9d7c119 to 5f6b423 Compare December 13, 2024 22:18
…n/authorization

This commit removes the use of the kube-rbac-proxy image and replaces it with metrics authentication/authorization provided by controller-runtime. The kube-rbac-proxy image is deprecated and will no longer be maintained, which introduces risks to production environments. For more details, see: kubernetes-sigs/kubebuilder#3907
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants