Adding readinessProbe and livenessProbe probes to porch #92

mansoor17syed · 2025-01-13T11:36:17Z

No description provided.

Catalin-Stratulat-Ericsson

Looks good.

mansoor17syed · 2025-01-15T08:29:37Z

Hi Team,

@liamfallon @efiacor, can you review these changes and share your feedback?

liamfallon

/approve

mansoor17syed · 2025-01-17T15:52:33Z

Hi Team,
Could you please share your feedback or advise on the next steps to move forward?

kushnaidu · 2025-01-20T06:18:40Z

/approve

nephio-prow · 2025-01-20T06:18:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Catalin-Stratulat-Ericsson, kushnaidu, liamfallon, mansoor17syed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [liamfallon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

liamfallon · 2025-01-20T06:19:12Z

I wanted @efiacor to take a quick look.

efiacor · 2025-01-20T15:35:32Z

nephio/core/porch/9-controllers.yaml

Hi @mansoor17syed, thank you for adding some best-practice elements to the deployments.
The changes for the controllers should be fine, as the probe endpoints are served here: https://github.com/nephio-project/porch/blob/d7839690ca85fbb64669ae4158d5679c9ee29713/controllers/main.go#L174
However, I am not 100% sure about the probes for the porch API server.

efiacor · 2025-01-20T15:53:42Z

nephio/core/porch/3-porch-server.yaml

@@ -80,6 +80,28 @@ spec:
            - --repo-sync-frequency=3m
            - --disable-validating-admissions-policy=true
            - --max-request-body-size=6291456 # Keep this in sync with function-runner's corresponding argument
+
+          #adding livenessProbes and readinessProbes for porch server
+          livenessProbe:


These are useful for meeting best practices but may need some more investigation.
As the porch API server is a k8s aggregated API server, I am not 100% sure whether we need to implement our own probe endpoints.
Also, the docs mention that /healthz is now deprecated and that /livez should be used instead.
https://kubernetes.io/docs/reference/using-api/health-checks/#individual-health-checks

Finally, the changes here are unfortunately not "tested" on each PR commit, so any breaking change only gets picked up when it is deployed to the sandbox by the test-infra project. This is something we plan to address.
The configs here would need to be tailored to that env, and potentially monitored and updated if necessary.

We may need to add something to the server config to handle the checks, such as - https://github.com/kubernetes/apiserver/blob/master/pkg/server/healthz.go#L91

Somewhere here maybe - https://github.com/nephio-project/porch/blob/main/pkg/apiserver/apiserver.go#L275

mansoor17syed · 2025-02-06

Hi @efiacor,
Thanks for the review and the detailed insights!

Since Porch is a Kubernetes aggregated API server, I’m not entirely sure whether we need to implement our own probe endpoints. Do we have any precedent for this in similar projects?
I see that /healthz is deprecated in favor of /livez—should we update it in this PR, or would that require additional considerations?
Regarding testing, since changes here aren’t validated on each PR commit and are only caught when deployed to the sandbox environment, how do you suggest we proceed? Should we align these configs with the test-infra setup, and if so, how is that managed?

Would appreciate your thoughts on the best approach here!

efiacor · 2025-02-06

To get us started, I think we need to verify whether we need to implement some form of endpoints on the server to serve the probe requests. @kispaljr, any idea on this one?
To test the new probes and endpoints, we can use the "local" deployment in porch on a kind cluster to verify that the probes are actually working.

nephio-project/porch#166

./scripts/setup-dev-env.sh

make run-in-kind

Check the probes with a "describe" or check the server logs for incoming HTTP requests.

The liveness probe in particular could be useful for monitoring the health of the API, as we are seeing some behaviour where the service becomes unavailable. The frequency of the probe would need to be tailored, though, to avoid spamming the API.

kispaljr · 2025-02-18

Hi @efiacor, sorry for the late reply. In my local environment,

kubectl port-forward -n porch-system service/api 8443:api & curl -v -k https://127.0.0.1:8443/healthz

shows that the checked URLs exist in the latest porch server and return 200 OK when the server is actually running. So from my perspective, these probes seem to be correct.
Have you encountered any problems while trying them out?

curl -v -k https://127.0.0.1:8443/livez also works, so we can change the liveness probe to use it.
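For illustration, a probe block switched to the non-deprecated endpoints might look like the sketch below. This is a hedged example, not the content of this PR: the container port (4443), the use of /readyz for readiness, and all timing values are assumptions that would need to be checked against the actual porch-server container spec and tuned so the probes do not hit the API too often.

```yaml
# Sketch only: the port, the /readyz path, and every timing value below are
# assumptions for illustration, not values taken from this PR.
livenessProbe:
  httpGet:
    path: /livez          # suggested in the review as the replacement for /healthz
    port: 4443            # assumed container port; use the server's real secure port
    scheme: HTTPS         # the server is queried over TLS, as in the curl -k check
  initialDelaySeconds: 10
  periodSeconds: 30       # keep the frequency low to avoid spamming the API
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz         # assumed to be served alongside /livez
    port: 4443
    scheme: HTTPS
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```

With scheme set to HTTPS, the kubelet skips certificate verification for the probe request, which matches the `-k` flag used in the manual curl checks.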

Querying these paths doesn't seem to generate any entries in the apiserver logs either, so it seems to be working.

I can see the readiness probe failing once in my env:

Type     Reason     Age    From               Message
----     ------     ----   ----               -------
Normal   Scheduled  8m48s  default-scheduler  Successfully assigned porch-system/porch-server-85bd69c887-4hnqm to porch-test-control-plane
Normal   Pulled     8m48s  kubelet            Container image "porch-kind/porch-server:test" already present on machine
Normal   Created    8m48s  kubelet            Created container porch-server
Normal   Started    8m48s  kubelet            Started container porch-server
Warning  Unhealthy  8m42s  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500

but not after that, so this also seems fine

efiacor · 2025-02-18

OK, cool. If you are happy that it actually queries the server, then go for it. I wasn't sure. I thought maybe the k8s API was handling the default responses.

Adding readinessProbe and livenessProbe probes to porch

3195201

nephio-prow bot requested review from efiacor and s3wong January 13, 2025 11:36

mansoor17syed requested a review from Catalin-Stratulat-Ericsson January 13, 2025 11:36

Catalin-Stratulat-Ericsson approved these changes Jan 13, 2025

mansoor17syed requested a review from liamfallon January 15, 2025 08:31

mansoor17syed mentioned this pull request Jan 15, 2025

adding livenessProbe and readinessProbe in deployment.yaml for porch … nephio-project/porch#166

Open

liamfallon reviewed Jan 15, 2025

nephio-prow bot added the approved label Jan 15, 2025

efiacor reviewed Jan 20, 2025

Review comments, in order:

efiacor · Jan 20, 2025 (three comments)
mansoor17syed · Feb 6, 2025
efiacor · Feb 6, 2025
kispaljr · Feb 18, 2025 (four comments, the first edited)
efiacor · Feb 18, 2025
