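Bug Description
kfp-persistence has a Pebble health check that probes a metrics endpoint (http://localhost:8080/metrics). However, the charm does not implement a MetricsEndpointProvider, and the upstream code does not appear to expose any metrics, so the check can never succeed. As the log output below shows, the persistenceagent-get check fails with "connection refused" every 30 seconds. The check was introduced during the sidecar rewrite with baseCharm, which suggests it stems from a misconception about how we use health checks. The check should therefore be removed.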
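The proposed fix can be sketched as a Pebble layer that defines the persistenceagent service without any checks section. This is a hypothetical sketch, not the charm's actual code: the function name build_layer, the summary strings and the include_metrics_check flag are illustrative assumptions. The service command is copied from the log, and the check's URL and threshold come from the log entries; the 30s period is only inferred from the timestamps.

```python
"""Hypothetical sketch of the persistenceagent Pebble layer, with and
without the metrics health check. Keys follow the Pebble layer schema;
summaries and names here are assumptions, not the charm's real code."""
import json

# Service command copied verbatim from the "starting:" log entry.
PERSISTENCEAGENT_COMMAND = (
    "persistence_agent --logtostderr=true --namespace= "
    "--ttlSecondsAfterWorkflowFinish=86400 --numWorker=2 "
    "--mlPipelineAPIServerName=kfp-api.kubeflow"
)


def build_layer(include_metrics_check: bool = False) -> dict:
    """Return a Pebble layer dict for the persistenceagent service.

    include_metrics_check=True reproduces the problematic configuration
    (an HTTP check against an endpoint the agent never serves); the
    default omits it, which is the fix proposed in this issue.
    """
    layer = {
        "summary": "persistenceagent layer",  # hypothetical summary
        "services": {
            "persistenceagent": {
                "override": "replace",
                "summary": "KFP persistence agent",  # hypothetical summary
                "command": PERSISTENCEAGENT_COMMAND,
                "startup": "enabled",
            }
        },
    }
    if include_metrics_check:
        # Mirrors the failing check seen in the logs. The 30s period is
        # inferred from the log timestamps, not read from the charm source.
        layer["checks"] = {
            "persistenceagent-get": {
                "override": "replace",
                "period": "30s",
                "threshold": 3,
                "http": {"url": "http://localhost:8080/metrics"},
            }
        }
    return layer


if __name__ == "__main__":
    # Print the layer without the check, i.e. the proposed configuration.
    print(json.dumps(build_layer(), indent=2))
```

In an ops-based charm, a dict like this can be wrapped in ops.pebble.Layer before being added to the container plan. Dropping the checks key entirely, rather than pointing the check at another URL, matches the issue's conclusion that the agent exposes nothing to probe.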
To Reproduce
Deploy kfp-persistence, relate it to its required dependencies, and follow the logs of the persistenceagent container.
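To confirm that nothing is listening on the probed endpoint independently of Pebble, a small probe can issue the same GET request the check makes. This is a hypothetical helper (metrics_endpoint_reachable is not part of the charm). It assumes that, as with Pebble's HTTP checks, only a 2xx response counts as healthy.

```python
"""Hypothetical probe mimicking the Pebble HTTP check: GET the metrics
URL and report whether it answers with a 2xx status."""
import urllib.error
import urllib.request


def metrics_endpoint_reachable(url: str = "http://localhost:8080/metrics",
                               timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP error statuses
        # (HTTPError is a URLError subclass) all count as unreachable.
        return False


if __name__ == "__main__":
    print(metrics_endpoint_reachable())
```

Run from within the unit's pod network namespace, this should return False on an affected deployment, which matches the "connection refused" entries in the log below.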
Environment
Juju 3.5, MicroK8s 1.28
Relevant Log Output
$ kfl kfp-persistence-0 -c persistenceagent -f
2024-06-12T08:52:35.461Z [pebble] HTTP API server listening on ":38813".
2024-06-12T08:52:35.461Z [pebble] Started daemon.
2024-06-12T08:52:54.189Z [pebble] GET /v1/plan?format=yaml 78.41µs 200
2024-06-12T08:52:54.190Z [pebble] POST /v1/layers 166.969µs 200
2024-06-12T08:53:05.499Z [pebble] GET /v1/notices?timeout=30s 30.000493302s 200
2024-06-12T08:53:35.500Z [pebble] GET /v1/notices?timeout=30s 30.001060881s 200
2024-06-12T08:54:05.501Z [pebble] GET /v1/notices?timeout=30s 30.000893481s 200
2024-06-12T08:54:13.983Z [pebble] POST /v1/files 3.690543ms 200
2024-06-12T08:54:14.005Z [pebble] GET /v1/plan?format=yaml 162.142µs 200
2024-06-12T08:54:14.007Z [pebble] POST /v1/layers 296.708µs 200
2024-06-12T08:54:14.011Z [pebble] POST /v1/services 4.262304ms 202
2024-06-12T08:54:14.014Z [pebble] GET /v1/notices?timeout=30s 8.512968209s 200
2024-06-12T08:54:14.015Z [pebble] Service "persistenceagent" starting: persistence_agent --logtostderr=true --namespace= --ttlSecondsAfterWorkflowFinish=86400 --numWorker=2 --mlPipelineAPIServerName=kfp-api.kubeflow
2024-06-12T08:54:14.096Z [persistenceagent] W0612 08:54:14.096332 15 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-06-12T08:54:15.022Z [pebble] GET /v1/notices?after=2024-06-12T08%3A54%3A14.011973404Z&timeout=30s 1.007109898s 200
2024-06-12T08:54:15.022Z [pebble] GET /v1/changes/1/wait?timeout=4.000s 1.010184868s 200
2024-06-12T08:54:15.055Z [pebble] GET /v1/services 83.884µs 200
2024-06-12T08:54:17.391Z [pebble] GET /v1/services 49.967µs 200
2024-06-12T08:54:44.011Z [pebble] Check "persistenceagent-get" failure 1 (threshold 3): Get "http://localhost:8080/metrics": dial tcp [::1]:8080: connect: connection refused
2024-06-12T08:54:45.023Z [pebble] GET /v1/notices?after=2024-06-12T08%3A54%3A15.017313558Z&timeout=30s 30.00090974s 200
2024-06-12T08:55:14.008Z [pebble] Check "persistenceagent-get" failure 2 (threshold 3): Get "http://localhost:8080/metrics": dial tcp [::1]:8080: connect: connection refused
2024-06-12T08:55:15.024Z [pebble] GET /v1/notices?after=2024-06-12T08%3A54%3A15.017313558Z&timeout=30s 30.000130261s 200
2024-06-12T08:55:44.010Z [pebble] Check "persistenceagent-get" failure 3 (threshold 3): Get "http://localhost:8080/metrics": dial tcp [::1]:8080: connect: connection refused
2024-06-12T08:55:44.010Z [pebble] Check "persistenceagent-get" failure threshold 3 hit, triggering action
2024-06-12T08:55:45.025Z [pebble] GET /v1/notices?after=2024-06-12T08%3A54%3A15.017313558Z&timeout=30s 30.001000892s 200
2024-06-12T08:56:14.011Z [pebble] Check "persistenceagent-get" failure 4 (threshold 3): Get "http://localhost:8080/metrics": dial tcp [::1]:8080: connect: connection refused
2024-06-12T08:56:15.026Z [pebble] GET /v1/notices?after=2024-06-12T08%3A54%3A15.017313558Z&timeout=30s 30.000986384s 200
2024-06-12T08:56:16.458Z [persistenceagent] time="2024-06-12T08:56:16Z" level=fatal msg="Error creating ML pipeline API Server client: Failed to initialize pipeline client. Error: Waiting for ml pipeline API server failed after all attempts.: Get \"http://kfp-api.kubeflow:8888/apis/v1beta1/healthz\": dial tcp 10.152.183.187:8888: connect: connection refused: Waiting for ml pipeline API server failed after all attempts.: Get \"http://kfp-api.kubeflow:8888/apis/v1beta1/healthz\": dial tcp 10.152.183.187:8888: connect: connection refused"
2024-06-12T08:56:16.461Z [pebble] Service "persistenceagent" stopped unexpectedly with code 1
2024-06-12T08:56:16.461Z [pebble] Service "persistenceagent" on-failure action is "restart", waiting ~500ms before restart (backoff 1)
2024-06-12T08:56:17.002Z [pebble] Service "persistenceagent" starting: persistence_agent --logtostderr=true --namespace= --ttlSecondsAfterWorkflowFinish=86400 --numWorker=2 --mlPipelineAPIServerName=kfp-api.kubeflow
2024-06-12T08:56:17.033Z [persistenceagent] W0612 08:56:17.033566 29 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-06-12T08:56:44.011Z [pebble] Check "persistenceagent-get" failure 5 (threshold 3): Get "http://localhost:8080/metrics": dial tcp [::1]:8080: connect: connection refused
2024-06-12T08:56:45.028Z [pebble] GET /v1/notices?after=2024-06-12T08%3A54%3A15.017313558Z&timeout=30s 30.000947153s 200
2024-06-12T08:57:14.010Z [pebble] Check "persistenceagent-get" failure 6 (threshold 3): Get "http://localhost:8080/metrics": dial tcp [::1]:8080: connect: connection refused
2024-06-12T08:57:15.029Z [pebble] GET /v1/notices?after=2024-06-12T08%3A54%3A15.017313558Z&timeout=30s 30.001004338s 200
2024-06-12T08:57:44.011Z [pebble] Check "persistenceagent-get" failure 7 (threshold 3): Get "http://localhost:8080/metrics": dial tcp [::1]:8080: connect: connection refused
Additional Context
No response