Metrics: workflow_status_phase
(which includes workflow name label) and workflow_start_time
#13683
Comments
@tooptoop4 is someone working on this, or can I investigate implementing it?
@napestershine you can work on it
The big problem with adding the workflow name to metrics is that it is very high cardinality: it essentially creates a separate data series for every workflow. All of these data series live in the memory of the workflow controller for the lifetime of the controller, and the receiving store also needs to store a separate time series for each one. I have already implemented some higher-cardinality metrics (around namespaces and workflowTemplateRef names) to help with some of the issues you might be attempting to address, but blindly doing this will not be OK. The issue description doesn't explain why these metrics are needed per workflow. I am working on tracing support for workflows, which may allow some of the metrics you want to be extracted from the traces.
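To make the cardinality concern concrete, here is a minimal sketch of the arithmetic, assuming a gauge in the style of kube_pod_status_phase that exposes one series per label combination. The label names, phase list, and workflow count are illustrative assumptions, not the controller's actual metric definitions.

```python
def series_count(label_values: dict[str, list[str]]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values of each label."""
    count = 1
    for values in label_values.values():
        count *= len(values)
    return count


# Illustrative values only (assumptions, not taken from the controller).
phases = ["Pending", "Running", "Succeeded", "Failed", "Error"]
workflow_names = [f"wf-{i}" for i in range(10_000)]

# Phase label only: a handful of series, however many workflows run.
by_phase = series_count({"phase": phases})

# Adding a per-workflow name label multiplies the series count by the
# number of workflows the controller has seen.
by_workflow = series_count({"phase": phases, "name": workflow_names})

print(by_phase, by_workflow)
```

With these assumed numbers, the name label takes the metric from 5 series to 50,000, and that count keeps growing for as long as new workflow names appear.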
I might be new to this topic, so here is a simple use case: let's say I have a cronworkflow and I want to check whether it was triggered on its schedule.
This proposal would give you the workflow name, from which you'd have to establish the cronworkflow name. https://argo-workflows.readthedocs.io/en/latest/metrics/#cronworkflows_triggered_total gives you this with much less cardinality.
Surely they could be purged from memory once they have been in a succeeded/error/failed state for more than 10 minutes.
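The purge idea above could look roughly like the following sketch. It is a hypothetical in-memory registry, not the workflow controller's code or any OpenTelemetry API: the class name, method names, and injectable clock are all assumptions made for illustration. It also only addresses controller memory; the receiving store would still hold one series per workflow.

```python
import time

# Phases treated as terminal for this sketch (an assumption).
TERMINAL_PHASES = {"Succeeded", "Failed", "Error"}


class WorkflowPhaseSeries:
    """Hypothetical registry holding one series per workflow name."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the TTL logic can be tested
        self._series = {}    # workflow name -> (phase, time of last update)

    def set_phase(self, name, phase):
        """Record the latest phase for a workflow and when it was set."""
        self._series[name] = (phase, self._clock())

    def purge_expired(self):
        """Drop series that have sat in a terminal phase longer than the TTL.

        Returns the number of series removed.
        """
        now = self._clock()
        expired = [
            name
            for name, (phase, updated) in self._series.items()
            if phase in TERMINAL_PHASES and now - updated > self._ttl
        ]
        for name in expired:
            del self._series[name]
        return len(expired)

    def names(self):
        """Sorted workflow names that still have a live series."""
        return sorted(self._series)
```

A controller-side loop would call `purge_expired()` periodically; running workflows are never evicted, because only terminal phases are checked against the TTL.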
@Joibel This feature is available in 3.6.x, which has not been officially released yet. When can we expect that release?
The official answer is, as always, "when it's done". Currently there is an rc3 release out; we need to make an rc4 and then wait 2 weeks. I'd predict the first half of November now, but there aren't any promises. Please test rc3 and let us know how that works for you.
You have to hack the OpenTelemetry code to do this, as this isn't considered the correct way to implement metrics. We do this for custom metrics already. This only solves one half of the problem though: you're still paying heavily for your metrics storage when cardinality is high.
kube_pod_status_phase is already there with even higher cardinality |
In Prometheus, there are pod-level metrics such as kube_pod_status_phase and kube_pod_start_time.
Similar metrics are needed at the workflow level.