[k8sclusterreceiver] refactoring pod status phase #24425
Comments
Pinging code owners: See Adding Labels via Comments if you do not have permissions to add labels yourself.
I see your point, but I have a couple of counter-arguments:
1. If a user wants this particular representation, I believe it can be translated with the metrics transform processor.
2. Most backends likely have a query language capable of representing this metric as five "boolean" time series.

That said, I wouldn't like to make this change until we have strict guidelines from the OTel Spec on how to represent this kind of data.
Thanks for the reply. I agree with you.
Can you provide a screenshot/example?
Example? I can't get this to work in either of the monitoring vendors my company uses. Since the actual numeric values are meaningless, I can't see how to make this work when aggregating across dimensions. Many monitoring tools also don't let you filter on metric values, just tags (i.e. you can't write a filter like k8s.pod.phase == 2).

I think an OpenMetrics StateSet would be the more idiomatic way to model this information (and it is also supported by OTel). This is also how kube-state-metrics does it with kube_pod_status_phase.
@dmitryax can you give a rough example of how you would use the metrics transform processor to do this?
I agree with @sirianni. So you'd push several datapoints to represent the current "running" state of the pod:
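(A rough sketch of what those datapoints could look like; the metric and attribute names here are illustrative, not the receiver's actual output:)

```
k8s.pod.phase{phase="Pending"}   0
k8s.pod.phase{phase="Running"}   1
k8s.pod.phase{phase="Succeeded"} 0
k8s.pod.phase{phase="Failed"}    0
k8s.pod.phase{phase="Unknown"}   0
```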
Not optimal cardinality-wise, but it does the job, and it's easily queryable. Also, it's a good incentive for OpenTelemetry to specify the StateSet metric type! 😉
Take a look at the hw.status metric in the OTel Semantic Conventions, which uses the value-as-attribute approach I'm suggesting. In particular, see the note there.
We discussed this issue in the SIG meeting today. We decided we want the Spec to make a decision on how this kind of enum data should be represented first.
There is also an open question on where this metric should live. @sirianni can you provide more details on the scaling issue you've experienced?
Sure. We run several large Kubernetes clusters (over 1000 nodes, over 12,000 pods) where using a single cluster-wide collector runs into scalability issues (scrape timeouts, etc.). Here is the Datadog Agent PR I referenced in today's SIG call, where Airbnb mentions similar scalability issues using a cluster-wide collector (i.e. kube-state-metrics) for this data. Collecting this information from the kubelet via a DaemonSet scales incrementally as the cluster grows.
Link to the Slack discussion mentioned in the meeting: https://cloud-native.slack.com/archives/C01NP3BV26R/p1623952435074700
Reading through this issue again, is there actually anything we need from the spec? This link (https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#stateset) feels like it is pretty clear about how we handle StateSets in OTel metrics.
That is about how to convert OpenMetrics StateSet metrics to OTLP. The idea discussed in the Slack thread was to make the StateSet pattern (or something close to it) a recommended pattern for OTel instrumentation. I would like to see, at a minimum, the specification recommend using labels rather than values to represent enums (i.e. a phase="Running" label rather than an opaque numeric value).
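(To make that distinction concrete, here is a small sketch; the metric name and values are illustrative only:)

```
# value-encoded
k8s.pod.phase 2

# label-encoded (StateSet-style)
k8s.pod.phase{phase="Running"} 1
```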
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping the code owners. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Ran into another example of this confusion/frustration recently.
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
receiver/k8scluster
Describe the issue you're reporting
We currently encode the pod status phase as a number:
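(If I recall the mapping correctly, it is roughly the following; the authoritative values are in the receiver's documentation:)

```
k8s.pod.phase = 1  (Pending)
k8s.pod.phase = 2  (Running)
k8s.pod.phase = 3  (Succeeded)
k8s.pod.phase = 4  (Failed)
k8s.pod.phase = 5  (Unknown)
```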
I think it's preferable to split it into multiple metrics:
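(Something along these lines; the exact metric names are only illustrative:)

```
k8s.pod.status_phase.pending   0
k8s.pod.status_phase.running   1
k8s.pod.status_phase.succeeded 0
k8s.pod.status_phase.failed    0
k8s.pod.status_phase.unknown   0
```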
IMO it's easier to tell from the name what is happening, and we already have this pattern for node conditions (e.g. k8s.node.condition_ready).
Thoughts?