Propagate select pod labels to all metrics without requiring promql metric joins. #2551

Open
jacobstr opened this issue Nov 8, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@jacobstr
Contributor

jacobstr commented Nov 8, 2024

What would you like to be added:

In large clusters, pod replicas often represent a "class" or "category" of worker. It's really useful to get metrics broken down by, e.g., the pod's controller, or by some common label that categorizes pods without the cardinality of tens of thousands of pod IDs.

The kube_pod_labels time series covers some ground, but requires joins. For usability, joins often lead to recording rules. Recording rules, in turn, demand stateful Prometheus servers, which can e.g. 4x the memory footprint relative to stateless collectors like Prometheus running in agent mode, or Grafana's Alloy. In particular, if one is forwarding the metrics to a 3rd party, it's nice to minimize the footprint of in-house infra by keeping things stateless as long as possible; certainly at some point you hit a big beefy backend that you can query.

Why not skip the label join requirement entirely, and instead allow configurable appending of specific pod labels to the underlying time series? Again, this works incredibly well with "stateless" Prometheus remote-writers and systems like Grafana's Adaptive Metrics that can do some limited metric aggregation without a full recording rule engine.

Why is this needed:

  • Improved UX. You can immediately get things like kube_pod_container_status_restarts_total broken down by workload without having to jump into recording rules.
  • Improved performance in 3rd party systems with adaptive metrics or similar solutions.

Describe the solution you'd like

  • A flag that basically says: take this pod label and propagate it to all kube_pod_* time series without requiring a label join.
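
To make the request concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not kube-state-metrics code: `Series`, `sanitize`, and `propagate_pod_labels` are hypothetical names, and the `label_` prefix with non-alphanumeric characters replaced by underscores is assumed to mirror how kube_pod_labels already exposes label names.

```python
import re
from dataclasses import dataclass


@dataclass
class Series:
    """One exported sample: metric name, label set, value (hypothetical type)."""
    name: str
    labels: dict
    value: float


def sanitize(label_key):
    # Assumption: same shape as kube_pod_labels label names,
    # e.g. "app.kubernetes.io/name" -> "label_app_kubernetes_io_name".
    return "label_" + re.sub(r"[^a-zA-Z0-9_]", "_", label_key)


def propagate_pod_labels(series_list, pod_labels, allowed):
    """Copy allow-listed pod labels onto every kube_pod_* series.

    pod_labels maps (namespace, pod) -> {kubernetes label key: value}.
    Series that are not kube_pod_* pass through unchanged.
    """
    out = []
    for s in series_list:
        labels = dict(s.labels)
        if s.name.startswith("kube_pod_"):
            key = (s.labels.get("namespace"), s.labels.get("pod"))
            for k, v in pod_labels.get(key, {}).items():
                if k in allowed:
                    # Never overwrite a label the series already carries.
                    labels.setdefault(sanitize(k), v)
        out.append(Series(s.name, labels, s.value))
    return out
```

With `allowed = {"app.kubernetes.io/name"}`, a kube_pod_container_status_restarts_total sample for a pod labelled `app.kubernetes.io/name=temporal` would come out carrying `label_app_kubernetes_io_name="temporal"`, and nothing changes when the allow list is empty.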

Additional context

  • Potentially, it would be desirable to do this on individual kube_pod_ time series. That does stretch the command line arguments approach, though, and it might just make sense to leverage the existing allow list and simply have a secondary boolean to enable / disable the label (and I suppose annotation, for symmetry...) propagation.
  • Ideally this is benign if you don't opt in; if you do, I'd expect additional memory usage for each time series to carry the additional bytes for the label. The whole theory here is that when you have 10k pods that belong to "temporal" or "flyte", the subsequent ease with which you can do aggregations using something like https://grafana.com/blog/2023/05/09/adaptive-metrics-grafana-cloud-announcement/ would make it worthwhile.
  • I am indeed largely advocating for this from the POV of adaptive metrics, and if that technology were more robust, maybe the problem could be pushed downstream by having Grafana allow custom rule expressions that support label joins... but that's kind of the heart of the UX issue here: having to do label joins affects everything and is so fundamental that doing it closer to the source would obviate a lot of downstream work.
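
As a rough illustration of the aggregation this enables downstream, the sketch below sums samples by a propagated label with no join against kube_pod_labels. The `sum_by` helper, the `label_workload` label name, and the sample data are all made up for the example; a real aggregator would do the equivalent on the remote-write stream.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Sum sample values grouped by one label, dropping all other labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical per-pod restart counts that already carry the workload label.
samples = [
    ({"pod": "temporal-0", "label_workload": "temporal"}, 1.0),
    ({"pod": "temporal-1", "label_workload": "temporal"}, 2.0),
    ({"pod": "flyte-0", "label_workload": "flyte"}, 2.0),
]

per_workload = sum_by(samples, "label_workload")
```

However many pods sit behind "temporal" or "flyte", the result has one entry per workload, which is the cardinality reduction being argued for.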
@jacobstr jacobstr added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 8, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 8, 2024
@jacobstr jacobstr changed the title from "Propagate select pod labels all metrics without label joins." to "Propagate select pod labels to all metrics without requiring label joins." Nov 8, 2024
@jacobstr jacobstr changed the title from "Propagate select pod labels to all metrics without requiring label joins." to "Propagate select pod labels to all metrics without requiring promql metric joins." Nov 8, 2024