Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement]: Worst-case performance of most join-based queries can be improved #1034

Open
4 tasks done
solidDoWant opened this issue Mar 6, 2025 · 0 comments
Open
4 tasks done

Comments

@solidDoWant
Copy link

solidDoWant commented Mar 6, 2025

What's the general idea for the enhancement?

Quite a few queries in this repo make use of a similar pattern:

sum(
    rate(some_metric{namespace!=""})
    * on(cluster,namespace,pod) group_left()
    kube_pod_info{host_network="false"} # Sometimes this is wrapped by `topk`
) by (namespace) # Or pod

This is essentially just:

  1. Calculate some metric of interest on a (potentially) very large number of series
  2. Perform a join between this metric and another metric with pod information
  3. Sum the resulting metric

Let S represent the number of series returned by rate(some_metric{namespace!=""}), and P be the number of pods in the cluster (across all namespaces), and N be the number of namespaces. Note that in some cases S >> P, if there are additional labels (such as the interface label on network metrics when a pod has multiple metrics).

The series returned in (1) are all eventually grouped by just a few labels, and then aggregated over one. With the way this query is currently written, the query engine needs to store S series for the rate calculation metric, perform a join on P series for pod info (S * P worst case, P * P best case), and then sum the joined series into a resulting N series.

Here's where this can be optimized. If the rate query is summed over the labels used by the join and the outer labels, the resulting number of time series is cut down from S to P. The subsequent join now results in P * P in every case. This reduces the number of series that needs to be stored, and processed, reducing the memory requirements significantly (S / Px less memory required). While summing multiple times is less efficient, there are fewer values to sum. I measured a 20% decrease in latency when switching to this (below).

Please provide any helpful snippets.

Here's an example pulled from the cluster networking dashboard. There are a few dozen places that this could be implemented - just search the repo for group_left\s*\(\s*\).

sum(
  rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s])
    * on(cluster,namespace,pod,node) group_left()
  topk(
    1,
    max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod)
  ) by(cluster,namespace,pod)
) by(namespace)

This requires an estimated (by the query engine I'm using) 57,899,664 bytes to process the rate( rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s]) query, and another 5,095,680 bytes for the max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod) query. The total execution time was about 166ms.

Here's what the query looks like with this improvement:

sum(
  sum(
    rate(container_network_receive_bytes_total{cluster="",namespace!=""}[1i])
  ) by(cluster,namespace,pod)
    * on(cluster,namespace,pod,node) group_left()
  topk(
    1,
    max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod)
  ) by(cluster,namespace,pod)
) by(namespace)

This requires an estimated (by the query engine I'm using) 10,616,000 bytes to process the rate( rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s]) query, and another 1,900,264 bytes for the max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod) query. The total execution time was about 206ms.

What parts of the codebase does the enhancement target?

Dashboards, Alerts, Rules

Anything else relevant to the enhancement that would help with the triage process?

No response

I agree to the following terms:

  • I agree to follow this project's Code of Conduct.
  • I have filled out all the required information above to the best of my ability.
  • I have searched the issues of this repository and believe that this is not a duplicate.
  • I have confirmed this proposal applies to the default branch of the repository, as of the latest commit at the time of submission.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant