[Enhancement]: Worst-case performance of most join-based queries can be improved #1034

solidDoWant · 2025-03-06T06:44:54Z

What's the general idea for the enhancement?

Quite a few queries in this repo make use of a similar pattern:

sum(
    rate(some_metric{namespace!=""})
    * on(cluster,namespace,pod) group_left()
    kube_pod_info{host_network="false"} # Sometimes this is wrapped by `topk`
) by (namespace) # Or pod

This is essentially just:

Calculate some metric of interest on a (potentially) very large number of series
Perform a join between this metric and another metric with pod information
Sum the resulting metric

Let S represent the number of series returned by rate(some_metric{namespace!=""}), and P be the number of pods in the cluster (across all namespaces), and N be the number of namespaces. Note that in some cases S >> P, if there are additional labels (such as the interface label on network metrics when a pod has multiple metrics).

The series returned in (1) are all eventually grouped by just a few labels, and then aggregated over one. With the way this query is currently written, the query engine needs to store S series for the rate calculation metric, perform a join on P series for pod info (S * P worst case, P * P best case), and then sum the joined series into a resulting N series.

Here's where this can be optimized. If the rate query is summed over the labels used by the join and the outer labels, the resulting number of time series is cut down from S to P. The subsequent join now results in P * P in every case. This reduces the number of series that needs to be stored, and processed, reducing the memory requirements significantly (S / Px less memory required). While summing multiple times is less efficient, there are fewer values to sum. I measured a 20% decrease in latency when switching to this (below).

Please provide any helpful snippets.

Here's an example pulled from the cluster networking dashboard. There are a few dozen places that this could be implemented - just search the repo for group_left\s*\(\s*\).

sum(
  rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s])
    * on(cluster,namespace,pod,node) group_left()
  topk(
    1,
    max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod)
  ) by(cluster,namespace,pod)
) by(namespace)

This requires an estimated (by the query engine I'm using) 57,899,664 bytes to process the rate( rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s]) query, and another 5,095,680 bytes for the max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod) query. The total execution time was about 166ms.

Here's what the query looks like with this improvement:

sum(
  sum(
    rate(container_network_receive_bytes_total{cluster="",namespace!=""}[1i])
  ) by(cluster,namespace,pod)
    * on(cluster,namespace,pod,node) group_left()
  topk(
    1,
    max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod)
  ) by(cluster,namespace,pod)
) by(namespace)

This requires an estimated (by the query engine I'm using) 10,616,000 bytes to process the rate( rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s]) query, and another 1,900,264 bytes for the max(kube_pod_info{host_network="false"}) by(cluster,namespace,pod) query. The total execution time was about 206ms.

What parts of the codebase does the enhancement target?

Dashboards, Alerts, Rules

Anything else relevant to the enhancement that would help with the triage process?

No response

I agree to the following terms:

I agree to follow this project's Code of Conduct.
I have filled out all the required information above to the best of my ability.
I have searched the issues of this repository and believe that this is not a duplicate.
I have confirmed this proposal applies to the default branch of the repository, as of the latest commit at the time of submission.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement]: Worst-case performance of most join-based queries can be improved #1034

[Enhancement]: Worst-case performance of most join-based queries can be improved #1034

solidDoWant commented Mar 6, 2025 •

edited

Loading

[Enhancement]: Worst-case performance of most join-based queries can be improved #1034

[Enhancement]: Worst-case performance of most join-based queries can be improved #1034

Comments

solidDoWant commented Mar 6, 2025 • edited Loading

What's the general idea for the enhancement?

Please provide any helpful snippets.

What parts of the codebase does the enhancement target?

Anything else relevant to the enhancement that would help with the triage process?

I agree to the following terms:

solidDoWant commented Mar 6, 2025 •

edited

Loading