What's the general idea for the enhancement?
Quite a few queries in this repo make use of a similar pattern:
sum(
    rate(some_metric{namespace!=""}[5m]) # Range is illustrative; the original omitted it
  * on (cluster, namespace, pod) group_left ()
    kube_pod_info{host_network="false"} # Sometimes this is wrapped by `topk`
) by (namespace) # Or pod
This is essentially just:
1. Calculate some metric of interest on a (potentially) very large number of series
2. Perform a join between this metric and another metric with pod information
3. Sum the resulting series
Let S represent the number of series returned by rate(some_metric{namespace!=""}), P the number of pods in the cluster (across all namespaces), and N the number of namespaces. Note that in some cases S >> P, if there are additional labels (such as the interface label on network metrics when a pod has multiple interfaces).
The series returned in (1) are all eventually grouped by just a few labels and then aggregated over one. With the query written this way, the engine needs to store S series for the rate calculation, join against P pod-info series (S * P series in the worst case, P * P in the best case), and then sum the joined series down to N resulting series.
Here's where this can be optimized. If the rate query is first summed over the labels used by the join plus the outer grouping labels, the number of time series entering the join is cut from S down to P. The subsequent join then produces P * P series in every case. This reduces the number of series that need to be stored and processed, cutting memory requirements significantly (roughly S / P times less memory). While summing multiple times is less efficient per se, there are far fewer values to sum. I measured a 20% decrease in latency when switching to this (below).
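As a sketch, the generic pattern above would become something like the following (some_metric and the [5m] range are placeholders, not from the repo):

sum(
    sum(
        rate(some_metric{namespace!=""}[5m])
    ) by (cluster, namespace, pod) # Pre-aggregate down to the join labels first
  * on (cluster, namespace, pod) group_left ()
    kube_pod_info{host_network="false"}
) by (namespace) # Or pod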
Please provide any helpful snippets.
Here's an example pulled from the cluster networking dashboard. There are a few dozen places where this could be applied - just search the repo for group_left\s*\(\s*\).
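A sketch of the unoptimized form, assembled from the sub-expressions quoted below (the exact topk/max wrapper around kube_pod_info is an assumption on my part):

sum(
    rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s])
  * on (cluster, namespace, pod) group_left ()
    topk by (cluster, namespace, pod) (
        1, max(kube_pod_info{host_network="false"}) by (cluster, namespace, pod)
    )
) by (namespace)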
This requires an estimated (by the query engine I'm using) 57,899,664 bytes to process the rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s]) part of the query, and another 5,095,680 bytes for the max(kube_pod_info{host_network="false"}) by (cluster,namespace,pod) part. The total execution time was about 166ms.
Here's what the query looks like with this improvement:
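A sketch of the improved form, under the same assumptions as above, with the pre-aggregation pulled in front of the join:

sum(
    sum(
        rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s])
    ) by (cluster, namespace, pod)
  * on (cluster, namespace, pod) group_left ()
    topk by (cluster, namespace, pod) (
        1, max(kube_pod_info{host_network="false"}) by (cluster, namespace, pod)
    )
) by (namespace)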
This requires an estimated (by the query engine I'm using) 10,616,000 bytes to process the rate(container_network_receive_bytes_total{cluster="",namespace!=""}[3s]) part of the query, and another 1,900,264 bytes for the max(kube_pod_info{host_network="false"}) by (cluster,namespace,pod) part. The total execution time was about 206ms.
What parts of the codebase does the enhancement target?
Dashboards, Alerts, Rules
Anything else relevant to the enhancement that would help with the triage process?
No response