monitoring: tweak search request duration metric to record success / failure #507

ggilmore · 2023-01-03T18:39:44Z

This tweaks the search request duration metric to record whether or not the search itself was successful. This follows the advice from Google SRE's Monitoring Distributed Systems:

Latency

... It’s important to distinguish between the latency of successful requests and the latency of failed requests.

For example, an HTTP 500 error triggered due to loss of connection to a database or other critical backend might be served very quickly; however, as an HTTP 500 error indicates a failed request, factoring 500s into your overall latency might result in misleading calculations. On the other hand, a slow error is even worse than a fast error! Therefore, it’s important to track error latency, as opposed to just filtering out errors.

jtibshirani

Took a quick look as I'm trying to learn more about our observability features.

jtibshirani · 2023-01-03T20:12:12Z

shards/shards.go

-		Help:    "The duration a search request took in seconds",
-		Buckets: prometheus.DefBuckets, // DefBuckets good for service timings
-	})
+	metricSearchDuration = promauto.NewHistogramVec(


I'm curious how prometheus handles changes to metric types. Maybe because they're both histograms, it's not really a change and now there will just be labels?

@jtibshirani

In Prometheus' data model, each time series is uniquely identified by its metric name (zoekt_search_duration_seconds) and its set of labels (success). As I understand it, this means that adding a new label creates new time series. When we write Grafana dashboards that query zoekt_search_duration_seconds{success="true"}, the old data points (from zoekt_search_duration_seconds observations recorded before we added the success label) won't be returned by the Prometheus server.
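A rough sketch of that identity rule, using only the Go standard library rather than the Prometheus client: `series`, `seriesKey`, and `matches` are hypothetical helpers, and the sketch ignores the `_bucket`, `_sum`, and `_count` series that a real histogram exposes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series is a hypothetical, simplified time series: a name plus labels.
type series struct {
	name   string
	labels map[string]string
}

// seriesKey renders the identity of a series as name{k="v",...} with
// label names sorted, so equal name and label sets give equal keys.
func seriesKey(s series) string {
	if len(s.labels) == 0 {
		return s.name
	}
	keys := make([]string, 0, len(s.labels))
	for k := range s.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.labels[k]))
	}
	return s.name + "{" + strings.Join(pairs, ",") + "}"
}

// matches reports whether s would be selected by a query on name with
// the given equality matchers. A series that lacks a matched label is
// not selected by a non-empty matcher on that label.
func matches(s series, name string, matchers map[string]string) bool {
	if s.name != name {
		return false
	}
	for k, want := range matchers {
		if got, ok := s.labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}
```

Under these assumptions, a series recorded before the change (no `success` label) and one recorded after it (`success="true"`) have different keys, and only the latter is selected by the `success="true"` matcher.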

Does this answer your question?

Yep, thanks for the reference.

search: tweak search duration metric to record success / failure

2fb4149

ggilmore requested a review from a team January 3, 2023 18:39

ggilmore changed the title ~~search: tweak search request duration metric to record success / failure~~ monitoring: tweak search request duration metric to record success / failure Jan 3, 2023

jtibshirani approved these changes Jan 3, 2023

gl-srgr approved these changes Jan 6, 2023

keegancsmith approved these changes Jan 13, 2023

ggilmore Jan 9, 2023

jtibshirani Jan 9, 2023

monitoring: tweak search request duration metric to record success / failure #507

Are you sure you want to change the base?

monitoring: tweak search request duration metric to record success / failure #507

Conversation

ggilmore commented Jan 3, 2023 • edited Loading

Latency

jtibshirani left a comment

Choose a reason for hiding this comment

jtibshirani Jan 3, 2023

Choose a reason for hiding this comment

ggilmore Jan 9, 2023

Choose a reason for hiding this comment

jtibshirani Jan 9, 2023

Choose a reason for hiding this comment

ggilmore commented Jan 3, 2023 •

edited

Loading