[DOC] Update metrics query docs with examples, more details (#4248)
* Update metrics query doc with more examples

* Update content from doc session

* Apply suggestions from code review

Co-authored-by: Jennifer Villa <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>

* Restructure and update metrics queries

* Updates to meet review comments

* fix paragraph

* Updates for conflict

---------

Co-authored-by: Jennifer Villa <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
(cherry picked from commit 532f83f)
knylander-grafana committed Oct 30, 2024
1 parent d773b14 commit ff8584c
Showing 7 changed files with 428 additions and 170 deletions.
23 changes: 14 additions & 9 deletions docs/sources/tempo/api_docs/_index.md
@@ -31,7 +31,7 @@ For externally supported GRPC API, [see below](#tempo-grpc-api).
| [Search tag names V2](#search-tags-v2) | Query-frontend | HTTP | `GET /api/v2/search/tags` |
| [Search tag values](#search-tag-values) | Query-frontend | HTTP | `GET /api/search/tag/<tag>/values` |
| [Search tag values V2](#search-tag-values-v2) | Query-frontend | HTTP | `GET /api/v2/search/tag/<tag>/values` |
| [TraceQL Metrics](#traceql-metrics) | Query-frontend | HTTP | `GET /api/metrics/query_range` |
| [TraceQL Metrics (instant)](#instant) | Query-frontend | HTTP | `GET /api/metrics/query` |
| [Query Echo Endpoint](#query-echo-endpoint) | Query-frontend | HTTP | `GET /api/echo` |
| [Overrides API](#overrides-api) | Query-frontend | HTTP | `GET,POST,PATCH,DELETE /api/overrides` |
@@ -311,8 +311,9 @@ $ curl -G -s http://localhost:3200/api/search --data-urlencode 'tags=service.nam
Ingester configuration `complete_block_timeout` affects how long tags are available for search.
This endpoint retrieves all discovered tag names that can be used in search.
The endpoint is available in the query frontend service in a microservices deployment, or the Tempo endpoint in a monolithic mode deployment.
The tags endpoint takes a scope that controls the kinds
of tags or attributes returned. If nothing is provided, the endpoint returns all resource and span tags.
```
@@ -518,7 +519,9 @@ If a particular service name (for example, `shopping-cart`) is only present on s
### TraceQL Metrics
The TraceQL Metrics API returns Prometheus-like time-series for a given metrics query.
Metrics queries are those using metrics functions like `rate()` and `quantile_over_time()`.
Refer to the [TraceQL metrics documentation](https://grafana.com/docs/tempo/<TEMPO_VERSION>/traceql/metrics-queries/) for more information.
Parameters:
@@ -529,18 +532,20 @@ Parameters:
- `end = (unix epoch seconds | unix epoch nanoseconds | RFC3339 string)`
Optional. Along with `start` define the time range. Providing both `start` and `end` includes blocks for the specified time range only.
- `since = (duration string)`
Optional. Can be used instead of `start` and `end` to define the time range in relative values. For example, `since=15m` queries the last 15 minutes. Default is the last 1 hour.
- `step = (duration string)`
Optional. Defines the granularity of the returned time-series. For example, `step=15s` returns a data point every 15s within the time range. If not specified, the default behavior chooses a dynamic step based on the time range.
- `exemplars = (integer)`
Optional. Defines the maximum number of exemplars for the query. If the value exceeds `max_exemplars`, it is trimmed to that limit.
The API is available in the query frontend service in
a microservices deployment, or the Tempo endpoint in a monolithic mode deployment.
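The parameters above have to be url-encoded before they reach the endpoint. The following Python snippet is a minimal sketch of one way to do that with the standard library. The helper name `build_query_range_url` is hypothetical, and the base address `http://localhost:3200` is an assumption borrowed from the `curl` examples elsewhere in this document; adjust both to your deployment.

```python
from urllib.parse import urlencode

# Optional parameters described in the list above.
ALLOWED_PARAMS = ("start", "end", "since", "step", "exemplars")


def build_query_range_url(base_url, query, **params):
    """Return a url-encoded TraceQL metrics range-query URL (hypothetical helper)."""
    unknown = set(params) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # `q` carries the TraceQL metrics query; optional parameters are
    # included only when the caller supplies a value.
    query_params = {"q": query}
    query_params.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/api/metrics/query_range?{urlencode(query_params)}"


url = build_query_range_url(
    "http://localhost:3200",  # assumed local Tempo address
    '{resource.service.name="myservice"}|rate()',
    since="3h",
    step="1m",
)
print(url)
```

The printed URL can be passed to any HTTP client. Encoding the whole query string in one call avoids hand-escaping the braces, quotes, and pipe characters that TraceQL uses.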
For example, the following request computes the rate of spans received for `myservice` over the last three hours, at 1-minute intervals.
{{< admonition type="note" >}}
Actual API parameters must be url-encoded. This example is left unencoded for readability.
{{< /admonition >}}
```
GET /api/metrics/query_range?q={resource.service.name="myservice"}|rate()&since=3h&step=1m
@@ -763,6 +768,6 @@ service StreamingQuerier {
rpc SearchTagsV2(SearchTagsRequest) returns (stream SearchTagsV2Response) {}
rpc SearchTagValues(SearchTagValuesRequest) returns (stream SearchTagValuesResponse) {}
rpc SearchTagValuesV2(SearchTagValuesRequest) returns (stream SearchTagValuesV2Response) {}
rpc MetricsQueryRange(QueryRangeRequest) returns (stream QueryRangeResponse) {}
}
```
2 changes: 1 addition & 1 deletion docs/sources/tempo/metrics-generator/_index.md
@@ -10,7 +10,7 @@ weight: 500
# Metrics-generator

Metrics-generator is an optional Tempo component that derives metrics from ingested traces.
If present, the distributor writes received spans to both the ingester and the metrics-generator.
The metrics-generator processes spans and writes metrics to a Prometheus data source using the Prometheus remote write protocol.

## Architecture
40 changes: 40 additions & 0 deletions docs/sources/tempo/operations/traceql-metrics.md
@@ -84,6 +84,46 @@ Setting `flush_to_storage` to `true` ensures that metrics blocks are flushed to

For more information about overrides, refer to [Standard overrides](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#standard-overrides).

```yaml
overrides:
  'tenantID':
    metrics_generator_processors:
      - local-blocks
```

By default, for all tenants in the main configuration:

```yaml
overrides:
  defaults:
    metrics_generator:
      processors: [local-blocks]
```

Add this configuration to run TraceQL metrics queries against all spans (and not just server spans):

```yaml
metrics_generator:
  processor:
    local_blocks:
      filter_server_spans: false
```

If you configured Tempo using the `tempo-distributed` Helm chart, you can also set `traces_storage` using your `values.yaml` file.
Refer to the [Helm chart for an example](https://github.com/grafana/helm-charts/blob/559ecf4a9c9eefac4521454e7a8066778e4eeff7/charts/tempo-distributed/values.yaml#L362).

```yaml
metrics_generator:
  processor:
    local_blocks:
      flush_to_storage: true
```

Setting `flush_to_storage` to `true` ensures that metrics blocks are flushed to storage so that TraceQL metrics queries can run against historical data.

For more information about overrides, refer to [Standard overrides](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#standard-overrides).


## Evaluate query timeouts

Because of their expensive nature, these queries can take a long time to run.
160 changes: 0 additions & 160 deletions docs/sources/tempo/traceql/metrics-queries.md

This file was deleted.

87 changes: 87 additions & 0 deletions docs/sources/tempo/traceql/metrics-queries/_index.md
@@ -0,0 +1,87 @@
---
title: TraceQL metrics queries
menuTitle: TraceQL metrics queries
description: Learn about TraceQL metrics queries
weight: 600
keywords:
- metrics query
- TraceQL metrics
---

# TraceQL metrics queries

{{< docs/experimental product="TraceQL metrics" >}}

TraceQL metrics is an experimental feature in Grafana Tempo that creates metrics from traces.

Metric queries extend trace queries by applying a function to trace query results.
This powerful feature allows for ad hoc aggregation of any existing TraceQL query by any dimension available in your traces, much in the same way that LogQL metric queries create metrics from logs.

Traces are a unique observability signal that contain causal relationships between the components in your system.

TraceQL metrics can help answer questions like these:

* How many database calls across all systems are downstream of your application?
* What services beneath a given endpoint are currently failing?
* What services beneath an endpoint are currently slow?

TraceQL metrics can help you answer these questions by parsing your traces in aggregate.

TraceQL metrics are powered by the [TraceQL metrics API](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#traceql-metrics).

![Metrics visualization in Grafana](/media/docs/tempo/metrics-explore-sample-2.4.png)

## RED metrics, TraceQL, and PromQL

RED is an acronym for three types of metrics:

- Rate, the number of requests per second
- Errors, the number of those requests that are failing
- Duration, the amount of time those requests take

For more information about the RED method, refer to [The RED Method: how to instrument your services](/blog/2018/08/02/the-red-method-how-to-instrument-your-services/).

You can write TraceQL metrics queries to compute rate, errors, and durations over different groups of spans.

For more information on how to use TraceQL metrics to investigate issues, refer to [Solve problems with metrics queries](./solve-problems-metrics-queries).

## Enable and use TraceQL metrics

To use TraceQL metrics, you need to enable them on your Tempo database.
Refer to [Configure TraceQL metrics](https://grafana.com/docs/tempo/<TEMPO_VERSION>/operations/traceql-metrics/) for more information.

From there, you can either query the TraceQL metrics API directly (for example, with `curl`) or use Grafana (recommended).
To run TraceQL metrics queries in Grafana, you need Grafana Cloud or Grafana 10.4 or later.
No extra configuration is needed.
Use a Tempo data source that points to a Tempo database with TraceQL metrics enabled.

Refer to [Solve problems using metrics queries](./solve-problems-metrics-queries/) for some real-world examples.

### Functions

TraceQL metrics queries currently include the following functions for aggregating over groups of spans: `rate`, `count_over_time`, `quantile_over_time`, `histogram_over_time`, and `compare`.
These functions can be added as an operator at the end of any TraceQL query.

For detailed information and example queries for each function, refer to [TraceQL metrics functions](./functions).
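To illustrate how a function is appended to an existing TraceQL query, the following Python snippet is a minimal sketch that composes a metrics query string. The helper `to_metrics_query` and its arguments are hypothetical and exist only for this illustration; the function names come from the list above, and the query shapes mirror the `rate()` and `quantile_over_time()` examples used elsewhere in this document.

```python
# Function names taken from the list in this section.
METRICS_FUNCTIONS = {
    "rate",
    "count_over_time",
    "quantile_over_time",
    "histogram_over_time",
    "compare",
}


def to_metrics_query(span_query, function, args="", by=None):
    """Append a metrics function to a TraceQL query (hypothetical helper)."""
    if function not in METRICS_FUNCTIONS:
        raise ValueError(f"unknown metrics function: {function}")
    # The function is piped onto the end of the span query.
    metrics_query = f"{span_query} | {function}({args})"
    if by:
        # Optional grouping by one or more attributes.
        metrics_query += f" by ({', '.join(by)})"
    return metrics_query


selector = '{ span:name = "GET /:endpoint" }'
rate_query = to_metrics_query(selector, "rate")
p99_query = to_metrics_query(
    selector, "quantile_over_time", "duration, .99", by=["span.http.target"]
)
print(rate_query)
print(p99_query)
```

The sketch only builds strings and doesn't validate TraceQL syntax. Whether a given argument list is accepted is decided by Tempo when the query runs.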

### Exemplars

Exemplars are a powerful feature of TraceQL metrics.
They allow you to see an exact trace that contributed to a given metric value.
This is particularly useful when you want to understand why a given metric is high or low.

Exemplars are available in TraceQL metrics for all range queries.
To get exemplars, configure them in the query-frontend with the `query_frontend.metrics.max_exemplars` parameter,
or pass a query hint in your query.

Example:

```
{ span:name = "GET /:endpoint" } | quantile_over_time(duration, .99) by (span.http.target) with (exemplars=true)
```

{{< admonition type="note" >}}
TraceQL metric queries with exemplars aren't fully supported in Grafana Explore.
They will be supported in a future Grafana release.
{{< /admonition >}}