[DOC] Add documentation for prometheus metrics compatibility (#3320)
* [DOC] Add documentation for prometheus metrics compatibility

* Move code fence

* Be specific about being non-specific

* Fix wording

* Add missing image

* Update docs/sources/tempo/metrics-generator/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Use suggested style for images

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics/_index.md

Co-authored-by: Kim Nylander <[email protected]>

* Mention the limited nature of the compatibility

* Update wording slightly

* Add link for traceql

* Relocate docs file

* Drop doc alias

---------

Co-authored-by: Kim Nylander <[email protected]>
zalegrala and knylander-grafana authored Feb 8, 2024
1 parent 5b739f8 commit e9ddb63
Showing 2 changed files with 112 additions and 1 deletion.
9 changes: 8 additions & 1 deletion docs/sources/tempo/metrics-generator/_index.md
@@ -27,6 +27,7 @@ Every processor derives different metrics. Currently, the following processors a

- Service graphs
- Span metrics
- Local blocks

<p align="center"><img src="server-side-metrics-arch-overview.png" alt="Service metrics architecture"></p>

@@ -50,11 +51,17 @@ The more dimensions are enabled, the higher the cardinality of the generated met

To learn more about this processor, read the [documentation]({{< relref "./span_metrics" >}}).

## Local blocks

The local blocks processor stores spans for a set period of time and
enables more complex APIs to perform calculations on the data. The processor must be
enabled for certain metrics APIs to function.
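
As a minimal sketch, the processor can be switched on through the overrides. The exact key layout is an assumption here and varies by Tempo version, so check the configuration reference for your release before using it.

```yaml
# Assumption: legacy flat overrides key. Newer releases may instead nest this
# setting under overrides.defaults.metrics_generator.processors.
overrides:
  metrics_generator_processors:
    - local-blocks
```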

## Remote writing metrics

The metrics-generator runs a Prometheus Agent that periodically sends metrics to a `remote_write` endpoint.
The `remote_write` endpoint is configurable and can be any [Prometheus-compatible endpoint](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
To learn more about the endpoint configuration, refer to the [Metrics-generator]({{< relref "../configuration#metrics-generator" >}}) section of the Tempo Configuration documentation.
Writing interval can be controlled via `metrics_generator.registry.collection_interval`.

When multi-tenancy is enabled, the metrics-generator forwards the `X-Scope-OrgID` header of the original request to the `remote_write` endpoint. This feature can be disabled by setting `remote_write_add_org_id_header` to false.
104 changes: 104 additions & 0 deletions docs/sources/tempo/operations/prometheus_datasource.md
@@ -0,0 +1,104 @@
---
aliases: []
title: Prometheus compatibility
menuTitle: Prometheus compatibility
description: Learn about using Tempo as a Prometheus data source.
weight: 550
keywords:
- Prometheus
- TraceQL
---

# Prometheus compatibility

{{< docs/experimental product="Prometheus-compatible API" >}}

This experimental Prometheus-compatible API provides a way to visualize metrics derived from traces using TraceQL. The compatibility is limited: the only implemented endpoint is `/query_range`, which allows a Prometheus data source in Grafana to be used.

A future Grafana release is expected to support this API natively, possibly using an internal API, which would remove the need for
this compatibility layer.

## Configure a Grafana data source

Once the `local-blocks` processor is enabled in your `metrics-generator`
configuration, add a new Prometheus data source in Grafana that points at the
`/prom` endpoint of the `query-frontend`. Set the data source URL
parameter to `http://<HOST>:<PORT>/prom`. For example:

```
http://query-frontend.trace.svc.cluster.local:3200/prom
```

![Caption](/media/docs/tempo/metrics-data-source.png)

The data source test fails with a `404` status code. This is expected
for the experimental feature.
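
If you provision data sources from files rather than through the UI, the same setup might look like the sketch below. It uses Grafana's standard data source provisioning format. The file name and data source name are hypothetical, and the URL is the example from above.

```yaml
# Hypothetical file, for example provisioning/datasources/tempo-prom.yaml
apiVersion: 1
datasources:
  - name: Tempo (Prometheus compatibility)  # illustrative name
    type: prometheus                        # a Prometheus data source, not a Tempo one
    access: proxy
    url: http://query-frontend.trace.svc.cluster.local:3200/prom
```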

### Query timeouts

These queries are expensive and can take a long time to run.
Consider increasing the timeouts throughout
the system to allow enough time for the data to be returned.

Consider these areas when raising timeouts:

- Any proxy in front of Grafana
- Grafana data source for Prometheus pointing at Tempo
- Tempo configuration
  - `querier.search.query_timeout`
  - `server.http_server_read_timeout`
  - `server.http_server_write_timeout`
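
The Tempo settings listed above could be raised together, as in this sketch. The `5m` values are illustrative assumptions, not recommendations. Pick values that match the timeouts of any proxy and of the Grafana data source.

```yaml
# Illustrative values only; tune them for your environment.
querier:
  search:
    query_timeout: 5m
server:
  http_server_read_timeout: 5m
  http_server_write_timeout: 5m
```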

Additionally, a new `query_frontend.metrics` configuration block has been added. Suitable
values depend on the environment.

For example, in a cloud environment, smaller jobs with more concurrency may be
preferable because the backend can scale out.

```yaml
query_frontend:
  metrics:
    concurrent_jobs: 1000
    target_bytes_per_job: 2.25e+08 # ~225MB
    interval: 30m0s
```

For an on-prem backend, you can improve query times by lowering the concurrency
while increasing the job size.

```yaml
query_frontend:
  metrics:
    concurrent_jobs: 8
    target_bytes_per_job: 1.25e+09 # ~1.25GB
```

## Visualize traces with TraceQL

Next, use Grafana Explore to query the data source.

{{% admonition type="note" %}}
Even though you are using a Prometheus data source, queries don't use PromQL. Grafana has no way to parse, validate, or provide completion options for the query.
{{% /admonition %}}

Adjust the query `type` so only `range` is selected.

Here is an example query:

```
{status=error} | count_over_time() by (resource.service.name)
```

<!-- TODO: Point to supporting traceql docs. -->

This query has several parts to consider.

First, the query begins with the TraceQL selector `{status=error}`, which finds spans whose `status` has been set to `error`. Refer to the TraceQL docs on [selecting spans](https://grafana.com/docs/tempo/<TEMPO_VERSION>/traceql/#selecting-spans) for more information.

Next, `| count_over_time()` passes the matched spans to a new function
introduced with this API. A `rate()` function is also available.

Finally, `by (resource.service.name)` groups the results into one series per service name.

<p align="center"><img src="explore.png" alt="Metrics visualization in Grafana "></p>
![Caption](/media/docs/tempo/metrics-explore-sample.png)
