
How to Create a Prometheus Alert for Missing Traces for a Specific Component in Tempo? #4322

Closed
rajushrajan opened this issue Nov 14, 2024 · 10 comments
Labels
stale Used for stale issues / PRs

Comments

@rajushrajan

Hi everyone,

I’m working on a Prometheus alert to trigger when traces are missing for any component in Tempo. Currently, I have the following query, which triggers an alert when there are no traces available for a specific time window (e.g., 5 minutes):

sum by (cluster, namespace) (avg_over_time(tempo_ingester_live_traces[5m])) == 0

This works well for alerting when no traces are ingested for the entire system (across all components) within the specified time window. However, I need the alert to fire when traces are missing for any single component within a given namespace or cluster.

How can I modify the query so that it fires when traces are missing for any individual component (not just globally, and not just for one specific component) within a cluster or namespace? I want the query to check each component separately rather than the system as a whole.

I am using Tempo for trace ingestion and Prometheus for monitoring.
The metric I’m working with is tempo_ingester_live_traces, which is labeled by component, namespace, and cluster.
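A minimal sketch of the per-component version of the query above, written as a Prometheus rule file. It assumes, as stated above, that `tempo_ingester_live_traces` carries a `component` label; the group name, alert names, `for` durations, and one-hour lookback are illustrative choices, not taken from the thread. Adding `component` to the `by` clause makes the `== 0` comparison run once per component. A series that has vanished entirely yields no sample at all, so `== 0` can never match it; the second rule covers that case by listing components that reported an hour ago but have no series now.

```yaml
groups:
  - name: tempo-missing-traces  # hypothetical group name
    rules:
      # Fires for each (cluster, namespace, component) whose live-trace gauge averaged 0 over 5m.
      - alert: TempoNoLiveTracesPerComponent
        expr: |
          sum by (cluster, namespace, component) (
            avg_over_time(tempo_ingester_live_traces[5m])
          ) == 0
        for: 5m
      # Fires for components that had a series one hour ago but have none now
      # (a vanished series never evaluates to 0, so the rule above cannot see it).
      - alert: TempoLiveTracesSeriesMissing
        expr: |
          group by (cluster, namespace, component) (tempo_ingester_live_traces offset 1h)
          unless
          group by (cluster, namespace, component) (tempo_ingester_live_traces)
        for: 5m
```

If `component` is a scrape-time label naming the Tempo component (for example the ingester) rather than the traced application, this breaks the data down by Tempo component, not by the services sending traces; the replies below point to metrics that carry the traced service's labels.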

@javiermolinar
Contributor

Hi,

I believe the span metrics from the Metrics Generator can help you achieve what you want:

https://grafana.com/docs/tempo/latest/metrics-generator/span_metrics/

These metrics carry additional labels derived from the trace data, for instance the name of the service that generated the span. You can also define custom labels.
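A sketch of that approach as a Prometheus alert rule, assuming the span-metrics processor is enabled and exports its default call counter `traces_spanmetrics_calls_total` with a `service` label (metric prefixes and dimensions are configurable, so check the names your Prometheus actually receives). The group name, alert name, windows, and `severity` label are hypothetical. Rather than testing for `== 0`, it compares against the same window one hour earlier, so it also fires for a service whose series has been dropped completely.

```yaml
groups:
  - name: tempo-span-metrics-missing  # hypothetical group name
    rules:
      - alert: ServiceStoppedSendingSpans
        # Left side: services with span traffic one hour ago.
        # Right side: services with span traffic now. `unless` keeps only the ones that went quiet.
        expr: |
          (sum by (service) (rate(traces_spanmetrics_calls_total[10m] offset 1h)) > 0)
          unless
          (sum by (service) (rate(traces_spanmetrics_calls_total[10m])) > 0)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "No spans received from {{ $labels.service }} in the last 10 minutes"
```

To scope this per namespace or cluster, the corresponding resource attributes would need to be added as custom span-metrics dimensions, with the resulting labels included in both `by` clauses.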

@joe-elliott
Member

I'll also point out that we've recently added "usage trackers", which will ship in Tempo 2.7:

#4162

These let you break down received bytes/second by any span or resource label (namespace, cluster, etc.) and publish those metrics directly from the distributor (no metrics generator or Prometheus required).
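Because those metrics come straight from the distributor, something still has to scrape the distributor to alert on them. A sketch of a Prometheus scrape job, assuming the usage-tracker series are served on a separate `/usage_metrics` path rather than the regular `/metrics` endpoint (see the API docs linked later in this thread); the job name and target address are hypothetical placeholders.

```yaml
scrape_configs:
  - job_name: tempo-distributor-usage        # hypothetical job name
    metrics_path: /usage_metrics             # usage-tracker endpoint (assumption; not the default /metrics)
    static_configs:
      - targets: ["tempo-distributor:3200"]  # hypothetical distributor host:port
```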

@rajushrajan
Author

Hi @joe-elliott, thank you for your response. I will explore the usage trackers and get back to you.

@rajushrajan
Author

Hi @joe-elliott, any idea when Tempo 2.7 will be released?

@joe-elliott
Member

Likely December or January.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.

@github-actions github-actions bot added the stale Used for stale issues / PRs label Jan 27, 2025
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 12, 2025
@enoshfranclin

Hi @joe-elliott

We can see that Tempo 2.7 has been released. Could you please let us know how to implement alerts for missing traces, as described in #4322 (comment), using the new usage tracker feature?

@joe-elliott
Member

There are some docs here. Give it a shot:

https://grafana.com/docs/tempo/latest/api_docs/#usage-metrics
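A sketch of the alert itself, assuming the cost-attribution tracker exposes a counter named `tempo_usage_tracker_bytes_received_total` with a `tracker="cost-attribution"` label, and that the overrides rename the configured attributes to `namespace` and `service`. Verify these names against your own `/usage_metrics` output before relying on them; the group and alert names are hypothetical. Comparing the last 10 minutes against the same window an hour earlier flags every namespace/service pair that was sending data and has gone quiet, including pairs whose series have disappeared.

```yaml
groups:
  - name: tempo-usage-missing-traces  # hypothetical group name
    rules:
      - alert: NoTraceBytesReceived
        # Assumed metric and label names; pairs that had traffic an hour ago
        # but have none now survive the `unless` and fire.
        expr: |
          (sum by (namespace, service) (
            rate(tempo_usage_tracker_bytes_received_total{tracker="cost-attribution"}[10m] offset 1h)
          ) > 0)
          unless
          (sum by (namespace, service) (
            rate(tempo_usage_tracker_bytes_received_total{tracker="cost-attribution"}[10m])
          ) > 0)
        for: 5m
        annotations:
          summary: "No trace data from {{ $labels.service }} in {{ $labels.namespace }} for 10m"
```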

@javiermolinar
Contributor

javiermolinar commented Feb 12, 2025

Just to complement what @joe-elliott mentioned, this has to be activated in the distributor first:

https://grafana.com/docs/tempo/latest/configuration/#distributor

    usage:
        cost_attribution:
            # Enables the "cost-attribution" usage tracker. Per-tenant attributes are configured in overrides.
            [enabled: <boolean> | default = false]
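Concretely, a minimal sketch of that setting in a Tempo configuration file, assuming the `usage` block sits directly under the top-level `distributor` section, as the linked reference places it. All other distributor settings are left at their defaults.

```yaml
distributor:
  usage:
    cost_attribution:
      # Turns on the cost-attribution usage tracker; the dimensions come from overrides.
      enabled: true
```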

The dimensions by which you want to break down your usage data can be configured in the standard overrides:

https://grafana.com/docs/tempo/latest/configuration/#standard-overrides

    # Cost attribution usage tracker configuration
    cost_attribution:
      # List of attributes to group ingested data by. Map value is optional. Can be used to rename and
      # combine attributes.
      dimensions: <map string to string>

For example:

    cost_attribution:
      dimensions:
        foo: ""

Here, each key of the map is the name of a span attribute.
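A sketch of a fuller overrides entry, assuming the standard (non-legacy) overrides layout where tenant defaults live under `overrides.defaults`, and assuming the traced applications set the OpenTelemetry attributes `service.name` and `k8s.namespace.name`. Giving each key a non-empty value renames it, so the usage series get short, predictable `service` and `namespace` labels that an alert can group by; mapping two different attributes to the same value would combine them into one label. Whether an unscoped key matches span or resource attributes is not confirmed here, so check the linked overrides reference.

```yaml
overrides:
  defaults:
    cost_attribution:
      dimensions:
        # key = attribute name on the ingested data, value = optional output label name
        service.name: service
        k8s.namespace.name: namespace
```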

@enoshfranclin

Thanks for the reply. We will try this out.
