Agent telemetry filter_default configuration option not respected by prometheus interface #21831

Open
wolfmd opened this issue Oct 15, 2024 · 2 comments

wolfmd commented Oct 15, 2024

Overview of the Issue

When calling the agent telemetry endpoint /v1/agent/metrics, metrics are returned in JSON format by default, or in Prometheus format when the parameter format=prometheus is passed. By default, all metrics described in the documentation are available in both formats.

However, if the prefix_filter option is set, the filter appears to apply only to the non-Prometheus (JSON) view of the metrics. Similarly, filter_default has no effect on the Prometheus view of the metrics.
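
For reference, the filtering behavior one would expect from both views can be modeled with a small shell function. This is only a simplified sketch, not Consul's implementation: the function name `metric_allowed` is hypothetical, and it assumes that a leading "+" allows a prefix, a leading "-" blocks it, the longest matching prefix wins, and filter_default decides when nothing matches.

```shell
#!/bin/sh
# metric_allowed NAME FILTER_DEFAULT RULE...
# Hypothetical helper: returns 0 if NAME would be emitted, 1 otherwise.
# Assumed model: the longest matching prefix rule wins; if no rule matches,
# FILTER_DEFAULT ("true" or "false") decides.
metric_allowed() {
    name=$1
    default=$2
    shift 2
    best_len=-1
    verdict=$default
    for rule in "$@"; do
        sign=${rule%"${rule#?}"}   # first character of the rule: + or -
        prefix=${rule#?}            # remainder of the rule: the prefix itself
        case $name in
            "$prefix"*)
                if [ "${#prefix}" -gt "$best_len" ]; then
                    best_len=${#prefix}
                    if [ "$sign" = "+" ]; then verdict=true; else verdict=false; fi
                fi
                ;;
        esac
    done
    [ "$verdict" = "true" ]
}

# Example: with filter_default = false and prefix_filter = ["+consul.serf"],
# only consul.serf.* names would be expected to pass:
#   metric_allowed consul.serf.queue.Event false +consul.serf && echo allowed
#   metric_allowed consul.acl.ResolveToken false +consul.serf || echo blocked
```

Under this model, the JSON output shown in the reproduction steps matches expectations, while the Prometheus output does not.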

Reproduction Steps

  1. Start an agent with no prefix_filter parameter
consul agent -dev -node localhost -client 127.0.0.1 -hcl 'telemetry { prometheus_retention_time = "10m" }'
  2. Check for a metric such as consul.serf in both outputs
root@mynode:/home/wolfmd# curl -sS 127.0.0.1:8500/v1/agent/metrics | head
{
    "Timestamp": "2024-10-15 23:02:40 +0000 UTC",
    "Gauges": [
        {
            "Name": "consul.302com1.autopilot.failure_tolerance",
            "Value": 0,
            "Labels": {}
        },
        {
            "Name": "consul.302com1.autopilot.healthy",
root@mynode:/state/home/wolfmd# curl -sS 127.0.0.1:8500/v1/agent/metrics?format=prometheus | head
# HELP consul_302com1_autopilot_failure_tolerance consul_302com1_autopilot_failure_tolerance
# TYPE consul_302com1_autopilot_failure_tolerance gauge
consul_302com1_autopilot_failure_tolerance 0
# HELP consul_302com1_autopilot_healthy consul_302com1_autopilot_healthy
# TYPE consul_302com1_autopilot_healthy gauge
consul_302com1_autopilot_healthy 1
# HELP consul_302com1_cache_entries_count consul_302com1_cache_entries_count
# TYPE consul_302com1_cache_entries_count gauge
consul_302com1_cache_entries_count 1
  3. Start an agent with a prefix_filter parameter, for example one that allows only consul.serf metrics
    consul agent -dev -node localhost -client 127.0.0.1 -hcl 'telemetry { prometheus_retention_time = "10m", filter_default = false, prefix_filter = ["+consul.serf"] }'

  4. Confirm the configuration is in place on the agent

root@mynode:/home/wolfmd# curl -sS 127.0.0.1:8500/v1/agent/self | jq -r '.DebugConfig.Telemetry'
{
  "AllowedPrefixes": [],
  "BlockedPrefixes": [
    "consul.serf",
    "consul.rpc.server.call"
  ],
...
  "EnableHostMetrics": false,
  "FilterDefault": false,
  "MetricsPrefix": "consul",
  5. Check both the JSON and Prometheus metrics output. Only serf metrics remain in the JSON result, but the Prometheus result still contains other metrics.
root@mynode:/home/wolfmd# curl -sS 127.0.0.1:8500/v1/agent/metrics | head
{
    "Timestamp": "2024-10-15 23:19:00 +0000 UTC",
    "Gauges": [],
    "Points": [],
    "Counters": [],
    "Samples": [
        {
            "Name": "consul.serf.queue.Event",
            "Count": 1,
            "Rate": 0.1,
root@mynode:/home/wolfmd# curl -sS 127.0.0.1:8500/v1/agent/metrics?format=prometheus | head
# HELP consul_acl_ResolveToken This measures the time it takes to resolve an ACL token.
# TYPE consul_acl_ResolveToken summary
consul_acl_ResolveToken{quantile="0.5"} NaN
consul_acl_ResolveToken{quantile="0.9"} NaN
consul_acl_ResolveToken{quantile="0.99"} NaN
consul_acl_ResolveToken_sum 0
consul_acl_ResolveToken_count 0
# HELP consul_acl_authmethod_delete
# TYPE consul_acl_authmethod_delete summary
consul_acl_authmethod_delete{quantile="0.5"} NaN
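
The last check can be made less manual with a small filter over the Prometheus output. The sketch below is an assumption-laden helper, not part of Consul: the function names are hypothetical, and it assumes dots in metric names become underscores in the Prometheus view, as the sample output above suggests.

```shell
#!/bin/sh
# Print the unique metric names found in Prometheus text output on stdin.
# Comment lines (# HELP / # TYPE) and blank lines are skipped; labels and
# sample values are stripped.
prom_metric_names() {
    awk '!/^#/ && NF { name = $0; sub(/[{ ].*$/, "", name); print name }' | sort -u
}

# Print the metric names on stdin that do NOT start with the given dotted
# prefix. Assumption: "." maps to "_" in the Prometheus view.
prom_names_outside_prefix() {
    prom_prefix=$(printf '%s' "$1" | tr '.' '_')
    prom_metric_names | grep -v "^${prom_prefix}" || true
}

# Usage against the dev agent from the steps above:
#   curl -sS '127.0.0.1:8500/v1/agent/metrics?format=prometheus' \
#     | prom_names_outside_prefix consul.serf
```

If the filter were respected, this would be expected to print nothing for the allow-only-consul.serf configuration; any names it prints are metrics that bypassed the filter.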

Consul info for both Client and Server

Agent is running Consul 1.17.4. This can be reproduced in agent dev mode.

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease = dev
	revision = 3e2302b+
	version = 1.17.4
	version_metadata =
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 127.0.0.1:8300
	server = true
raft:
	applied_index = 64
	commit_index = 64
	fsm_pending = 0
	last_contact = 0
	last_log_index = 64
	last_log_term = 2
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:27ef875d-74af-30ff-1c7e-0ed5b987609b Address:127.0.0.1:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2
runtime:
	arch = amd64
	cpu_count = 96
	goroutines = 186
	max_procs = 96
	os = linux
	version = go1.22.5 X:boringcrypto
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

Running on bare metal Debian

wolfmd commented Oct 15, 2024

Also noting that similar behavior is seen when setting prefix_filter = ["-consul"].

wolfmd commented Oct 16, 2024

I'm not sure if I should note this here or in a new issue, but setting the metrics_prefix to anything dramatically cuts the number of metrics exported in Prometheus format.

root@mynode:/home/wolfmd# curl -sS 127.0.0.1:8500/v1/agent/metrics?format=prometheus | grep -v '#' | grep consul | wc -l
599

vs

root@mynode:/home/wolfmd# curl -sS 127.0.0.1:8500/v1/agent/metrics?format=prometheus | grep -v '#' | grep consul | wc -l
63

when metrics_prefix = "" is set
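
The counts above can be reproduced with a small helper. This is a sketch with a hypothetical function name; note that it anchors the comment match at the start of the line, whereas the `grep -v '#'` used above drops any line that contains a `#` anywhere, so the two may differ slightly.

```shell
#!/bin/sh
# Count sample lines in Prometheus text output (stdin) that contain PATTERN.
# Only lines beginning with "#" (HELP/TYPE comments) are excluded.
count_prom_samples() {
    grep -v '^#' | grep -c -- "$1" || true
}

# Usage against the agent from the commands above:
#   curl -sS '127.0.0.1:8500/v1/agent/metrics?format=prometheus' \
#     | count_prom_samples consul
```

Running it once per metrics_prefix setting gives numbers that can be compared directly.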
