Stale prometheus metrics for removed machines #3433

Open
rekup opened this issue Jan 29, 2025 · 1 comment

rekup commented Jan 29, 2025

What happened?

We use the following PromQL expression to monitor the heartbeat of our CrowdSec machines:

increase(cs_lapi_machine_requests_total{route="/v1/heartbeat"}[15m]) == 0

When a machine is removed, its metric series continues to be exposed on the CrowdSec metrics endpoint. Because that counter no longer increases, the expression above triggers an alert for a machine that no longer exists and that no longer shows up in the output of cscli machine list.
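The following simplified model shows why the expression fires for a deleted machine. It assumes a plain last-minus-first increase() with counter-reset handling and ignores Prometheus's extrapolation to the window boundaries, which does not change the result for a counter that never moves. The sample values are hypothetical scrapes, and simple_increase is an illustrative helper, not a Prometheus or CrowdSec function.

```python
from typing import Sequence


def simple_increase(samples: Sequence[float]) -> float:
    """Approximate PromQL increase() over the samples in one range window.

    Assumption: real Prometheus also extrapolates to the window edges;
    for a counter that never changes, both approaches yield 0.
    """
    if len(samples) < 2:
        # PromQL returns no result when fewer than two samples fall in the window.
        raise ValueError("need at least two samples in the window")
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. a restart), so count from zero.
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical scrapes of cs_lapi_machine_requests_total{route="/v1/heartbeat"},
# taken every 5 minutes across a 15-minute window.
live_machine = [2730.0, 2731.0, 2733.0, 2734.0]     # still sending heartbeats
deleted_machine = [2590.0, 2590.0, 2590.0, 2590.0]  # series frozen after deletion

for name, samples in [("live", live_machine), ("deleted", deleted_machine)]:
    inc = simple_increase(samples)
    print(f"{name}: increase={inc} alert={inc == 0}")
```

The alert matches only because the series is still exported with a frozen value. If the series disappeared entirely, increase() would return no result for it, and the == 0 comparison would not match.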

What did you expect to happen?

I expected that, after removing a machine with cscli machine delete, the metrics endpoint would no longer expose any metrics for that machine.
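The expected behavior can be sketched with a toy labelled counter: deleting a machine drops every series that carries its machine label, whatever the route or method. ToyCounterVec and delete_partial_match are hypothetical names. The label names machine, route and method are an assumption about how cs_lapi_machine_requests_total is labelled, and this is not CrowdSec's implementation.

```python
from collections import defaultdict


class ToyCounterVec:
    """Hypothetical stand-in for a labelled Prometheus counter (not CrowdSec code)."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        self.values = defaultdict(float)  # label-value tuple -> counter value

    def inc(self, **labels):
        # Build the series key in the declared label order and bump its counter.
        key = tuple(labels[name] for name in self.label_names)
        self.values[key] += 1

    def delete_partial_match(self, **labels):
        """Drop every series whose labels include all given pairs; return how many were removed."""
        idx = {name: self.label_names.index(name) for name in labels}
        doomed = [key for key in self.values
                  if all(key[idx[n]] == v for n, v in labels.items())]
        for key in doomed:
            del self.values[key]
        return len(doomed)

    def exposed(self):
        # The series a /metrics scrape would still show, sorted for stable output.
        return sorted(self.values)


hits = ToyCounterVec(["machine", "route", "method"])
hits.inc(machine="deleted-machine.example.org", route="/v1/heartbeat", method="GET")
hits.inc(machine="rproxy01.example.org", route="/v1/heartbeat", method="GET")
hits.inc(machine="rproxy01.example.org", route="/v1/alerts", method="POST")

# What "cscli machine delete deleted-machine.example.org" is expected to imply for metrics:
removed = hits.delete_partial_match(machine="deleted-machine.example.org")
print(removed, hits.exposed())
```

For reference, the Go Prometheus client (prometheus/client_golang) offers DeleteLabelValues and DeletePartialMatch on its vector types for this kind of cleanup. This sketch does not claim that is how CrowdSec would fix the issue.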

How can we reproduce it (as minimally and precisely as possible)?

  1. Enable the Prometheus metrics endpoint
  2. Add a machine and wait for the first heartbeat
  3. Check the metrics (e.g. with curl $(hostname -f):6060/metrics | grep cs_lapi_machine_requests_total) and make sure there is a series for the newly added machine.
  4. Delete the machine using cscli machine delete
  5. Check the metrics again

Anything else we need to know?

The issue also seems to affect bouncer metrics (such as cs_lapi_bouncer_requests_total).
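The reproduction steps above can be checked mechanically by comparing the identities exposed on the metrics endpoint with the ones CrowdSec still knows about. This sketch assumes the Prometheus text exposition format, the label names machine (for cs_lapi_machine_requests_total) and bouncer (for cs_lapi_bouncer_requests_total), and a set of known names copied from cscli machine list or cscli bouncers list. stale_identities is a hypothetical helper, and its regex parsing does not handle label values that contain a }.

```python
import re

# metric_name{label="value",...} value  (comment lines and unlabelled samples are skipped)
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
# label="value", allowing backslash-escaped characters inside the value
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def stale_identities(metrics_text, metric, label, known):
    """Return values of `label` exposed for `metric` that are not in `known`."""
    seen = set()
    for line in metrics_text.splitlines():
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels")))
        if label in labels:
            seen.add(labels[label])
    return seen - set(known)


# Hypothetical scrape mirroring the cscli metrics table in this issue.
SAMPLE = '''# TYPE cs_lapi_machine_requests_total counter
cs_lapi_machine_requests_total{machine="deleted-machine.example.org",method="GET",route="/v1/heartbeat"} 2590
cs_lapi_machine_requests_total{machine="rproxy01.example.org",method="GET",route="/v1/heartbeat"} 2734
cs_lapi_machine_requests_total{machine="rproxy01.example.org",method="POST",route="/v1/alerts"} 79
'''

print(stale_identities(SAMPLE, "cs_lapi_machine_requests_total", "machine", {"rproxy01.example.org"}))
```

In practice, metrics_text would be the body returned by the curl command in step 3, and the same call with cs_lapi_bouncer_requests_total and bouncer covers the bouncer case.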

Crowdsec version

version: v1.6.4-rpm-pragmatic-amd64-fb733ee4
Codename: alphaga
BuildDate: 2024-11-20_13:32:35
GoVersion: 1.23.3
Platform: linux
libre2: C++
User-Agent: crowdsec/v1.6.4-rpm-pragmatic-amd64-fb733ee4-linux
Constraint_parser: >= 1.0, <= 3.0
Constraint_scenario: >= 1.0, <= 3.0
Constraint_api: v1
Constraint_acquis: >= 1.0, < 2.0
Built-in optional components: cscli_setup, datasource_appsec, datasource_cloudwatch, datasource_docker, datasource_file, datasource_http, datasource_journalctl, datasource_k8s-audit, datasource_kafka, datasource_kinesis, datasource_loki, datasource_s3, datasource_syslog, datasource_wineventlog

OS version

NAME="Rocky Linux"
VERSION="8.10 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.10 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.10"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.10"


Linux rproxy01.example.org 4.18.0-553.34.1.el8_10.x86_64 #1 SMP Wed Jan 8 14:44:18 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Enabled collections and parsers

$ cscli hub list -o raw
# paste output here

Acquisition config

On Linux:

$ cat /etc/crowdsec/acquis.yaml /etc/crowdsec/acquis.d/*
# paste output here

On Windows:

C:> Get-Content C:\ProgramData\CrowdSec\config\acquis.yaml
# paste output here

Config show

Global:
   - Configuration Folder   : /etc/crowdsec
   - Data Folder            : /var/lib/crowdsec/data
   - Hub Folder             : /etc/crowdsec/hub
   - Simulation File        : /etc/crowdsec/simulation.yaml
   - Log Folder             : /var/log
   - Log level              : info
   - Log Media              : file
Crowdsec:
  - Acquisition File        : /etc/crowdsec/acquis.yaml
  - Parsers routines        : 1
  - Acquisition Folder      : /etc/crowdsec/acquis.d
cscli:
  - Output                  : human
  - Hub Branch              : 
API Client:
  - URL                     : http://192.168.10.24:8080/
  - Login                   : rproxy01.example.org
  - Credentials File        : /etc/crowdsec/local_api_credentials.yaml
Local API Server:
  - Listen URL              : 192.168.10.24:8080
  - Listen Socket           : 
  - Profile File            : /etc/crowdsec/profiles.yaml

  - Trusted IPs:
      - 127.0.0.1
      - ::1
  - Database:
      - Type                : sqlite
      - Path                : /var/lib/crowdsec/data/crowdsec.db
      - Flush age           : 7d
      - Flush size          : 5000

Prometheus metrics

Local API Machines Metrics:
╭────────────────────────────────┬───────────────┬────────┬──────╮
│ Machine                        │ Route         │ Method │ Hits │
├────────────────────────────────┼───────────────┼────────┼──────┤
│ deleted-machine.example.org    │ /v1/heartbeat │ GET    │ 2590 │
│ rproxy01.example.org           │ /v1/heartbeat │ GET    │ 2734 │
│ rproxy01.example.org           │ /v1/alerts    │ POST   │ 79   │
╰────────────────────────────────┴───────────────┴────────┴──────╯

Related custom configs versions (if applicable): notification plugins, custom scenarios, parsers, etc.

rekup added the kind/bug (Something isn't working) label Jan 29, 2025

@rekup: Thanks for opening an issue; it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.
