Performancecounter: fails if one or more following counters fail (failed to collect data: performance counter not initialized) #1807

Closed
Nachtfalkeaw opened this issue Dec 6, 2024 · 3 comments

@Nachtfalkeaw

Current Behavior

windows_exporter 0.30.0-rc-2

If I configure several performance counter objects, e.g. Memory, "User Input Delay per Session" and BitLocker, and one of them fails (here "User Input Delay per Session"), then the performance counters that follow it (here BitLocker) are skipped.

These are my config files:

Scenario A:

  • Memory metrics are collected.
  • "User Input Delay per Session" fails with this error message:
    time=2024-12-06T11:03:32.191+01:00 level=DEBUG source=collect.go:213 msg="collector performancecounter failed after 0s, resulting in 1 metrics" err="failed to collect data: performance counter not initialized"
  • BitLocker is skipped
collector:
  performancecounter:
    objects: |
      [
        {
          "object": "Memory",
          "counters": [
            {
              "name": "Cache Faults/sec",
              "type": "counter"
            }
          ]
        },
        {
          "object": "User Input Delay per Session",
          "counters": [
            {
              "name": "Max Session Input Delay (ms)",
              "type": "counter"
            }
          ]
        },
        {
          "object": "BitLocker",
          "instances": [
            "*"
          ],
          "instance_label": "Laufwerk",
          "counters": [
            {
              "name": "Write Requests/sec",
              "type": "counter"
            },
            {
              "name": "Write Subrequests/sec",
              "type": "counter"
            },
            {
              "name": "Read Requests/sec",
              "type": "counter"
            },
            {
              "name": "Read Subrequests/sec",
              "type": "counter"
            }
          ]
        }
      ]

Scenario B:

  • Same counters as above, but in a different order.
  • Memory and BitLocker generate metrics; "User Input Delay per Session" fails.
collector:
  performancecounter:
    objects: |
      [
        {
          "object": "Memory",
          "counters": [
            {
              "name": "Cache Faults/sec",
              "type": "counter"
            }
          ]
        },
        {
          "object": "BitLocker",
          "instances": [
            "*"
          ],
          "instance_label": "Laufwerk",
          "counters": [
            {
              "name": "Write Requests/sec",
              "type": "counter"
            },
            {
              "name": "Write Subrequests/sec",
              "type": "counter"
            },
            {
              "name": "Read Requests/sec",
              "type": "counter"
            },
            {
              "name": "Read Subrequests/sec",
              "type": "counter"
            }
          ]
        },
        {
          "object": "User Input Delay per Session",
          "counters": [
            {
              "name": "Max Session Input Delay (ms)",
              "type": "counter"
            }
          ]
        }
      ]

Expected Behavior

  1. The failing counter should be skipped while the other, working counters continue to collect metrics. Counters after the failing one should not stop working. A proper error message should be generated (see 2.).

  2. The error message is only visible if the "DEBUG" log level is set. Judging from the message itself, it should be level=error. If possible, the name (object) of the failing counter should be included to indicate which one failed.
    time=2024-12-06T11:03:32.191+01:00 level=DEBUG source=collect.go:213 msg="collector performancecounter failed after 0s, resulting in 1 metrics" err="failed to collect data: performance counter not initialized"

  3. The collector_success metric is "0" if one performance counter fails (Memory and BitLocker worked; the other failed). This may be misleading if only the failing counter is skipped and all others proceed as suggested in (1.), so care is needed that this metric represents the correct information.

# HELP windows_exporter_collector_success windows_exporter: Whether the collector was successful.
# TYPE windows_exporter_collector_success gauge
windows_exporter_collector_success{collector="performancecounter"} 0

Since performance counters are user-defined and some may work while others do not, it might be a good idea to add a separate label for each of the "objects" defined in the performancecounter collector.

For example:
windows_exporter_collector_success{collector="performancecounter",object="Memory"} 1
windows_exporter_collector_success{collector="performancecounter",object="BitLocker"} 1
windows_exporter_collector_success{collector="performancecounter",object="User Input Delay per Session"} 0

This would also result in additional metrics such as "timeout" and "duration" for each of these "objects".
That would help identify which counter failed and which counters are heavy and take a long time to process.

The CPU and memory collectors have individual success and duration metrics.
Different performance counter objects could be treated the same way.

Steps To Reproduce

Use the provided configs and change the order of the working and non-working performance counters.

Environment

  • windows_exporter Version: 0.30.0-rc-2
  • Windows Server Version: Windows 10

windows_exporter logs

Logs are provided above.

Anything else?

No response

Nachtfalkeaw changed the title from "Performancecounter: ithe following performancecounter are skipped (failed to collect data: performance counter not initialized)" to "Performancecounter: fails if one or more following counters fail (failed to collect data: performance counter not initialized)" Dec 6, 2024
@jkroepke
Member

jkroepke commented Dec 6, 2024

Excluding individual counters is not possible by design. However, it will be possible to ignore a failed object block definition (once integrated). In that case, you have to declare separate blocks for optional counters. It should not be an issue to declare BitLocker multiple times.

[
  {
    "object": "BitLocker",
    "instances": [
      "*"
    ],
    "instance_label": "Laufwerk",
    "counters": [
      {
        "name": "Write Requests/sec",
        "type": "counter"
      }
    ]
  },
  {
    "object": "BitLocker",
    "instances": [
      "*"
    ],
    "instance_label": "Laufwerk",
    "counters": [
      {
        "name": "Write Subrequests/sec",
        "type": "counter"
      }
    ]
  }
]

Since performance counters are user-defined and some may work while others do not, it might be a good idea to add a separate label for each of the "objects" defined in the performancecounter collector.

I will add a new property named "name" that end users can set however they like. Since objects can be defined multiple times, I need a property that is actually unique.
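Why a separate identifier is needed can be shown with a small Go sketch that decodes the objects JSON and checks block identifiers for uniqueness. It is hypothetical: the `name` key comes from the comment above, but the struct layout, the `parseObjects` helper, and the rule that an unnamed block falls back to its object name are assumptions, not windows_exporter's implementation. With that rule, declaring BitLocker twice only works once each block gets its own `name`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// counterConfig mirrors one entry of the "counters" array in the configs above.
type counterConfig struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// objectConfig mirrors one object block. Name is the proposed user-defined
// identifier; its semantics here are an assumption.
type objectConfig struct {
	Name          string          `json:"name"`
	Object        string          `json:"object"`
	Instances     []string        `json:"instances"`
	InstanceLabel string          `json:"instance_label"`
	Counters      []counterConfig `json:"counters"`
}

// parseObjects decodes the objects JSON and rejects duplicate identifiers.
// Blocks without an explicit name fall back to the object name (assumption),
// so the same object declared twice must be given distinct names.
func parseObjects(raw string) ([]objectConfig, error) {
	var objects []objectConfig
	if err := json.Unmarshal([]byte(raw), &objects); err != nil {
		return nil, fmt.Errorf("invalid objects JSON: %w", err)
	}
	seen := make(map[string]bool, len(objects))
	for i := range objects {
		if objects[i].Name == "" {
			objects[i].Name = objects[i].Object
		}
		if seen[objects[i].Name] {
			return nil, fmt.Errorf("duplicate object block name %q: set a unique \"name\"", objects[i].Name)
		}
		seen[objects[i].Name] = true
	}
	return objects, nil
}

func main() {
	raw := `[
	  {"name": "bitlocker_write", "object": "BitLocker", "instances": ["*"], "instance_label": "Laufwerk",
	   "counters": [{"name": "Write Requests/sec", "type": "counter"}]},
	  {"name": "bitlocker_write_sub", "object": "BitLocker", "instances": ["*"], "instance_label": "Laufwerk",
	   "counters": [{"name": "Write Subrequests/sec", "type": "counter"}]}
	]`
	objects, err := parseObjects(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, o := range objects {
		fmt.Printf("%s -> %s (%d counters)\n", o.Name, o.Object, len(o.Counters))
	}
}
```

The same unique name could then serve as the per-object label value for success and duration metrics.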

@jkroepke
Member

jkroepke commented Dec 6, 2024

The error message is only visible if "DEBUG" log level is set. From the message itself it looks like it should be level=error.

That is the trade-off from #1748, where the exporter should not spam the logs. The errors are logged at error level once, on application start.

But based on your feedback and feedback from other users, I will raise it to warn again. That might result in log spam, but that is expected then.

@jkroepke
Member

jkroepke commented Dec 9, 2024

The current findings are implemented and solved via #1809.
