IPMI SDR Cache out of date disables IPMI collectors during update-status hook #202

Closed
przemeklal opened this issue Mar 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@przemeklal
Member

przemeklal commented Mar 25, 2024

The IPMI collectors used to work on this machine. At some point they silently disappeared from the metrics list, without triggering any alerts (ipmi_sel_command_success):

2024-03-25 14:36:43 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2024-03-25 14:42:20 WARNING unit.hardware-observer/7.update-status logger.go:60 SDR Cache '/root/.freeipmi/sdr-cache/sdr-cache-redacted.localhost' out of date: Please flush the cache and regenerate it
2024-03-25 14:42:20 INFO unit.hardware-observer/7.juju-log server.go:316 IPMI sensors monitoring is not available
2024-03-25 14:42:20 WARNING unit.hardware-observer/7.update-status logger.go:60 SDR Cache '/root/.freeipmi/sdr-cache/sdr-cache-redacted.localhost' out of date: Please flush the cache and regenerate it
2024-03-25 14:42:20 INFO unit.hardware-observer/7.juju-log server.go:316 IPMI SEL monitoring is not available
2024-03-25 14:42:20 INFO unit.hardware-observer/7.juju-log server.go:316 Attempt 1 of /redfish/v1/
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Response Time for GET to /redfish/v1/: 1.5902138333767653 seconds.
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Attempt 1 of /redfish/v1/SessionService/Sessions/
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Response Time for POST to /redfish/v1/SessionService/Sessions/: 0.1784691703505814 seconds.
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Login returned code 400: {"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageId":"iLO.2.14.UnauthorizedLoginAttempt"}]}}
2024-03-25 14:42:22 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
cat /etc/hardware-exporter-config.yaml
port: 10200
level: INFO

enable_collectors:
  - collector.ipmi_dcmi
  - collector.redfish

redfish_host: "https://redacted"
redfish_username: "redacted"
redfish_password: "redacted"

The issues here are:

  • The update-status hook (or any other hook, such as config-changed) can disable collectors on its own, days or weeks after deployment.
  • The charm does not handle an out-of-date SDR cache.
  • The collectors are dropped silently, without any alerts.
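
For the second point, one possible approach is to look for the "out of date" warning in the tool output, flush the cache, and retry once before declaring the collector unavailable. The following is only a rough sketch, not what the charm currently does: the function names and the `run` hook are hypothetical, and it assumes the warning text from the log above shows up in the tool's combined stdout/stderr. `--flush-cache` is the freeipmi option for `ipmi-sensors` and `ipmi-sel`.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# Substring of the freeipmi warning quoted in the log above.
STALE_SDR_MARKER = "out of date: Please flush the cache"

# (exit code, combined stdout + stderr)
RunResult = Tuple[int, str]


def sdr_cache_is_stale(output: str) -> bool:
    """Return True if the tool output contains the stale SDR cache warning."""
    return STALE_SDR_MARKER in output


def run_tool(cmd: Sequence[str]) -> RunResult:
    """Run a command and return its exit code and combined output."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    return proc.returncode, proc.stdout + proc.stderr


def ipmi_tool_available(
    cmd: Sequence[str],
    run: Callable[[Sequence[str]], RunResult] = run_tool,
) -> bool:
    """Check an IPMI tool, flushing the SDR cache once if it is stale."""
    code, output = run(cmd)
    if sdr_cache_is_stale(output):
        # Assumption: the tool given in cmd[0] accepts --flush-cache
        # (true for freeipmi's ipmi-sensors and ipmi-sel).
        run([cmd[0], "--flush-cache"])
        code, output = run(cmd)
    return code == 0
```

With something like this, a hook could call `ipmi_tool_available(["ipmi-sensors"])` and only drop the collector if the retry after the flush still fails. Logging at WARNING level when that happens would at least make the change visible.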

hardware-observer revision: 25 (latest/stable)

@przemeklal przemeklal changed the title IPMI SDR Cache out of date disables collectors during update-status hook IPMI SDR Cache out of date disables IPMI collectors during update-status hook Mar 25, 2024
@Pjack Pjack added the bug Something isn't working label Apr 3, 2024
@Pjack

Pjack commented Apr 9, 2024

This specific behavior is addressed by #213, so we can close this issue.
#214 and #96 will be addressed in the future.
