hardware-exporter crashes after connecting to redfish on Huawei 2288H V5 #108

Closed
przemeklal opened this issue Nov 16, 2023 · 4 comments · Fixed by canonical/prometheus-hardware-exporter#49

przemeklal commented Nov 16, 2023

Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Login returned code 201: {"@odata.context":"/redfish/v1/$metadata#Session.Session","@odata.id":"/redfish/v1/SessionService/Sessions/df55de7604062a10","@odata.type":"#Session.v1_0_2.Session","Id":"df55de7604062a10","Name":"User Session","Oem":{"Huawei":{"UserAccount":"Canonical","LoginTime":"2023-11-16T16:14:16+00:00","UserId":5,"UserValidDays":null,"AccountInsecurePromptEnabled":false,"UserIP":"redacted","UserTag":"Redfish","MySession":true,"UserRole":["Administrator"]}}}
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Getting redfish sensor info...
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Attempt 1 of /redfish/v1/
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Response Time for GET to /redfish/v1/: 0.09386513195931911 seconds.
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Attempt 1 of /redfish/v1/Chassis
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Response Time for GET to /redfish/v1/Chassis: 0.05613565444946289 seconds.
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Attempt 1 of /redfish/v1/Chassis/1
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Response Time for GET to /redfish/v1/Chassis/1: 0.2747149337083101 seconds.
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Attempt 1 of /redfish/v1/Chassis/1/Power
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Response Time for GET to /redfish/v1/Chassis/1/Power: 0.12008355371654034 seconds.
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Attempt 1 of /redfish/v1/Chassis/1/Thermal
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Response Time for GET to /redfish/v1/Chassis/1/Thermal: 0.18710010312497616 seconds.
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Getting processor data...
Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Attempt 1 of /redfish/v1/
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/: 0.03650726191699505 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/Systems
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/Systems: 0.05554169788956642 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/Systems/1/Processors
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/Systems/1/Processors: 0.07191604748368263 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/Systems/1/Processors/1
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/Systems/1/Processors/1: 0.05324621684849262 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/Systems/1/Processors/2
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/Systems/1/Processors/2: 0.05895916186273098 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Getting storage controller data...
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/: 0.061779825016856194 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/Systems
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/Systems: 0.014882449060678482 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/Systems/1/Storage
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for GET to /redfish/v1/Systems/1/Storage: 0.040700653567910194 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Attempt 1 of /redfish/v1/SessionService/Sessions/df55de7604062a10
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO Response Time for DELETE to /redfish/v1/SessionService/Sessions/df55de7604062a10: 0.06376663781702518 seconds.
Nov 16 16:14:18 redacted python3[583891]: 2023-11-16 16:14:18 INFO User logged out: {"error":{"code":"Base.1.0.GeneralError","message":"A general error has occurred. See ExtendedInfo for more information.","@Message.ExtendedInfo":[{"@odata.type":"/redfish/v1/$metadata#MessageRegistry.1.0.0.MessageRegistry","MessageId":"Base.1.0.Success","RelatedProperties":[],"Message":"Successfully Completed Request","MessageArgs":[],"Severity":"OK","Resolution":"None"}]}}
Nov 16 16:14:18 redacted python3[583891]: Traceback (most recent call last):
Nov 16 16:14:18 redacted python3[583891]:   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
Nov 16 16:14:18 redacted python3[583891]:     return _run_code(code, main_globals, None,
Nov 16 16:14:18 redacted python3[583891]:   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
Nov 16 16:14:18 redacted python3[583891]:     exec(code, run_globals)
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_hardware_exporter/__main__.py", line 206, in <module>
Nov 16 16:14:18 redacted python3[583891]:     main()
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_hardware_exporter/__main__.py", line 202, in main
Nov 16 16:14:18 redacted python3[583891]:     start_exporter(exporter_config)
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_hardware_exporter/__main__.py", line 174, in start_exporter
Nov 16 16:14:18 redacted python3[583891]:     exporter.register(collector)
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_hardware_exporter/exporter.py", line 45, in register
Nov 16 16:14:18 redacted python3[583891]:     REGISTRY.register(collector)
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_client/registry.py", line 40, in register
Nov 16 16:14:18 redacted python3[583891]:     names = self._get_names(collector)
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_client/registry.py", line 80, in _get_names
Nov 16 16:14:18 redacted python3[583891]:     for metric in desc_func():
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_hardware_exporter/core.py", line 114, in collect
Nov 16 16:14:18 redacted python3[583891]:     payloads = self.fetch()
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_hardware_exporter/collector.py", line 972, in fetch
Nov 16 16:14:18 redacted python3[583891]:     ) = redfish_helper.get_storage_controller_data()
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/prometheus_hardware_exporter/collectors/redfish.py", line 256, in get_storage_controller_data
Nov 16 16:14:18 redacted python3[583891]:     storage_ids: List[str] = redfish_utilities.collections.get_collection_ids(
Nov 16 16:14:18 redacted python3[583891]:   File "/var/lib/juju/agents/unit-hardware-observer-10/charm/venv/redfish_utilities/collections.py", line 45, in get_collection_ids
Nov 16 16:14:18 redacted python3[583891]:     raise RedfishCollectionNotFoundError( "Service does not contain a collection at URI {}".format( collection_uri ) )
Nov 16 16:14:18 redacted python3[583891]: redfish_utilities.collections.RedfishCollectionNotFoundError: Service does not contain a collection at URI /redfish/v1/Systems/1/Storage
Nov 16 16:14:18 redacted systemd[1]: hardware-exporter.service: Main process exited, code=exited, status=1/FAILURE
Nov 16 16:14:18 redacted systemd[1]: hardware-exporter.service: Failed with result 'exit-code'.
Nov 16 16:14:18 redacted systemd[1]: hardware-exporter.service: Scheduled restart job, restart counter is at 38.
Nov 16 16:14:18 redacted systemd[1]: Stopped HTTP service for prometheus hardware exporter..
Nov 16 16:14:18 redacted systemd[1]: Started HTTP service for prometheus hardware exporter..

As a result, all the other exporters (IPMI, for example) crash as well, which leaves us with zero metrics and zero alerts.
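
The traceback shows the exception leaving `fetch()` uncaught during `REGISTRY.register(collector)`, which ends the process. A minimal sketch of one way to isolate such a failure is below. It assumes each data source can be wrapped in a plain callable; `fetch_all` and the fetcher names are hypothetical and are not taken from prometheus-hardware-exporter.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical type: a fetcher returns a list of metric payloads.
Fetcher = Callable[[], List[dict]]


def fetch_all(fetchers: Dict[str, Fetcher]) -> Dict[str, List[dict]]:
    """Run every fetcher and isolate failures.

    A fetcher that raises is logged and reported as an empty list,
    so the remaining sources still produce their payloads.
    """
    results: Dict[str, List[dict]] = {}
    for name, fetch in fetchers.items():
        try:
            results[name] = fetch()
        except Exception:
            # Log the full traceback, then continue with the next source.
            logger.exception("collector %s failed, skipping it", name)
            results[name] = []
    return results
```

With this shape, a Redfish fetcher that raises something like `RedfishCollectionNotFoundError` would yield an empty result while the IPMI fetcher keeps reporting. Whether an empty result or a dedicated "collector failed" metric is the better signal is a design choice this sketch leaves open.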

przemeklal commented Nov 16, 2023

Please note that the login to redfish was successful (HTTP 201):

Nov 16 16:14:17 redacted python3[583891]: 2023-11-16 16:14:17 INFO Login returned code 201: {"@odata.context":"/redfish/v1/$metadata#Session.Session","@odata.id":"/redfish/v1/SessionService/Sessions/df55de7604062a10","@odata.type":"#Session.v1_0_2.Session","Id":"df55de7604062a10","Name":"User Session","Oem":{"Huawei":{"UserAccount":"Canonical","LoginTime":"2023-11-16T16:14:16+00:00","UserId":5,"UserValidDays":null,"AccountInsecurePromptEnabled":false,"UserIP":"redacted","UserTag":"Redfish","MySession":true,"UserRole":["Administrator"]}}}

przemeklal changed the title from "hardware-exporter crashes after connecting to redfish" to "hardware-exporter crashes after connecting to redfish on Huawei 2288H V5" on Nov 16, 2023
przemeklal commented

This might be a duplicate of #91

jneo8 commented Nov 16, 2023

Yes, I believe this is a duplicate. @dashmage is already working on this; we are waiting for his PR.

Pjack added the bug label on Nov 16, 2023
dashmage added a commit to canonical/prometheus-hardware-exporter that referenced this issue on Nov 23, 2023
* fix(redfish): Make redfish storage name in URI dynamic.

The redfish storage name in the URI was initially hardcoded to "Storage".
On some servers which did not conform to the schema specification, this
name was provided differently, eg: Storages.

This change aims to find the storage uri name dynamically while fetching
the storage controller and storage drive data.

Also:
* Add new unit test + fix old ones.
* Remove unit test number prefixes.
* Remove unnecessary assertions from unit tests.

Fixes canonical/hardware-observer-operator#91, canonical/hardware-observer-operator#108
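
The idea in the commit message, looking up the storage URI instead of hardcoding "Storage", can be sketched roughly as follows. This is not the code from the commit. It assumes the system resource has already been fetched as a JSON dict and that it links to its storage collection through an `@odata.id` reference; `find_storage_uri` and the sample payload are hypothetical.

```python
from typing import Any, Dict, Optional


def find_storage_uri(system: Dict[str, Any]) -> Optional[str]:
    """Return the storage collection URI advertised by a system resource.

    The URI is read from the payload rather than built by appending a
    fixed "Storage" segment, so a service that exposes ".../Storages"
    is still found.
    """
    # Preferred: the link under the "Storage" property, whatever URI it holds.
    link = system.get("Storage")
    if isinstance(link, dict) and isinstance(link.get("@odata.id"), str):
        return link["@odata.id"]

    # Fallback (an assumption): any top-level link whose last path
    # segment starts with "Storage", e.g. "Storages".
    for value in system.values():
        if not isinstance(value, dict):
            continue
        uri = value.get("@odata.id")
        if isinstance(uri, str):
            last_segment = uri.rstrip("/").rsplit("/", 1)[-1]
            if last_segment.startswith("Storage"):
                return uri
    return None


# Illustrative payload only, not captured from a real BMC.
sample_system = {
    "@odata.id": "/redfish/v1/Systems/1",
    "Processors": {"@odata.id": "/redfish/v1/Systems/1/Processors"},
    "Storage": {"@odata.id": "/redfish/v1/Systems/1/Storages"},
}
storage_uri = find_storage_uri(sample_system)
```

A caller would pass the returned URI to the collection lookup and treat `None` as "this system exposes no storage collection" instead of raising.
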
dashmage commented

Fixed in this commit on prometheus-hardware-exporter.
