[RFD] Support for Power Management #64
Comments
This is great! I have a few comments.
Absolutely, I was just trying to keep the scope of this RFD manageable, but I think you make a good point.
Agreed, the best option would be to use goroutines. The first step will be to check whether they can support the load; if they can, we are done.
Yes, I saw that you had started to add some CLIs. Having a single command with subcommands/modules is a good approach. I am a fan of Typer and have used it to build CLIs like that before. I would be happy to collaborate on an RFD for that.
Support for Power Management
This RFD lays out a path to adding power management to OpenCHAMI.
What needs to change:
Currently, OpenCHAMI doesn't have a way to perform basic power management operations, such as powering on a set of nodes. A well-integrated power management service is an important component of a system management platform.
What do you propose?
The proposal is to bring the existing Cray-HPE Power Control Service into the OpenCHAMI project.
Starting from an existing code base that integrates with SMD seems like a more pragmatic approach than starting from scratch. It also has the advantage that sites with existing Cray-HPE hardware can reuse their integrations with the existing PCS API. In general, the PCS API seems to be pretty functional, and many of the issues discussed below are the result of the implementation of the command line tools that use the PCS API, rather than deficiencies in the service itself.
In line with the transition of SMD to the OpenCHAMI project, the following set of changes would be performed initially:
PCS and its tooling do have some pain points that will serve as a bug/feature list for future development.
Here are a few of the top issues raised by NERSC staff:
Quite frequently, the API reports success (HTTP 200) even though there is an error talking to Redfish, and the underlying failure is not propagated back to the operator. In the case of SLURM, daemon retry logic has been added to try to overcome this flakiness. Sometimes operators have to call the Redfish interface directly, but this is rare.
PCS is 'imperative' (go do this action) by design rather than 'declarative' (maintain this state). So it's unlikely that we would add this sort of retry logic to PCS. However, ensuring that errors are correctly propagated to the API would allow other tools to be built with a more declarative view of the system.
When interacting with BMCs in any form (via Redfish or IPMI or whatever) they don't always listen to you the first time. Some implementations of power control will re-send the same request several times and ask the BMC what it thinks happened.
This is somewhat similar to the previous point and is probably out of scope for PCS. However, PCS needs to provide accurate information about how the BMCs respond to requests; a minimal sketch of this resend-and-verify pattern is included after this list of issues.
The PCS's view of what can be capped is sometimes incorrect. Do we have more details on this?
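To make the resend-and-verify pattern concrete, here is a minimal sketch in Go. The send and get callbacks stand in for Redfish wrappers and are assumptions for illustration; this is not existing PCS code.

```go
package power

import (
	"context"
	"fmt"
	"time"
)

// EnsurePowerState re-sends a power action and then asks the BMC what it
// thinks happened, up to `attempts` times. The send/get callbacks are
// hypothetical Redfish wrappers supplied by the caller.
func EnsurePowerState(
	ctx context.Context,
	bmc, want string,
	attempts int,
	send func(ctx context.Context, bmc, action string) error,
	get func(ctx context.Context, bmc string) (string, error),
) error {
	for i := 0; i < attempts; i++ {
		if err := ctx.Err(); err != nil {
			return err // caller cancelled or timed out
		}
		if err := send(ctx, bmc, want); err != nil {
			continue // transient send failure; try again
		}
		time.Sleep(2 * time.Second) // give the BMC a moment to act
		if state, err := get(ctx, bmc); err == nil && state == want {
			return nil // the BMC confirms the requested state
		}
	}
	return fmt.Errorf("bmc %s did not reach state %q after %d attempts", bmc, want, attempts)
}
```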
The following fit more into the category of feature requests:
Progress tracking
The output of cray power transition describe, which you have to scroll through, is not very useful for such high-level monitoring. The presentation of progress is probably out of scope for PCS. However, providing an event stream associated with a transition would allow us to write more useful tools that could provide this sort of progress information without the need to resort to polling. One possible approach would be to add SSE or websockets to the API to allow a client to subscribe to specific events.
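As one illustration of the SSE option, here is a minimal sketch of a streaming handler built only on net/http. The handler name and the idea of a channel of pre-serialized JSON event payloads are assumptions for illustration, not the current PCS API.

```go
package power

import (
	"fmt"
	"net/http"
)

// StreamTransitionEvents writes one SSE frame per event until the transition's
// event channel is closed or the client disconnects. The events channel is a
// hypothetical feed of pre-serialized JSON payloads from the transition engine.
func StreamTransitionEvents(w http.ResponseWriter, r *http.Request, events <-chan string) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	for {
		select {
		case <-r.Context().Done():
			return // client went away
		case ev, open := <-events:
			if !open {
				return // transition finished; close the stream
			}
			fmt.Fprintf(w, "data: %s\n\n", ev) // SSE framing
			flusher.Flush()
		}
	}
}
```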
Retry logic on server side
This is probably outside the scope of PCS as it was designed. However, it could be implemented by a service built on top of PCS.
Queuing of transitions
More investigation needs to be performed to understand how queuing / serialization of transitions could be implemented.
Longer-term goals
Transition to a cell-based architecture
In line with wider discussions (#41) across the collaboration, we should look at how we could transition away from a single PCS instance to multiple independent instances, for example one instance per cabinet, thus reducing the size of the failure domain. Given the imperative nature of PCS, it should be amenable to a cellular deployment.
Transition away from TRS
PCS currently uses the HMS Task Runner Service (TRS) to parallelize operations, for example sending requests to BMCs. It uses Kafka to queue tasks that can then be processed by workers. TRS doesn't seem to be under active development. Given this, it would be a good idea to move to a community-supported alternative, of which there are many. Here are just a few:
An analysis will need to be performed to select an alternative that matches the needs of PCS. Moving away from TRS would allow us to leverage the features of a modern task queue and reduce the maintenance burden of having to maintain TRS along with PCS. TRS also has a "local" mode that uses goroutines; this may be enough to support the requests generated by PCS and would reduce the amount of TRS code that would need to be maintained.
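As a first check on whether plain goroutines could stand in for TRS's local mode, the sketch below fans requests out to a set of BMC endpoints with bounded concurrency. The fanOut helper and its doOne callback are hypothetical, not part of TRS or PCS.

```go
package power

import (
	"context"
	"sync"
)

// fanOut runs doOne against each endpoint with at most `limit` requests in
// flight, returning one error slot per endpoint. This is only a sketch of the
// goroutine-based alternative discussed above, not TRS's actual API.
func fanOut(ctx context.Context, endpoints []string, limit int,
	doOne func(ctx context.Context, endpoint string) error) []error {

	errs := make([]error, len(endpoints))
	sem := make(chan struct{}, limit) // semaphore bounding concurrency
	var wg sync.WaitGroup

	for i, ep := range endpoints {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(i int, ep string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			errs[i] = doOne(ctx, ep)
		}(i, ep)
	}
	wg.Wait()
	return errs
}
```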
Look at moving to PostgreSQL for state storage
OpenCHAMI's SMD implementation transitioned away from etcd as its persistent backend store because etcd was a big contributor to unplanned outages at LANL. This has not been our experience with PCS at NERSC. However, looking at the implementation of the storage provider for PCS, it does look like it would be amenable to a relational implementation if this were necessary. Another approach that might be worth considering is to use node-local storage with snapshotting, like the experiments in https://github.com/OpenCHAMI/quack. This might fit nicely given that the power control state can be regenerated relatively easily.
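To illustrate why the storage provider looks amenable to a relational backend, here is a hedged sketch of a minimal provider built on database/sql with a PostgreSQL driver. The TransitionStore interface and the transitions table are hypothetical stand-ins, not PCS's actual storage code.

```go
package power

import (
	"context"
	"database/sql"

	_ "github.com/lib/pq" // one possible PostgreSQL driver
)

// TransitionStore is a hypothetical slice of a storage-provider interface,
// included only to show that the operations map naturally onto SQL.
type TransitionStore interface {
	SaveTransitionState(ctx context.Context, transitionID, state string) error
	GetTransitionState(ctx context.Context, transitionID string) (string, error)
}

type pgStore struct{ db *sql.DB }

func (s *pgStore) SaveTransitionState(ctx context.Context, id, state string) error {
	// Hypothetical table: transitions(id TEXT PRIMARY KEY, state TEXT)
	_, err := s.db.ExecContext(ctx,
		`INSERT INTO transitions (id, state) VALUES ($1, $2)
		 ON CONFLICT (id) DO UPDATE SET state = EXCLUDED.state`, id, state)
	return err
}

func (s *pgStore) GetTransitionState(ctx context.Context, id string) (string, error) {
	var state string
	err := s.db.QueryRowContext(ctx,
		`SELECT state FROM transitions WHERE id = $1`, id).Scan(&state)
	return state, err
}
```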
Operator facing tools
PCS provides an API that can be used to build operator facing tools needed to perform power transitions.
cray power is one of the current clients of PCS. Many of the issues/feature requests raised would be implemented in client tools. The intent would be to implement a new command line interface to PCS that addresses these needs. Another RFD would be submitted to provide a detailed discussion of such a tool.
What alternatives exist?
Decide that power management is out of scope for OpenCHAMI and recommend integration with other tools, such as:
The downside of using these external tools is that they lack integration with SMD, for example creating a reservation for nodes that are being shut down.
Start from scratch and implement a new microservice from the ground up. This would avoid carrying any technical debt from PCS, however, it would involve significant development effort.