You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I know it is easy to get it with nvidia-smi. It would be nice that the gpu-feature-discovery exposes it as a label of nodes, so that one doesn't need to ssh into the node.
The text was updated successfully, but these errors were encountered:
@xiongzubiao could you describe how you would want to use these labels? In general the labels are intented to allow selection of specific nodes through node selectors or affinity. Is there a use case that you have which requires you to match nodes by UUID?
It is mainly for metering and diagnosis purpose. We'd like to monitor the usage and the health status of each GPU. Having UUIDs in node label can help us to search data in prometheus.
We don't have a use case to select a particular GPU right now. I guess that could be useful if there are multiple GPUs on a node, but models are not exactly the same?
@elezar Would you be interested if I submit a PR? I figured out that it is not that difficult to expose the UUIDs by leveraging existing functions. The label would look like: nvidia.com/gpu.uuid=GPU-d46f8b5f-76b0-e058-74a8-f82243117fd7,GPU-2871653f-019a-db66-ee74-bbcaece54c8b.
I know it is easy to get it with nvidia-smi. It would be nice that the gpu-feature-discovery exposes it as a label of nodes, so that one doesn't need to ssh into the node.
The text was updated successfully, but these errors were encountered: