Add GPU UUIDs to node labels #1015

Open
xiongzubiao opened this issue Oct 26, 2024 · 3 comments

Comments

@xiongzubiao

I know it is easy to get the GPU UUIDs with nvidia-smi. It would be nice if gpu-feature-discovery exposed them as node labels, so that one doesn't need to SSH into the node.

@elezar
Member

elezar commented Oct 31, 2024

@xiongzubiao could you describe how you would want to use these labels? In general, the labels are intended to allow selection of specific nodes through node selectors or affinity. Do you have a use case that requires you to match nodes by UUID?

@xiongzubiao
Author

It is mainly for metering and diagnostic purposes. We'd like to monitor the usage and health status of each GPU. Having the UUIDs in node labels would help us search the data in Prometheus.

We don't have a use case for selecting a particular GPU right now. I guess that could be useful if a node has multiple GPUs whose models are not exactly the same?

@xiongzubiao
Author

@elezar Would you be interested if I submitted a PR? I found that it is not that difficult to expose the UUIDs by leveraging existing functions. The label would look like: nvidia.com/gpu.uuid=GPU-d46f8b5f-76b0-e058-74a8-f82243117fd7,GPU-2871653f-019a-db66-ee74-bbcaece54c8b.
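
One thing worth checking before settling on the label format: Kubernetes label values are limited to 63 characters and may contain only alphanumerics, '-', '_' and '.', so a comma-separated list of UUIDs would likely be rejected by the API server. The sketch below is not gpu-feature-discovery code. It only illustrates that check and one possible alternative, with one label per GPU. The indexed key names (nvidia.com/gpu.uuid.0, nvidia.com/gpu.uuid.1, ...) and both helper functions are hypothetical.

```python
import re

# Kubernetes label values: at most 63 characters; alphanumerics, '-', '_', '.';
# must start and end with an alphanumeric character (an empty value is allowed).
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LEN = 63


def is_valid_label_value(value: str) -> bool:
    """Return True if value satisfies the Kubernetes label-value syntax."""
    if len(value) > MAX_LABEL_VALUE_LEN:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None


def uuid_labels(uuids, prefix="nvidia.com/gpu.uuid"):
    """Build one label per GPU, keyed by its position in the list.

    The "<prefix>.<index>" key scheme is an assumption made for this
    sketch, not an existing gpu-feature-discovery label.
    """
    labels = {}
    for index, uuid in enumerate(uuids):
        if not is_valid_label_value(uuid):
            raise ValueError(f"not a valid label value: {uuid!r}")
        labels[f"{prefix}.{index}"] = uuid
    return labels


uuids = [
    "GPU-d46f8b5f-76b0-e058-74a8-f82243117fd7",
    "GPU-2871653f-019a-db66-ee74-bbcaece54c8b",
]

# The comma-joined form is too long and contains a disallowed character.
print(is_valid_label_value(",".join(uuids)))  # False

# One label per GPU stays within the limits.
for key, value in uuid_labels(uuids).items():
    print(f"{key}={value}")
```

With per-GPU labels, a single UUID can still be looked up with an ordinary label query, at the cost of adding one label per device to each node.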
