Add CDI support to peer pods #2126
Comments
@inatatsu Commenting on the webhook aspect: there is some plumbing work pending, and I'm open to PRs :-)
@bpradipt Thank you for your responses. Can we extend the current webhook-based approach to support DRA and generate a CDI spec in a peer pod VM?
@zvonkok, you might also be interested, and your input would be extremely helpful here.
According to this comment, it looks like CDI needs to be enabled in both the runtime and the kata-agent: kata-containers/kata-containers#9543 (comment). CDI support has so far been enabled only in runtime-rs, not in the Go version of the kata-shim runtime: kata-containers/kata-containers#10145. I don't think runtime-rs supports the remote hypervisor for peer pods. Do we need to enable CDI in the Go version of the kata-shim runtime?
In the Go version of the kata-shim runtime, the remote hypervisor just ignores devices for now. I think we also need to fix this when CDI support is enabled in the kata-shim runtime.
I think another possible workaround to support CDI in peer pods is to have cloud-api-adaptor manipulate Devices in CreateContainerRequest. cloud-api-adaptor/src/cloud-api-adaptor/pkg/adaptor/proxy/service.go Lines 77 to 82 in aab207c
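A rough sketch of what that workaround could look like, for illustration only. The `Device` and `CreateContainerRequest` types below are simplified stand-ins defined locally, not the real agent protocol types, and `addCDIDevices` and the `"cdi"` device type string are hypothetical names that do not come from cloud-api-adaptor or kata-agent.

```go
package main

import "fmt"

// Device is a simplified, hypothetical stand-in for the device entry
// carried in a container creation request.
type Device struct {
	ID   string
	Type string
}

// CreateContainerRequest is a simplified, hypothetical stand-in for the
// request that the proxy forwards to the agent.
type CreateContainerRequest struct {
	ContainerID string
	Devices     []*Device
}

// addCDIDevices appends one Device per CDI device name, skipping empty
// names and names already present in the request. It returns the number
// of devices that were added.
func addCDIDevices(req *CreateContainerRequest, cdiNames []string) int {
	seen := make(map[string]bool, len(req.Devices))
	for _, d := range req.Devices {
		seen[d.ID] = true
	}
	added := 0
	for _, name := range cdiNames {
		if name == "" || seen[name] {
			continue
		}
		// "cdi" is an assumed marker, not a documented device type.
		req.Devices = append(req.Devices, &Device{ID: name, Type: "cdi"})
		seen[name] = true
		added++
	}
	return added
}

func main() {
	req := &CreateContainerRequest{ContainerID: "c1"}
	n := addCDIDevices(req, []string{"nvidia.com/gpu=0", "nvidia.com/gpu=0", "nvidia.com/gpu=1"})
	fmt.Println("added:", n, "total:", len(req.Devices))
}
```

Deduplicating on the device name keeps the rewrite idempotent, so the request can safely pass through the proxy more than once.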
I think @yoheiueda's proposal to do it in the CreateContainerRequest may be easier. We can just keep the webhook to handle removing resources from the spec that don't apply to peer-pods.
There are several parts to the story. I am ramping up on peer-pods, so excuse my ignorance on some parts. First, enabling CDI in the kata-agent is completely independent of whether we use peer-pods or a local VMM. This is enabled here: kata-containers/kata-containers#9584. @bpradipt This will eliminate the prestart-hook. I do not understand the complete webhook thing in peer-pods, but let's try to keep it simple and stupid. We've built DRA to request special features of a GPU, such as a GPU with 40G, a MIG slice, a vGPU, or a specific architecture. I am still unsure how exactly we're going to map this to peer-pods, since we do not know what the CSP pool is capable of. Do we need some advertisement system (like NFD) for CSP-like infrastructure? Peer pods add a new layer of complexity. I need to think about how to enable DRA and CDI.
@bpradipt We need to think about how to enable DRA properly. The logic you have is a good start, but it ignores MIG and vGPU.
Yes, both the runtime and the kata-agent need to integrate with CDI. Currently, AFAIK, both the kata runtime and runtime-rs support CDI for GPU scenarios.
Hmm, since the mapping is one Pod per CSP VM, we need to make sure that DRA, in the case of peer-pods, only allows creation of GPUs that map to CSP instance types, or keeps the Pod pending until the CSP implements the proper instance type :)
All the management and configuration of devices is now pushed into DRA, whereas with device-plugins you consume what the infrastructure offers. We have a conflict here with peer-pods. In the case of peer-pods, DRA would just act as a proxy that passes the wanted type through to peer-pods, which in the end would choose the proper instance type and do the CSP magic.
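To make the "map to an instance type or stay pending" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `InstanceType` and `GPURequest` types, the catalog entries, and `selectInstanceType` are hypothetical and do not reflect any real CSP catalog or the DRA API.

```go
package main

import (
	"fmt"
	"sort"
)

// InstanceType is a hypothetical description of one CSP instance type.
type InstanceType struct {
	Name      string
	GPUs      int
	GPUMemGiB int // memory per GPU
}

// GPURequest is a hypothetical, simplified view of what a claim asks for.
type GPURequest struct {
	Count     int
	MinMemGiB int
}

// selectInstanceType returns the smallest instance type that satisfies
// the request. ok is false when nothing fits, which the caller would
// treat as "keep the Pod pending".
func selectInstanceType(catalog []InstanceType, req GPURequest) (it InstanceType, ok bool) {
	candidates := make([]InstanceType, 0, len(catalog))
	for _, c := range catalog {
		if c.GPUs >= req.Count && c.GPUMemGiB >= req.MinMemGiB {
			candidates = append(candidates, c)
		}
	}
	if len(candidates) == 0 {
		return InstanceType{}, false
	}
	// Prefer fewer GPUs first, then less memory per GPU.
	sort.SliceStable(candidates, func(i, j int) bool {
		if candidates[i].GPUs != candidates[j].GPUs {
			return candidates[i].GPUs < candidates[j].GPUs
		}
		return candidates[i].GPUMemGiB < candidates[j].GPUMemGiB
	})
	return candidates[0], true
}

func main() {
	// Made-up catalog entries; real names depend on the CSP.
	catalog := []InstanceType{
		{Name: "gpu-small", GPUs: 1, GPUMemGiB: 16},
		{Name: "gpu-large", GPUs: 1, GPUMemGiB: 40},
		{Name: "gpu-multi", GPUs: 4, GPUMemGiB: 40},
	}
	if it, ok := selectInstanceType(catalog, GPURequest{Count: 1, MinMemGiB: 40}); ok {
		fmt.Println("selected:", it.Name)
	} else {
		fmt.Println("no matching instance type; Pod stays pending")
	}
}
```

This only covers whole GPUs; MIG slices and vGPUs would need extra fields in both the catalog and the request.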
@zvonkok Thank you very much for the explanation of how CDI works with DRA.
@Apokleos That sounds great! I have a basic question regarding runtime-rs. At some point in the future, will the Go version of the kata-shim runtime be deprecated and replaced with runtime-rs?
Hah, yeah, good point. I think I should invite AC members @stevenhorsman @fupanli @zvonkok, etc. to help answer this question.
The short answer here is yes. The more nuanced version is yes, but we are not sure of the timeframe. The current plan is for Kata Containers 4.0 to ship with runtime-rs as the default shim. The Go runtime won't be removed at that point, but it might receive security fixes only, or best-effort feature support, with all new features targeted primarily at the Rust runtime first. In Kata Containers 5.0 I guess there is a reasonable chance that the Go runtime will be removed entirely, but that is unlikely to be decided for a long time. 4.0 is planned for some time in 2025, but there is still quite a bit of work required to close the gap, as listed in kata-containers/kata-containers#8702, including the remote hypervisor support that @Apokleos mentioned.
@bpradipt @stevenhorsman @yoheiueda @zvonkok @Apokleos Thank you very much for your helpful comments. Let me summarize the discussions and suggestions (and my understanding😃). Feel free to correct or add anything:
@inatatsu thanks for summarising it.
Is this about advertising external VMs as resources instead of the current per node extended resources?
How is CDI useful for the peer-pods case? The availability of the GPU resource is taken care of by the cloud infra provider, and all GPUs available in the VM get allocated to the pod, as there is a 1-1 mapping between VM and pod.
@bpradipt Thank you for your questions.
While I did not imagine such a use case😅, it is interesting and may simplify the VM management.
In my understanding, CDI allows flexible device mapping and is runtime-agnostic. But as you point out, peer pods primarily rely on selecting an appropriate instance profile (or flavor) to allocate resources, and CDI just provides a mapping between the resources and containers.
So, CDI will be helpful on the kata-agent side to assign the GPU (or other devices) to the container, and it additionally lets us use the same building blocks (CDI). Is my understanding correct?
@bpradipt Yes. That's my current understanding.
The go runtime merged PRs to enable CDI:
@zvonkok Does this mean the go runtime (except for the remote hypervisor) already supports CDI?
How can we enable Dynamic Resource Allocation (DRA) based on Container Device Interface (CDI) for peer pods?
K8s v1.26 introduced DRA, and the Kata agent has recently been enabling CDI. In my understanding, when we want to use GPUs in a peer pod, we need to manually specify an instance profile with GPUs. The webhook simply replaces nvidia.com/gpu device requests with a kata.peerpods.io.gpus annotation, but the annotation does not seem to be used to select an instance profile. Any suggestions?
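As a reference point for the discussion, the webhook behaviour described above can be sketched roughly as follows. This is not the actual webhook code: the map-based types are simplified stand-ins for the Pod spec, and `moveGPURequest` and `gpuCountFromAnnotations` are hypothetical helpers. Only the resource name and the annotation key are taken from the description above.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	gpuResource   = "nvidia.com/gpu"
	gpuAnnotation = "kata.peerpods.io.gpus"
)

// moveGPURequest removes the GPU resource from limits and records the
// requested count in annotations. Both maps must be non-nil. It reports
// whether a GPU request was found.
func moveGPURequest(limits map[string]int, annotations map[string]string) bool {
	n, ok := limits[gpuResource]
	if !ok {
		return false
	}
	delete(limits, gpuResource)
	annotations[gpuAnnotation] = strconv.Itoa(n)
	return true
}

// gpuCountFromAnnotations reads the count back, which is the step that
// would be needed before an instance profile could be chosen from it.
func gpuCountFromAnnotations(annotations map[string]string) (int, error) {
	v, ok := annotations[gpuAnnotation]
	if !ok {
		return 0, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid %s annotation %q", gpuAnnotation, v)
	}
	return n, nil
}

func main() {
	limits := map[string]int{gpuResource: 2, "cpu": 4}
	annotations := map[string]string{}
	moved := moveGPURequest(limits, annotations)
	n, err := gpuCountFromAnnotations(annotations)
	fmt.Println(moved, n, err, limits)
}
```

The second helper marks where the gap is: something on the cloud-api-adaptor side would have to read the annotation and translate the count into an instance profile.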