xpk Cluster Queue resource group "cpu" resource quota incorrect for a CPU-only cluster #158
@RoshaniN, could you confirm my understanding? Does the nominal quota here represent the number of VMs/GKE nodes (1024 in this case), or should it equal the number of CPUs (chips/cores)? My understanding is that Kueue wants the number of VMs/GKE nodes (1024) in order to schedule workloads, right? Or should Kueue be getting 1024 * 32?
Per https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/, I would think it's the CPU resources within that resource group, so it should be roughly 1024*32. It seems that XPK assigns the VM count here. That is probably right when the resource type is TPU or GPU, where the number of chips represents the resources, but for CPU it is different?
I believe this implementation for CPUs is working as intended. Based on the examples in https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#resources, for CPUs this amounts to the number of CPUs (or the number of VMs). There is no correlation between the CPU machine type (n2-standard-32 with 32 vCPUs, or e2-standard-4 with 4 vCPUs) and the nominal quota. TPUs and GPUs have physical chips, and their resources can be partitioned more granularly if required. Are we seeing an actual problem, or is Kueue accepting and queueing CPU requests as intended?
Shouldn't the "number of CPUs" equal the number of VMs * the CPUs of each VM?
fails to get accepted, although it should be, because each pod asks for 20 CPU resources and can fit on a single node.
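As a rough sketch of the arithmetic being proposed here (not XPK's actual code), the CPU nominal quota for a homogeneous node pool would be the node count times the vCPUs per machine. The machine-type table is an assumption covering only the two types mentioned in this thread, and the result is an upper bound, since each node's allocatable CPU is somewhat lower than its capacity after system reservations.

```python
# vCPUs per machine for the two machine types mentioned in this thread only.
VCPUS_PER_MACHINE = {"n2-standard-32": 32, "e2-standard-4": 4}


def expected_cpu_nominal_quota(num_nodes: int, machine_type: str) -> int:
    """Total vCPUs (in whole CPU units) across a homogeneous node pool.

    Upper bound: ignores CPU reserved for the system on each node.
    """
    return num_nodes * VCPUS_PER_MACHINE[machine_type]


print(expected_cpu_nominal_quota(1024, "n2-standard-32"))  # 32768
print(expected_cpu_nominal_quota(1024, "e2-standard-4"))   # 4096
```

Under this accounting the machine type does matter: the same 1024 nodes give very different CPU quotas depending on vCPUs per machine.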
I found this: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu Edit: refreshed this issue and saw that Bernard also posted this link :) The requests and limits in the workload should then follow this notation too.
Yep, thanks @RoshaniN! So the nominal quota should equal the number of VMs * the vCPUs of each CPU machine. I guess I was also sure about it because submitting the job with
through the queue fails, but without the queue it schedules successfully ;)
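A simplified model of why the same job can be rejected through the queue yet schedule fine without it: Kueue admits a workload only if its total CPU request fits in the ClusterQueue's available quota, whereas the scheduler only needs each pod to fit on some node. The sketch below assumes a single ClusterQueue with no cohort, borrowing, or preemption, and the 64-pod job is hypothetical.

```python
def fits_nominal_quota(pod_cpu_request: int, pod_count: int,
                       nominal_quota: int, used: int = 0) -> bool:
    """Simplified quota check: the workload's total CPU request must fit
    in the unused part of the ClusterQueue's nominal quota.

    Ignores cohorts, borrowing limits, and preemption, which real Kueue
    admission also takes into account.
    """
    requested = pod_cpu_request * pod_count
    return used + requested <= nominal_quota


# Hypothetical job: 64 pods, each requesting 20 CPUs (1280 CPUs in total).
print(fits_nominal_quota(20, 64, nominal_quota=1024))       # False
print(fits_nominal_quota(20, 64, nominal_quota=1024 * 32))  # True
```

With a quota of 1024 interpreted as 1024 CPUs, at most 51 such 20-CPU pods fit, even though every individual pod fits on one 32-vCPU node.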
Thanks @bernardhan33 for checking that it fails with cpu: 20000m. Could you check whether it passes or fails with cpu: 20? I'm trying to understand whether the notations/quantities are different.
Yeah, 20 fails too. 20000m == 20, so it evaluates to the equivalent resource.
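The 20000m == 20 equivalence follows from Kubernetes CPU units, where 1 CPU is 1000 millicores. A minimal parser, assuming only the plain-number and "m" suffix forms used in this thread (other Kubernetes quantity suffixes are not handled), makes this concrete:

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "20", "0.5" or "20000m"
    to millicores. Only plain numbers and the "m" suffix are supported.
    """
    q = quantity.strip()
    if q.endswith("m"):
        milli = Decimal(q[:-1])
    else:
        milli = Decimal(q) * 1000
    # Kubernetes does not allow CPU precision finer than 1m.
    if milli != milli.to_integral_value():
        raise ValueError(f"CPU precision finer than 1m: {quantity!r}")
    return int(milli)


print(cpu_to_millicores("20000m") == cpu_to_millicores("20"))  # True
print(cpu_to_millicores("0.5"))                                # 500
```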
btw I'm not blocked on this -- I can |
Yes, wanted to be sure of that. Thanks @bernardhan33
We have a CPU-only cluster, n2-standard-32-1024, with 1024 n2-standard-32 nodes. There we should technically have roughly 1024 * 32 CPU resources, but I'm seeing a nominal quota of 1024 from
kubectl describe clusterqueue cluster-queue
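To check the quota programmatically instead of reading `kubectl describe` output, one option is to fetch the ClusterQueue as JSON (for example with `kubectl get clusterqueue cluster-queue -o json`) and read `spec.resourceGroups[].flavors[].resources[].nominalQuota`. The sketch below parses an inline sample rather than calling kubectl; the flavor name `example-flavor` is hypothetical.

```python
import json


def cpu_nominal_quotas(cluster_queue: dict) -> dict:
    """Map each flavor name to its nominalQuota string for the "cpu" resource."""
    quotas = {}
    for group in cluster_queue.get("spec", {}).get("resourceGroups", []):
        for flavor in group.get("flavors", []):
            for resource in flavor.get("resources", []):
                if resource.get("name") == "cpu":
                    quotas[flavor["name"]] = str(resource["nominalQuota"])
    return quotas


# Hypothetical sample shaped like a ClusterQueue object.
sample = json.loads("""
{
  "kind": "ClusterQueue",
  "metadata": {"name": "cluster-queue"},
  "spec": {
    "resourceGroups": [
      {
        "coveredResources": ["cpu"],
        "flavors": [
          {"name": "example-flavor",
           "resources": [{"name": "cpu", "nominalQuota": "1024"}]}
        ]
      }
    ]
  }
}
""")

print(cpu_nominal_quotas(sample))  # {'example-flavor': '1024'}
```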