Hi, I'm digging into the new capacity plugin and following the capacity plugin user guide tutorial.
I'm using the latest Volcano release (1.11.0), but for some reason my setup does not reclaim resources from the overcommitted queue.
Steps to reproduce the issue
I've created the following queues:
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: queue1
spec:
  reclaimable: true
  deserved: # set the deserved field.
    nvidia.com/gpu: 16
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: queue2
spec:
  reclaimable: true
  deserved: # set the deserved field.
    nvidia.com/gpu: 24
This is my volcano-scheduler-configmap. After editing it, I restarted all the Volcano system pods (scheduler, admission, controllers) and confirmed in the scheduler logs that the capacity plugin is enabled.
Describe the results you received and expected
At least 2 pods from queue2 should be scheduled: 1 pod on the free 8-GPU node, and 1 by reclaiming resources from the overcommitted queue1 (queue1 has only 16 deserved GPUs).
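The expectation above follows from simple arithmetic on the numbers in this report: deployment1 runs 3 pods of 8 GPUs each in queue1 (24 GPUs) against a deserved value of 16. Here is a minimal Go sketch of that calculation. The `overused` helper is hypothetical, written only for illustration, and is not part of Volcano's capacity plugin.

```go
package main

import "fmt"

// overused is a hypothetical helper: it returns how many GPUs a queue
// holds beyond its deserved share, or 0 if it is within its share.
func overused(allocated, deserved int) int {
	if allocated > deserved {
		return allocated - deserved
	}
	return 0
}

func main() {
	const gpusPerPod = 8

	// queue1: deployment1 runs 3 pods of 8 GPUs; deserved is 16.
	queue1Allocated := 3 * gpusPerPod
	excess := overused(queue1Allocated, 16)

	fmt.Println("queue1 excess GPUs:", excess)
	fmt.Println("8-GPU pods that could be reclaimed:", excess/gpusPerPod)
}
```

Under these numbers queue1 is 8 GPUs over its deserved share, which corresponds to exactly one 8-GPU pod.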
What version of Volcano are you using?
1.11.0
Any other relevant information
No response
It seems that in v1.11, reclaim only happens when a job is starving. A job is starving when its number of non-pending pods is less than MinAvailable. Because the deployment's default MinAvailable is 1 and there is already a running pod in demo-2, demo-2 is not starving, hence reclaim won't happen.
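To make the starving rule concrete, here is a minimal Go sketch of the condition as described in this comment. The `job` type and `isStarving` function are hypothetical simplifications for illustration; they are not Volcano's actual types or code.

```go
package main

import "fmt"

// job is a hypothetical, simplified stand-in for a scheduler job record.
type job struct {
	name         string
	minAvailable int // MinAvailable of the job's podgroup
	nonPending   int // pods that are no longer pending (e.g. running)
}

// isStarving applies the rule described above: a job is starving only
// while its non-pending pod count is below minAvailable.
func isStarving(j job) bool {
	return j.nonPending < j.minAvailable
}

func main() {
	// demo-2 as in this issue: default MinAvailable=1, one pod already running.
	demo2 := job{name: "demo-2", minAvailable: 1, nonPending: 1}
	fmt.Printf("%s starving: %v\n", demo2.name, isStarving(demo2))

	// Assumed counter-example: the same job if MinAvailable could be set to 3.
	gang := job{name: "demo-2 (MinAvailable=3)", minAvailable: 3, nonPending: 1}
	fmt.Printf("%s starving: %v\n", gang.name, isStarving(gang))
}
```

With MinAvailable=1 and one running pod, the check is false, so under this rule the job would not trigger reclaim even though two of its pods are still pending.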
Correct, this is because vc-controller creates a podgroup with a default MinAvailable=1 for a deployment. Currently we don't have an API to specify MinAvailable for non-vc-job workloads; we already have a feature issue to track it: #3970. If you really need this feature urgently (based on v1.11), we can release it as a patch later.
Description
After creating the queues and updating the scheduler ConfigMap, I created deployment1, which creates 3 pods, each requesting 8 GPUs.
My cluster has 4 free 8-GPU nodes (and 4 busy ones, whose pods were scheduled without Volcano), so all 3 pods were scheduled.
After that, I created another deployment in queue2, also with 3 pods requesting 8 GPUs each:
But for some reason it does not reclaim resources from queue1.