Fix the conflict between preemption and antiAffinity #3070

wangyang0616 · 2023-08-26T09:09:45Z

fix: #3068
Partial optimization based on #3051. Currently, preemption only considers resources such as cpu, memory, and gpu, and does not consider strategies such as affinity and topology. When a predicate that reuses a k8s filtering algorithm returns a status other than Success, it now returns an error, so the task is neither allocated to that node nor allowed to preempt on it.

Test results:
The high-priority job anti-test preempts the resources of the low-priority job label-sh and runs successfully.

Pending pod event:

Pending pod msg:

lowang-bh · 2023-08-26T09:24:36Z

pkg/scheduler/plugins/predicates/predicates.go

@@ -411,45 +411,35 @@ func (pp *predicatesPlugin) OnSessionOpen(ssn *framework.Session) {
 				Reason: api.NodePodNumberExceeded,
 			}
 			predicateStatus = append(predicateStatus, podsNumStatus)
-			return predicateStatus, nil


If we don't return here, please filter out the statuses whose reason is an empty string at

volcano/pkg/scheduler/util/predicate_helper.go

Lines 156 to 158 in c91eb07

	for _, status := range s {
		all = append(all, status.Reason)
	}
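A minimal sketch of the filtering suggested above, assuming a simplified `Status` and `StatusSets` that only mirror the shape of the quoted loop. The type definitions, the `Reasons` helper, the comma separator, and the sample reason string are illustrative assumptions, not the actual Volcano code:

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a simplified, hypothetical stand-in for a per-predicate status.
type Status struct {
	Code   int
	Reason string
}

// StatusSets mirrors the slice type iterated in the quoted loop.
type StatusSets []Status

// Reasons collects only non-empty reasons, so a status appended without
// a reason does not leave an empty entry in the joined message.
func (s StatusSets) Reasons() []string {
	all := make([]string, 0, len(s))
	for _, status := range s {
		if status.Reason == "" {
			continue // skip statuses that carry no reason
		}
		all = append(all, status.Reason)
	}
	return all
}

// Message joins the non-empty reasons; the separator is an assumption.
func (s StatusSets) Message() string {
	return strings.Join(s.Reasons(), ",")
}

func main() {
	sets := StatusSets{
		{Code: 0, Reason: ""},
		{Code: 1, Reason: "pod number exceeded"},
	}
	fmt.Println(sets.Message())
}
```

Skipping empty reasons at collection time keeps every caller of `Message` consistent, rather than requiring each predicate to avoid appending reasonless statuses.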

lowang-bh · 2023-08-26T09:29:20Z

pkg/scheduler/plugins/predicates/predicates.go

 			if nodeUnscheduleStatus.Code != api.Success {
 				predicateStatus = append(predicateStatus, nodeUnscheduleStatus)
-				return predicateStatus, false, nil
+				return predicateStatus, false, fmt.Errorf("plugin %s predicates failed %s", nodeUnscheduleFilter.Name(), status.Message())


Note that if an error is returned, the following code will not be hit:

volcano/pkg/scheduler/actions/allocate/allocate.go

Lines 110 to 113 in c91eb07

	if statusSets.ContainsUnschedulable() || statusSets.ContainsUnschedulableAndUnresolvable() ||
		statusSets.ContainsErrorSkipOrWait() {
		return nil, api.NewFitError(task, node, statusSets.Message())
	}

volcano/pkg/scheduler/actions/preempt/preempt.go

Lines 220 to 223 in c91eb07

	if statusSets.ContainsUnschedulableAndUnresolvable() || statusSets.ContainsErrorSkipOrWait() {
		return nil, fmt.Errorf("predicates failed in preempt for task <%s/%s> on node <%s>, status is not success or unschedulable",
			task.Namespace, task.Name, node.Name)
	}

The pod-number filtering in the predicates plugin still reaches this part of the code.

The gpu-related resource filtering in the device FilterNode still reaches this part of the code.

For strategies such as nodeaffinity, podaffinity, and nodeport, an error is returned directly, so preemption is currently not supported for them.

lowang-bh · 2023-08-26T09:31:20Z

pkg/scheduler/plugins/predicates/predicates.go

 				if nodeAffinityStatus.Code != api.Success {
 					predicateStatus = append(predicateStatus, nodeAffinityStatus)
-					return predicateStatus, false, nil
+					return predicateStatus, false, fmt.Errorf("plugin %s predicates failed %s", nodeAffinityFilter.Name(), status.Message())


same as before

lowang-bh · 2023-08-26T09:32:23Z

pkg/scheduler/plugins/predicates/predicates.go

 			if nodePortStatus.Code != api.Success {
 				predicateStatus = append(predicateStatus, nodePortStatus)
-				return predicateStatus, nil
+				return predicateStatus, fmt.Errorf("plugin %s predicates failed %s", nodePortFilter.Name(), status.Message())


same as before.

lowang-bh · 2023-08-26T09:39:23Z

Thanks for your hard work. BTW, are there any test results for this PR, covering not only that preemption works well, but also the unscheduling message in the pod/podgroup status?

lowang-bh · 2023-08-26T10:09:50Z

There is another enhancement that could be used in the preemption action: #3071

What's more, I think the current preemption action does not work well. In my opinion, it should follow these steps:

1. Filter out the nodes where preemption cannot help, according to the filter results stored in the job's NodesFitErrors.
2. Iterate over the nodes remaining from step 1: select the victims on a node, remove them from the node, and re-run the predicates on it. If the result is success, the node is a candidate and is added to the final node list.
3. Score the candidate nodes to choose the best one.
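The three steps proposed above can be sketched roughly as below. Everything here is a hypothetical simplification: the `Node` fields, the single-integer resource model, the `fits` predicate, and the "fewest evictions" score are assumptions made for illustration, not the scheduler's actual data structures or scoring.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, heavily simplified node model.
type Node struct {
	Name       string
	Free       int   // free resource units
	Victims    []int // resource units held by preemptable tasks
	Resolvable bool  // false if the earlier fit error cannot be fixed by eviction
}

// fits is a stand-in for re-running the predicates on a node.
func fits(free, request int) bool { return free >= request }

type candidate struct {
	Node    string
	Evicted int
}

// pickNode follows the three proposed steps for a single request.
func pickNode(nodes []Node, request int) (string, bool) {
	var candidates []candidate
	for _, n := range nodes {
		// Step 1: skip nodes where preemption cannot help.
		if !n.Resolvable {
			continue
		}
		// Step 2: remove victims one by one and re-check the predicate.
		free, evicted := n.Free, 0
		for _, v := range n.Victims {
			if fits(free, request) {
				break
			}
			free += v
			evicted++
		}
		if fits(free, request) {
			candidates = append(candidates, candidate{Node: n.Name, Evicted: evicted})
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	// Step 3: score candidates; here, fewer evictions is better.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].Evicted < candidates[j].Evicted
	})
	return candidates[0].Node, true
}

func main() {
	nodes := []Node{
		{Name: "a", Free: 1, Victims: []int{2, 2}, Resolvable: true},
		{Name: "b", Free: 0, Victims: []int{4}, Resolvable: true},
		{Name: "c", Free: 10, Resolvable: false},
	}
	name, ok := pickNode(nodes, 4)
	fmt.Println(name, ok)
}
```

Simulating the eviction before committing to it is what lets the predicates, including affinity-style ones in a fuller model, confirm that the victims actually unblock the node.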

wangyang0616 · 2023-08-26T10:33:43Z

> Thanks for your hard work. BTW, are there any test results for this PR, covering not only that preemption works well, but also the unscheduling message in the pod/podgroup status?

I will add the test results later.

wangyang0616 · 2023-08-27T13:29:54Z

> There is another enhancement that could be used in the preemption action: #3071
>
> What's more, I think the current preemption action does not work well. In my opinion, it should follow these steps:
>
> 1. Filter out the nodes where preemption cannot help, according to the filter results stored in the job's NodesFitErrors.
> 2. Iterate over the nodes remaining from step 1: select the victims on a node, remove them from the node, and re-run the predicates on it. If the result is success, the node is a candidate and is added to the final node list.
> 3. Score the candidate nodes to choose the best one.

I very much agree with your proposal. The preemption function still has a lot of room for optimization, and we can iterate on it gradually.

william-wang · 2023-08-28T01:53:34Z

pkg/scheduler/actions/reclaim/reclaim.go

@@ -126,15 +126,15 @@ func (ra *Action) Execute(ssn *framework.Session) {
 			var statusSets util.StatusSets
 			statusSets, err := ssn.PredicateFn(task, n)
 			if err != nil {
-				klog.V(3).Infof("reclaim predicates failed for task <%s/%s> on node <%s>: %v",
+				klog.V(5).Infof("reclaim predicates failed for task <%s/%s> on node <%s>: %v",


What is the reason for changing the log level from 3 to 5?

When a large number of pending pods in the cluster perform the reclaim operation and resource reclamation fails, a large number of repeated logs are generated in each round of scheduling. I think it is more appropriate to change the log level to debug.

william-wang

/lgtm

volcano-sh-bot · 2023-08-28T02:18:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/scheduler/OWNERS~~ [william-wang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lowang-bh · 2023-08-28T02:43:39Z

> What's more, I think the current preemption action does not work well. In my opinion, it should follow these steps:
>
> 1. Filter out the nodes where preemption cannot help, according to the filter results stored in the job's NodesFitErrors.
> 2. Iterate over the nodes remaining from step 1: select the victims on a node, remove them from the node, and re-run the predicates on it. If the result is success, the node is a candidate and is added to the final node list.
> 3. Score the candidate nodes to choose the best one.

I have opened an enhancement request to track it: enhancement for preemption action #3074

volcano-sh-bot requested review from hudson741 and merryzhou August 26, 2023 09:09

volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 26, 2023

lowang-bh reviewed Aug 26, 2023

wangyang0616 force-pushed the fix_preempt_antiaffinity branch 3 times, most recently from 14d1a8d to 5d6cc2d Compare August 27, 2023 13:19

Volcano supports resource preemption such as cpu, memory, and gpu, bu…

63bc96c

…t does not support preemption by strategies such as antiAffinity and topologyspread Signed-off-by: wangyang <[email protected]>

wangyang0616 force-pushed the fix_preempt_antiaffinity branch from 5d6cc2d to 63bc96c Compare August 27, 2023 13:27

william-wang reviewed Aug 28, 2023

william-wang approved these changes Aug 28, 2023

volcano-sh-bot assigned william-wang Aug 28, 2023

volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 28, 2023

volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2023

volcano-sh-bot merged commit 92df2af into volcano-sh:master Aug 28, 2023
12 checks passed

wangyang0616 mentioned this pull request Aug 28, 2023

[cherry-pick for release-1.8] msg information optimization; preemption logic optimization #3082

Merged

lowang-bh mentioned this pull request Aug 28, 2023

[good first issue]add relative unit testcase for merged PRs #3075

Open

4 tasks
