Proposal: Preempt action support topology #3995
Conversation
Signed-off-by: Box Zhang <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: (none). The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
When topology-sensitive resources like GPUs exist, the preemption process needs to consider resource topology relationships to ensure that resource allocation after preemption still satisfies the original topology constraints.
For example, suppose a node has 2 GPUs (8GB each), Pod A and Pod B each use 4GB, and Pod C needs 8GB on a single GPU. Scheduling Pod C directly fails, which triggers preemption. After removing Pod A, Pod C can be scheduled; but when Pod A is re-added in the simulation, the binpack strategy may place it on a different GPU, changing the topology. Pod C can then still be scheduled, so the preemption concludes that no pods need to be evicted, and ultimately fails because nothing was actually evicted.
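The scenario above can be reproduced with a small, self-contained sketch. The `gpu` model and `binpackPlace` function below are hypothetical simplifications (real Volcano tracks device topology through its device-share plugins), but they show exactly how binpack re-placement of the victim makes the simulation conclude that no eviction is needed:

```go
package main

import "fmt"

// gpu is a hypothetical model of one GPU with a fixed memory capacity (GB).
type gpu struct {
	capacity int
	used     int
}

func (g gpu) free() int { return g.capacity - g.used }

// binpackPlace picks the GPU with the least remaining free memory that
// still fits the request (binpack strategy). Returns -1 if none fits.
func binpackPlace(gpus []gpu, request int) int {
	best := -1
	for i, g := range gpus {
		if g.free() >= request {
			if best == -1 || g.free() < gpus[best].free() {
				best = i
			}
		}
	}
	return best
}

func main() {
	// Node with 2 GPUs, 8GB each; Pod A on GPU0 (4GB), Pod B on GPU1 (4GB).
	gpus := []gpu{{capacity: 8, used: 4}, {capacity: 8, used: 4}}

	// Pod C needs 8GB on a single GPU: no GPU has 8GB free, scheduling fails.
	fmt.Println("C placeable:", binpackPlace(gpus, 8) != -1) // false

	// Preemption simulation: remove Pod A from GPU0.
	gpus[0].used -= 4
	fmt.Println("C placeable after removing A:", binpackPlace(gpus, 8) != -1) // true

	// Re-add Pod A under binpack: it now lands on GPU1 (tighter fit),
	// not on its original GPU0.
	idx := binpackPlace(gpus, 4)
	gpus[idx].used += 4
	fmt.Println("A re-placed on GPU:", idx) // 1

	// GPU0 is still fully free, so the simulation sees C as schedulable
	// without evicting anyone -- yet nothing was evicted: preemption fails.
	fmt.Println("C placeable after re-adding A:", binpackPlace(gpus, 8) != -1) // true
}
```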
Is the example here about the current situation of volcano preemption or the challenges of the current optimization solution?
```go
type SimulateAddPodFn func(pod *api.TaskInfo, node *api.NodeInfo) error
```
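To make the callback concrete, here is a hedged sketch of how a device-aware plugin might register and use such a simulate-add function. The `TaskInfo`/`NodeInfo` stand-ins, the `RegisterSimulateAddPodFn` helper, and the `checkSingleGPUFit` callback are all hypothetical names for illustration; the real types live in Volcano's `pkg/scheduler/api`, and registration would go through the plugin session:

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for api.TaskInfo and api.NodeInfo (hypothetical).
type TaskInfo struct {
	Name   string
	GPUMem int // requested GPU memory on a single device, GB
}
type NodeInfo struct {
	Name    string
	GPUFree []int // free memory per GPU, GB
}

// SimulateAddPodFn mirrors the callback signature from the proposal.
type SimulateAddPodFn func(pod *TaskInfo, node *NodeInfo) error

// simulateAddFns stands in for the plugin framework's registry.
var simulateAddFns []SimulateAddPodFn

func RegisterSimulateAddPodFn(fn SimulateAddPodFn) {
	simulateAddFns = append(simulateAddFns, fn)
}

// checkSingleGPUFit is a sample callback: it verifies the pod's request
// fits on one GPU and commits the simulated placement to the node state.
func checkSingleGPUFit(pod *TaskInfo, node *NodeInfo) error {
	for i, free := range node.GPUFree {
		if free >= pod.GPUMem {
			node.GPUFree[i] -= pod.GPUMem
			return nil
		}
	}
	return errors.New("no single GPU can hold the request")
}

func main() {
	RegisterSimulateAddPodFn(checkSingleGPUFit)

	node := &NodeInfo{Name: "node-1", GPUFree: []int{4, 8}}
	pod := &TaskInfo{Name: "pod-c", GPUMem: 8}

	// During preemption simulation the framework would invoke every
	// registered callback when re-adding a candidate pod.
	for _, fn := range simulateAddFns {
		if err := fn(pod, node); err != nil {
			fmt.Println("simulation rejected:", err)
			return
		}
	}
	fmt.Println("GPU free after simulated add:", node.GPUFree)
}
```

The key design point is that the callback mutates the simulated node state, so later simulate-add/remove calls in the same preemption pass observe the updated topology.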
### Limitations
The native Kubernetes scheduler has some capability constraints around preemption and has made certain trade-offs between functionality and performance. See: # limitations-of-preemption
Compared with kube-scheduler, is the functional behavior of Volcano's affinity-aware preemption the same or different? If different, what are the detailed differences?
There are Chinese characters in the image, and the subject of each process step needs to be clearly identified.
The process is hard for common users to understand; we'd better make it clearer, e.g. by adding both the design process and an example.
The three key functions are not presented in the process design above; we should give a more detailed description.
What is the standard for PreemptCostNodeOrder, the least number of evicted pods? We should define one.
In what cases should a plugin register the removal and addition functions? We should provide a guide.
What type of PR is this?
/kind documentation