
[proposal] Two better resource scheduling and allocation plugins #842

Open
LY-today opened this issue Dec 23, 2024 · 16 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@LY-today commented Dec 23, 2024

What would you like to be added?

What is your proposal:
The native Kubernetes NodeResourcesFit plugin can only apply a single scoring strategy, such as MostRequestedPriority or LeastRequestedPriority, to all resources. In industrial practice this design does not fit some scenarios. For example, in AI scenarios, workloads that request GPUs prefer to fill a whole GPU machine first, to prevent GPU fragmentation; workloads that request only CPU and memory should preferentially be spread across non-GPU machines, so that they do not consume the CPU and memory of GPU machines and leave GPU workloads Pending for lack of non-GPU resources. We therefore hope both strategies can be applied at the same time to address this business need.

Why is this needed:
See the description above.

Is there a suggested solution, if so, please add it:

plugin-one

config:

resources: 
  nvidia.com/gpu:
    type: MostAllocated
    weight: 2
  cpu:
    type: LeastAllocated
    weight: 1
  memory:
    type: LeastAllocated
    weight: 1

config description:
[screenshot: config field descriptions]

node score:

finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN * resourceN)] / (weight1 + weight2 + … + weightN)
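
For illustration, a minimal Go sketch of this weighted-average formula, assuming each per-resource score has already been produced by that resource's configured strategy (MostAllocated or LeastAllocated) on a 0-100 scale; the function and variable names are illustrative, not the actual plugin API:

package main

import "fmt"

// finalNodeScore computes the weighted average of per-resource scores:
// [(w1*s1) + ... + (wN*sN)] / (w1 + ... + wN).
func finalNodeScore(scores, weights map[string]int64) int64 {
	var weightedSum, weightSum int64
	for name, s := range scores {
		w := weights[name]
		weightedSum += w * s
		weightSum += w
	}
	if weightSum == 0 {
		return 0 // no configured resources; neutral score (an assumption)
	}
	return weightedSum / weightSum
}

func main() {
	// Per-resource scores already produced by each resource's strategy.
	scores := map[string]int64{"nvidia.com/gpu": 80, "cpu": 60, "memory": 40}
	weights := map[string]int64{"nvidia.com/gpu": 2, "cpu": 1, "memory": 1}
	fmt.Println(finalNodeScore(scores, weights)) // (2*80 + 1*60 + 1*40) / 4 = 65
}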

plugin-two

config:

resources: 
- nvidia.com/gpu 

config description:
[screenshot: config field descriptions]

node score:

finalScoreNode = (allocatablesResourcesNum - requestsResourcesNum) * framework.MaxNodeScore / allocatablesResourcesNum
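
A minimal Go sketch of this formula. The proposal does not spell out exactly which resource types are counted, so the parameter semantics below (node-exposed types vs. pod-requested types) are an assumption, and maxNodeScore mirrors framework.MaxNodeScore (100):

package main

import "fmt"

const maxNodeScore int64 = 100 // mirrors framework.MaxNodeScore

// scarceResourceScore implements
// (allocatableResourcesNum - requestedResourcesNum) * maxNodeScore / allocatableResourcesNum.
// Assumption: allocatableResourcesNum counts the resource types the node
// exposes and requestedResourcesNum counts the types the pod requests.
func scarceResourceScore(allocatableResourcesNum, requestedResourcesNum int64) int64 {
	if allocatableResourcesNum == 0 {
		return maxNodeScore // guard against division by zero (assumed behavior)
	}
	return (allocatableResourcesNum - requestedResourcesNum) * maxNodeScore / allocatableResourcesNum
}

func main() {
	fmt.Println(scarceResourceScore(4, 2)) // node exposes 4 types, pod requests 2: 50
	fmt.Println(scarceResourceScore(4, 4)) // pod requests every type the node exposes: 0
}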

Why is this needed?

See the explanation above.

@googs1025 (Member)

Can we use English in those pictures?

@googs1025 (Member)

/kind feature

@k8s-ci-robot added the kind/feature label Dec 23, 2024
@LY-today (Author)

Can we use English in those pictures?

sure

@LY-today (Author)

MR: #843

@ffromani (Contributor)

/cc

@Huang-Wei (Contributor) commented Jan 11, 2025

If I understand your requirement correctly, we could enhance the vanilla NodeResourcesFit plugin's "requestedToCapacityRatio" scoring strategy like this:

diff --git a/test.yaml b/test.yaml
index 5857497..0882911 100644
--- a/test.yaml
+++ b/test.yaml
@@ -1,21 +1,34 @@
 apiVersion: kubescheduler.config.k8s.io/v1
 kind: KubeSchedulerConfiguration
 profiles:
 - pluginConfig:
   - args:
       scoringStrategy:
         resources:
         - name: cpu
           weight: 1
         - name: memory
           weight: 1
         - name: nvidia.com/gpu
           weight: 10
         requestedToCapacityRatio:
+        - resourceName: cpu # Spread
+          shape:
+          - utilization: 0
+            score: 100
+          - utilization: 100
+            score: 0
+        - resourceName: memory # Spread
+          shape:
+          - utilization: 0
+            score: 100
+          - utilization: 100
+            score: 0
+        - resourceName: nvidia.com/gpu # Bin-packing
           shape:
           - utilization: 0
             score: 0
           - utilization: 100
             score: 10
         type: RequestedToCapacityRatio
     name: NodeResourcesFit

so that:

  1. each resource can be scored differently (in the above example, GPU is most-allocated, while cpu/memory are least-allocated)
  2. the weight of each resource defines the priority among resources

FYI: the current requestedToCapacityRatio API applies the config to all resources (doc)
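
For reference, the RequestedToCapacityRatio strategy scores each resource by linear interpolation between the configured shape points. A minimal Go sketch of that interpolation (illustrative names, not the actual scheduler code):

package main

import "fmt"

type point struct{ utilization, score int64 }

// interpolate maps a utilization percentage (0-100) to a score by linear
// interpolation between shape points, clamping outside the configured range;
// shape points are assumed sorted by utilization.
func interpolate(shape []point, utilization int64) int64 {
	if utilization <= shape[0].utilization {
		return shape[0].score
	}
	for i := 1; i < len(shape); i++ {
		p0, p1 := shape[i-1], shape[i]
		if utilization <= p1.utilization {
			return p0.score + (p1.score-p0.score)*(utilization-p0.utilization)/(p1.utilization-p0.utilization)
		}
	}
	return shape[len(shape)-1].score
}

func main() {
	spread := []point{{0, 100}, {100, 0}} // least-allocated: emptier nodes score higher
	binpack := []point{{0, 0}, {100, 10}} // most-allocated: fuller nodes score higher
	fmt.Println(interpolate(spread, 30))  // 70
	fmt.Println(interpolate(binpack, 30)) // 3
}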

@LY-today (Author) commented Jan 12, 2025

If I understand your requirement correctly, we could enhance the vanilla NodeResourcesFit plugin's "requestedToCapacityRatio" scoring strategy like this: […]

Are you suggesting that I use this plugin with this configuration to solve the problem of applying different strategies to different resource types?

@LY-today (Author)

Are you suggesting that I use this plugin with this configuration to solve the problem of applying different strategies to different resource types?

@Huang-Wei In addition, is this feature already supported, or does the plugin need changes to support it? And finally, from which Kubernetes version would it be supported?

@Huang-Wei (Contributor) commented Jan 12, 2025

The example I gave above was a hypothetical one - i.e., it is not supported in any version of the upstream NodeResourcesFit plugin.

What I wanted to confirm is whether that config would solve all your use cases. If the answer is yes, we can pursue getting it implemented in k/k.

@LY-today (Author)

The example I gave above was a hypothetical one - i.e., it is not supported in any version of the upstream NodeResourcesFit plugin.

What I wanted to confirm is whether that config would solve all your use cases. If the answer is yes, we can pursue getting it implemented in k/k.

It seems to meet the needs of plugin-one, but not those of plugin-two.

@Huang-Wei (Contributor)

For the scenario introduced in plugin two, I'm not that convinced using the diff of "types of resources a node exposes" vs. "types of resources a pod requests" is generic enough. For example, let's assume:

  • Node A exposes {cpu, memory, ephemeral-storage, GPU}
  • Node B exposes {cpu, memory, ephemeral-storage, Bar}

According to your description, if a Pod X requests only cpu and memory, Node A and Node B will get the same score?

@LY-today (Author)

For the scenario introduced in plugin two, I'm not that convinced using the diff of "types of resources a node exposes" vs. "types of resources a pod requests" is generic enough. For example, let's assume:

  • Node A exposes {cpu, memory, ephemeral-storage, GPU}
  • Node B exposes {cpu, memory, ephemeral-storage, Bar}

According to your description, if a Pod X requests only cpu and memory, Node A and Node B will get the same score?

It depends on which resource is configured as the scarce resource; if it is GPU, then Node B's score will be higher.
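
For illustration, assuming the plugin counts only the resource types listed in its config: with nvidia.com/gpu configured as scarce, Node A exposes one scarce type that Pod X does not request while Node B exposes none, so Node B scores higher; configuring Bar as scarce instead would flip the ranking.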

@Huang-Wei (Contributor)

It depends on which resource is configured as the scarce resource; if it is GPU, then Node B's score will be higher.

Is there a design doc that details it?

Overall, I'd suggest going with this path:

  • for plugin one, I'd like to raise a KEP in upstream and get it implemented there (k/k) to benefit a much wider audience
  • for plugin two, rewrite the KEP (feat: add kep md #845) and evolve the implementation in this repo (k-sigs/scheduler-plugins).

WDYT?

@LY-today (Author) commented Jan 13, 2025

It depends on which resource is configured as the scarce resource; if it is GPU, then Node B's score will be higher.

Is there a design doc that details it?

Overall, I'd suggest going with this path:

  • for plugin one, I'd like to raise a KEP in upstream and get it implemented there (k/k) to benefit a much wider audience
  • for plugin two, rewrite the KEP (feat: add kep md #845) and evolve the implementation in this repo (k-sigs/scheduler-plugins).

WDYT?

Will you optimize the first plugin yourself, or will you hand it over to me? And the second plugin requires me to rewrite the KEP, right?

@Huang-Wei (Contributor)

Will you optimize the first plugin yourself, or will you hand it over to me?

Either way works for me. I can help present the idea in the sig-scheduling bi-weekly meeting first.

And the second plugin requires me to rewrite the KEP, right?

Yes.

@LY-today (Author)

Will you optimize the first plugin yourself, or will you hand it over to me?

Either way works for me. I can help present the idea in the sig-scheduling bi-weekly meeting first.

And the second plugin requires me to rewrite the KEP, right?

Yes.

Okay, thank you for your proposal for the first plugin. I am also very interested in this implementation; if there is any progress, please keep me posted. I am willing to contribute code.
