[proposal]Two better resource scheduling and allocation plugins #842
Comments
Can we use English in those pictures?
/kind feature
Sure.
MR: #843
/cc
If I understand your requirement correctly, we could enhance the vanilla NodeResourcesFit plugin's "requestedToCapacityRatio" scoring strategy like this:

diff --git a/test.yaml b/test.yaml
index 5857497..0882911 100644
--- a/test.yaml
+++ b/test.yaml
@@ -1,21 +1,34 @@
 apiVersion: kubescheduler.config.k8s.io/v1
 kind: KubeSchedulerConfiguration
 profiles:
 - pluginConfig:
   - args:
       scoringStrategy:
         resources:
         - name: cpu
           weight: 1
         - name: memory
           weight: 1
         - name: nvidia.com/gpu
           weight: 10
         requestedToCapacityRatio:
+        - resourceName: cpu # Spread
+          shape:
+          - utilization: 0
+            score: 100
+          - utilization: 100
+            score: 0
+        - resourceName: memory # Spread
+          shape:
+          - utilization: 0
+            score: 100
+          - utilization: 100
+            score: 0
+        - resourceName: nvidia.com/gpu # Bin-packing
           shape:
           - utilization: 0
             score: 0
           - utilization: 100
             score: 10
         type: RequestedToCapacityRatio
     name: NodeResourcesFit

so that:
FYI: the current requestedToCapacityRatio API applies the same config to all resources (doc).
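To make the hypothetical config above concrete, here is a rough Python sketch of how per-resource shapes might be scored. It is not the upstream NodeResourcesFit code: the function names, the node sizes, and the choice to take a weighted average over only the resources a node exposes are all assumptions made for illustration. The shape points and weights are copied from the diff.

```python
from bisect import bisect_right

def shape_score(shape, utilization):
    """Piecewise-linear interpolation over (utilization, score) points sorted by utilization."""
    xs = [u for u, _ in shape]
    ys = [s for _, s in shape]
    if utilization <= xs[0]:
        return float(ys[0])
    if utilization >= xs[-1]:
        return float(ys[-1])
    i = bisect_right(xs, utilization)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (utilization - x0) / (x1 - x0)

# Shapes and weights taken from the hypothetical config in the diff.
SHAPES = {
    "cpu": [(0, 100), (100, 0)],            # spread
    "memory": [(0, 100), (100, 0)],         # spread
    "nvidia.com/gpu": [(0, 0), (100, 10)],  # bin-packing
}
WEIGHTS = {"cpu": 1, "memory": 1, "nvidia.com/gpu": 10}

def node_score(requested, allocatable):
    """Weighted average of per-resource shape scores.

    `requested` is the node's total requested amount after placing the pod.
    Assumption: resources the node does not expose are skipped.
    """
    total, weight_sum = 0.0, 0
    for name, shape in SHAPES.items():
        capacity = allocatable.get(name, 0)
        if capacity == 0:
            continue
        utilization = min(100.0, 100.0 * requested.get(name, 0) / capacity)
        total += WEIGHTS[name] * shape_score(shape, utilization)
        weight_sum += WEIGHTS[name]
    return total / weight_sum if weight_sum else 0.0

# Hypothetical nodes: same CPU/memory usage, different GPU usage.
gpu_node = {"cpu": 32, "memory": 128, "nvidia.com/gpu": 8}
busy = node_score({"cpu": 16, "memory": 64, "nvidia.com/gpu": 6}, gpu_node)
idle = node_score({"cpu": 16, "memory": 64, "nvidia.com/gpu": 2}, gpu_node)

# Hypothetical CPU-only nodes: lightly vs. heavily loaded.
cpu_node = {"cpu": 32, "memory": 128}
light = node_score({"cpu": 8, "memory": 32}, cpu_node)
heavy = node_score({"cpu": 24, "memory": 96}, cpu_node)

print(busy > idle)    # GPU bin-packing: the fuller GPU node wins
print(light > heavy)  # CPU/memory spreading: the emptier node wins
```

Under these assumptions, a single plugin config yields bin-packing for GPUs and spreading for CPU and memory at the same time, which is the behavior the config is meant to express.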
Do you suggest that I use this plug-in and this configuration to solve the problem of using different strategies for different resource types?
@Huang-Wei In addition, is this feature already supported, or does the plug-in need to be modified to support it? Finally, from which k8s version will it be supported?
The example I gave above was hypothetical, i.e., not supported in any version of the upstream NodeResourcesFit plugin. What I wanted to confirm is whether that config would solve all your use cases. If the answer is yes, we can pursue getting it implemented in k/k.
It seems to meet the needs of plugin-one, but it does not seem to meet the needs of plugin-two.
For the scenario introduced in plugin two, I'm not that convinced using the diff of "types of resources a node exposes" vs. "types of resources a pod requests" is generic enough. For example, let's assume:
According to your description, if a Pod X requests only cpu and memory, will Node A and Node B get the same score?
It depends on which resource is the scarce one. If it is GPU, then Node B's score will be higher.
Is there a design doc that details it? Overall, I'd suggest going with this path:
WDYT?
Will you work on the first plug-in yourself, or should I take it over? And the second plug-in requires me to re-describe it in a KEP, right?
Either way works for me. I can help present the idea in the sig-scheduling bi-weekly meeting first.
Yes.
Okay, thank you for your proposal for the first plug-in. I am also very interested in this implementation. If there is any progress, please keep me posted. I am willing to contribute code.
What would you like to be added?
What is your proposal:
The NodeResourcesFit plug-in in native k8s can apply only one strategy to all resources, such as MostRequestedPriority or LeastRequestedPriority. In industrial practice, this design does not fit some scenarios. For example, in AI scenarios, workloads that request GPUs should preferentially fill up an entire GPU machine to prevent GPU fragmentation, while workloads that request only CPU and memory should be spread preferentially across non-GPU machines. Otherwise they consume too much CPU and memory on GPU machines, and tasks that actually request GPUs end up Pending due to insufficient non-GPU resources. We therefore hope both strategies can be extended to address this need.
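The trade-off described above can be illustrated with a toy Python model. This is a deliberately simplified sketch, not the upstream scoring code: the node names, capacities, and the "mean utilization over exposed resources" formula are all assumptions chosen to keep the example small.

```python
def utilization(requested, allocatable):
    """Mean utilization (0-100) over the resources a node exposes."""
    ratios = [100.0 * requested.get(r, 0) / cap for r, cap in allocatable.items() if cap > 0]
    return sum(ratios) / len(ratios)

def most_allocated(requested, allocatable):
    """Simplified bin-packing score: fuller nodes score higher."""
    return utilization(requested, allocatable)

def least_allocated(requested, allocatable):
    """Simplified spreading score: emptier nodes score higher."""
    return 100.0 - utilization(requested, allocatable)

def fits(node, pod):
    """A pod fits if every requested resource has enough free capacity."""
    return all(node["requested"].get(r, 0) + amount <= node["allocatable"].get(r, 0)
               for r, amount in pod.items())

def pick(nodes, pod, strategy):
    """Return the feasible node with the highest score after placing the pod."""
    def score(name):
        node = nodes[name]
        after = {r: node["requested"].get(r, 0) + pod.get(r, 0) for r in node["allocatable"]}
        return strategy(after, node["allocatable"])
    return max((n for n in nodes if fits(nodes[n], pod)), key=score)

# Hypothetical cluster: a busy GPU node, an idle GPU node, and a CPU-only node.
nodes = {
    "gpu-a": {"allocatable": {"cpu": 32, "nvidia.com/gpu": 8},
              "requested": {"cpu": 16, "nvidia.com/gpu": 6}},
    "gpu-b": {"allocatable": {"cpu": 32, "nvidia.com/gpu": 8},
              "requested": {"cpu": 0, "nvidia.com/gpu": 0}},
    "cpu-c": {"allocatable": {"cpu": 32}, "requested": {"cpu": 4}},
}
cpu_pod = {"cpu": 4}
gpu_pod = {"cpu": 4, "nvidia.com/gpu": 1}

print(pick(nodes, cpu_pod, most_allocated))   # gpu-a: the CPU-only pod eats CPU on a GPU node
print(pick(nodes, gpu_pod, least_allocated))  # gpu-b: the GPU pod fragments an idle GPU node
```

In this toy cluster, bin-packing everything pushes the CPU-only pod onto the busy GPU node, and spreading everything pushes the GPU pod onto the idle GPU node. Neither single strategy gives both behaviors the proposal asks for.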
Why is this needed:
See the description above.
Is there a suggested solution, if so, please add it:
plugin-one
config:
config description:
node score:
plugin-two
config:
config description:
node score:
Why is this needed?
It’s introduced above