scheduler-simulate-proposal #3822
base: master
Signed-off-by: molei20021 <[email protected]>
@@ -0,0 +1,32 @@
# Volcano scheduler simulate
## background
* Consider this situation: a user changes a parameter of the nodeorder plugin and needs to know the effect on the production environment. For example, after changing mostrequested.weight, is the average wait time of big tasks shorter than before?
Besides feature validation, we also need scenarios for node simulation. For example, we don't have GPU or NPU nodes, so if there are bugs in GPU or NPU scheduling, we can use simulated scheduling to debug them. We do meet this scenario in production. We also need to test the performance of the scheduler in large-scale clusters.
We can add more GPU-related node features to nodes.csv, such as gpu_allocatable.
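To make the nodes.csv idea concrete, here is a minimal sketch of how a simulator might read such a file. The proposal does not show the file layout, so every column name other than `gpu_allocatable` (`name`, `cpu_allocatable`, `memory_allocatable`) and the helper names `load_nodes` and `gpu_nodes` are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical nodes.csv content; only gpu_allocatable is named in the
# discussion, the other columns are assumed for this example.
SAMPLE_NODES_CSV = """\
name,cpu_allocatable,memory_allocatable,gpu_allocatable
node-1,64,256Gi,8
node-2,32,128Gi,0
"""


def load_nodes(text):
    """Parse node rows; a missing or empty gpu_allocatable counts as 0."""
    nodes = []
    for row in csv.DictReader(io.StringIO(text)):
        nodes.append({
            "name": row["name"],
            "cpu": int(row["cpu_allocatable"]),
            "memory": row["memory_allocatable"],
            "gpu": int(row.get("gpu_allocatable") or 0),
        })
    return nodes


def gpu_nodes(nodes):
    """Return the names of simulated nodes that expose at least one GPU."""
    return [n["name"] for n in nodes if n["gpu"] > 0]
```

Treating a missing column as zero would let existing CPU-only node files keep working if GPU columns are added later.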
### time simulator
* The time simulator shortens simulation time because it does not read real-world time. It always jumps to the next timestamp, which is the minimum of the creation time of the next pod to be created and the finish time of the next pod to finish.
* Time-related values such as pod creation time, pod finish time, and current time should be obtained from the time simulator.
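The clock described above could be sketched roughly as follows. This is not code from the proposal: the names `TimeSimulator`, `Event`, `schedule`, `advance`, and `demo` are hypothetical, and the sketch assumes pod creation and finish times are known up front and kept in a single min-heap, so that popping the heap yields the minimum of the next creation time and the next finish time.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    timestamp: float                   # simulated seconds; the only sort key
    kind: str = field(compare=False)   # "create" or "finish"
    pod: str = field(compare=False)


class TimeSimulator:
    """Virtual clock that jumps straight to the next pod event."""

    def __init__(self, start=0.0):
        self.now = start
        self._events = []  # min-heap ordered by timestamp

    def schedule(self, timestamp, kind, pod):
        if timestamp < self.now:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._events, Event(timestamp, kind, pod))

    def advance(self):
        """Move the clock to the earliest pending event and return it."""
        if not self._events:
            return None
        event = heapq.heappop(self._events)
        self.now = event.timestamp
        return event


def demo():
    """Replay three events, including a pod that runs for 20 hours."""
    clock = TimeSimulator()
    clock.schedule(0.0, "create", "pod-a")
    clock.schedule(20 * 3600.0, "finish", "pod-a")
    clock.schedule(30.0, "create", "pod-b")
    trace = []
    while (event := clock.advance()) is not None:
        trace.append((clock.now, event.kind, event.pod))
    return trace
```

In this sketch the 20-hour pod costs one heap pop rather than 20 hours of wall-clock time, and anything that needs the current time would read `clock.now` instead of the system clock.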
### kube-apiserver simulator
What about integrating with kwok? kwok can simulate thousands of nodes without consuming many resources. If we use kwok, I wonder whether we need to simulate kube-apiserver and kube-controller-manager at all. I worry that simulating kube-apiserver and kcm yourself would be a lot of work.
The kube-scheduler-simulator project also uses kwok to simulate scheduling and run performance tests. Can we do it this way too? https://github.com/kubernetes-sigs/kube-scheduler-simulator
The problem is how to speed up simulation time. For example, if a pod runs for 20 hours, do we also let it run for 20 hours in kwok?
In fact, kwok doesn't have a kubelet, so you can directly set when the pod ends and at what stage. See https://kwok.sigs.k8s.io/docs/user/stages-configuration/
I created a PR, #3830, that adds two scripts: one installs kwok and the other creates fake nodes. Building on kwok is perhaps a better way to do the simulation: you can add your own stage configuration for time simulation or other useful simulations. I think this is better and saves us work, since kwok has been adopted by many schedulers as a tool for simulating scheduling.