This repository has been archived by the owner on Jul 16, 2020. It is now read-only.
Limit number of parallel starts #8
Relates to issue #99 ... Even if a node isn't launching a ton of things, its CPU load could be really high. The same could be true of all nodes, temporarily. The scheduler could queue for a bit and meter start commands out to nodes, acting as a higher-level throttle than the launcher alone. Both issues have merit, but we end up with a few more degrees of freedom in the START flow.
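To make the scheduler-side idea concrete, here is a minimal sketch of a queue that meters START commands out to nodes. Everything in it (`startCmd`, `meteredQueue`, `perTick`, the node names) is hypothetical and not taken from the ciao code base. It assumes the scheduler would call `tick` periodically, for example from a `time.Ticker`, and only shows the metering logic itself.

```go
package main

import "fmt"

// startCmd is a hypothetical stand-in for a START command bound for a node.
type startCmd struct {
	Node     string
	Instance string
}

// meteredQueue holds pending START commands and releases at most
// perTick commands per node each time tick is called.
type meteredQueue struct {
	perTick int
	pending []startCmd
}

// enqueue appends a command to the back of the queue.
func (q *meteredQueue) enqueue(c startCmd) {
	q.pending = append(q.pending, c)
}

// tick returns the commands to send now and keeps the rest queued,
// preserving their original order.
func (q *meteredQueue) tick() []startCmd {
	sent := make(map[string]int)
	var out, rest []startCmd
	for _, c := range q.pending {
		if sent[c.Node] < q.perTick {
			sent[c.Node]++
			out = append(out, c)
		} else {
			rest = append(rest, c)
		}
	}
	q.pending = rest
	return out
}

func main() {
	q := &meteredQueue{perTick: 2}
	for i := 0; i < 5; i++ {
		q.enqueue(startCmd{Node: "node-a", Instance: fmt.Sprintf("i-%d", i)})
	}
	q.enqueue(startCmd{Node: "node-b", Instance: "i-5"})

	// In a real scheduler each iteration would be driven by a timer.
	for round := 1; len(q.pending) > 0; round++ {
		fmt.Printf("tick %d: %v\n", round, q.tick())
	}
}
```

A per-node budget keeps one busy node from delaying starts on idle ones. Whether the budget should be fixed or derived from load reported by each node is left open here.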
markdryan pushed a commit to markdryan/ciao that referenced this issue on Jul 6, 2016
This commit limits the number of parallel starts to a function of the number of CPUs present in the node. There really isn't much point in allowing 1000 instances to be started on the same node at the same time. Doing so won't improve the start times much and will increase the likelihood of failure due to resource exhaustion caused by the heavy demands of instance startup.
Fixes ciao-project#8
Signed-off-by: Mark Ryan <[email protected]>
markdryan pushed a commit to markdryan/ciao that referenced this issue on Jul 7, 2016
kaccardi added a commit to kaccardi/ciao that referenced this issue on Aug 11, 2016
Currently, there is no limit to the number of instances that ciao-launcher will start in parallel. This places extreme load on the compute node when spawning large numbers of instances at once and can lead to various errors. It may be better for launcher to introduce some sort of semaphore that limits the number of instances launched in parallel to some function of the number of cores on the machine. We could also return a special STATUS, e.g., throttle, to indicate that launcher is overloaded but not full.
When launching large numbers of instances, e.g., 10000, we often see failures and timeouts in QEMU and networking. Reducing the load on the compute node may prevent these failures.
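As a rough illustration of the semaphore idea, the sketch below uses a buffered channel as a counting semaphore sized from `runtime.NumCPU()`. It is not ciao-launcher code: `startLimiter`, `tryStart`, `defaultLimit`, the `Status` values, and the factor of 2 starts per CPU are all assumptions made for the example. When no slot is free it returns a throttle status instead of blocking, mirroring the special STATUS suggested above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// Status is a hypothetical result type; StatusThrottle models the
// "overloaded but not full" status proposed in this issue.
type Status string

const (
	StatusOK       Status = "ok"
	StatusThrottle Status = "throttle"
)

// startLimiter bounds concurrent instance starts with a buffered
// channel used as a counting semaphore.
type startLimiter struct {
	slots chan struct{}
}

// defaultLimit returns perCPU times the number of CPUs, and at least 1.
// The multiplier is an assumption, not a measured value.
func defaultLimit(perCPU int) int {
	n := perCPU * runtime.NumCPU()
	if n < 1 {
		n = 1
	}
	return n
}

// newStartLimiter creates a limiter allowing n parallel starts.
func newStartLimiter(n int) *startLimiter {
	if n < 1 {
		n = 1
	}
	return &startLimiter{slots: make(chan struct{}, n)}
}

// tryStart runs start if a slot is free and releases the slot when
// start returns. If all slots are taken it returns StatusThrottle
// immediately without running start.
func (l *startLimiter) tryStart(start func()) Status {
	select {
	case l.slots <- struct{}{}:
		defer func() { <-l.slots }()
		start()
		return StatusOK
	default:
		return StatusThrottle
	}
}

func main() {
	l := newStartLimiter(defaultLimit(2))
	var wg sync.WaitGroup
	var mu sync.Mutex
	counts := map[Status]int{}

	// Simulate a burst of 100 start requests arriving at once.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s := l.tryStart(func() { time.Sleep(time.Millisecond) })
			mu.Lock()
			counts[s]++
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(counts)
}
```

A blocking variant (sending on `slots` without the `default` case) would queue starts inside the launcher instead of reporting a throttle status. Which behaviour fits better depends on whether the scheduler is expected to retry.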