Limit number of parallel starts #8

markdryan · 2016-04-07T09:46:44Z

Currently, there is no limit on the number of instances that ciao-launcher will start in parallel. This places extreme load on the compute node when spawning large numbers of instances at once and can lead to various errors. It may be better for launcher to introduce some sort of semaphore that limits the number of instances launched in parallel to some function of the number of cores on the machine. We could also return a special STATUS, e.g., throttle, to indicate that launcher is overloaded but not full.

When launching large numbers of instances, e.g., 10000, we often see failures and timeouts in qemu and networking. Reducing the load on the compute node may prevent these failures.
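
For illustration only, the semaphore idea above might look roughly like the Go sketch below. It is not the launcher's actual code: the names `startLimiter`, `tryStart`, `defaultLimit` and `ErrThrottled`, and the choice of two starts per CPU, are all assumptions made for the example. A buffered channel serves as the counting semaphore, and a non-blocking acquire stands in for the proposed throttle STATUS.

```go
package main

import (
	"errors"
	"runtime"
)

// ErrThrottled is a hypothetical stand-in for the proposed "throttle"
// STATUS: the node is busy starting instances but is not full.
var ErrThrottled = errors.New("launcher throttled: too many parallel starts")

// startLimiter is a counting semaphore built on a buffered channel.
// Each value held in the channel represents one start in progress.
type startLimiter struct {
	slots chan struct{}
}

// defaultLimit derives the limit from the core count. The factor of 2
// is an assumption; the issue only says "some function of the number
// of cores".
func defaultLimit() int {
	return 2 * runtime.NumCPU()
}

// newStartLimiter returns a limiter that allows at most limit starts
// to run at the same time. Limits below 1 are raised to 1.
func newStartLimiter(limit int) *startLimiter {
	if limit < 1 {
		limit = 1
	}
	return &startLimiter{slots: make(chan struct{}, limit)}
}

// tryStart runs start if a slot is free and releases the slot when
// start returns. If every slot is taken it returns ErrThrottled
// immediately instead of blocking.
func (l *startLimiter) tryStart(start func() error) error {
	select {
	case l.slots <- struct{}{}:
		defer func() { <-l.slots }()
		return start()
	default:
		return ErrThrottled
	}
}
```

A caller would create one limiter with `newStartLimiter(defaultLimit())` and wrap each instance start in `tryStart`. Blocking on the channel instead of returning `ErrThrottled` would queue starts inside launcher rather than pushing back on the sender; which behaviour is preferable is left open here.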

tpepper · 2016-06-22T15:51:53Z

Relates to issue #99 ... Even if a node isn't launching a tonne of things, its CPU load could be really high. As could that of all nodes, temporarily. The scheduler could queue for a bit and meter start commands out to nodes, acting as a higher-level throttle than launcher alone. Both issues have merit, but we end up with a few more degrees of freedom in the flow that is START.
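
As a rough sketch of what "queue for a bit and meter start commands out to nodes" could mean, the Go fragment below keeps a FIFO of pending starts and hands one out only when some node has fewer than a fixed number of starts outstanding. Every name here (`meter`, `startRequest`, `dispatch`, `done`) and the per-node cap are hypothetical; this is not how the ciao scheduler is implemented, and the sketch ignores locking and real load metrics.

```go
package main

// startRequest is a hypothetical stand-in for a queued START command.
type startRequest struct {
	instanceID string
}

// meter queues START requests and releases them so that no node has
// more than maxInFlight starts outstanding. It is not safe for
// concurrent use; a real scheduler would need locking.
type meter struct {
	maxInFlight int
	inFlight    map[string]int // node name -> starts currently outstanding
	queue       []startRequest // pending requests, oldest first
}

func newMeter(maxInFlight int) *meter {
	return &meter{maxInFlight: maxInFlight, inFlight: make(map[string]int)}
}

// enqueue adds a request to the back of the queue.
func (m *meter) enqueue(r startRequest) {
	m.queue = append(m.queue, r)
}

// dispatch assigns the oldest queued request to the first node in
// nodes that has spare capacity. It returns ok == false when the queue
// is empty or every node is at its cap, in which case the request
// stays queued.
func (m *meter) dispatch(nodes []string) (node string, r startRequest, ok bool) {
	if len(m.queue) == 0 {
		return "", startRequest{}, false
	}
	for _, n := range nodes {
		if m.inFlight[n] < m.maxInFlight {
			r = m.queue[0]
			m.queue = m.queue[1:]
			m.inFlight[n]++
			return n, r, true
		}
	}
	return "", startRequest{}, false
}

// done records that one start on node has finished, freeing capacity.
func (m *meter) done(node string) {
	if m.inFlight[node] > 0 {
		m.inFlight[node]--
	}
}
```

In this arrangement the launcher-side limit and the scheduler-side meter are independent knobs, which is the extra "degrees of freedom" the comment refers to.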

This commit limits the number of parallel starts to a function of the number of CPUs present in the node. There really isn't much point in allowing 1000 instances to be started on the same node at the same time. Doing so won't improve start times much and will increase the likelihood of failure due to the resource exhaustion caused by the heavy demands of instance startup.

Fixes ciao-project#8

Signed-off-by: Mark Ryan <[email protected]>

markdryan added enhancement ciao-launcher labels Apr 7, 2016

markdryan self-assigned this Apr 7, 2016

markdryan added the P2 label Jun 6, 2016

amyleeland added this to the Sprint 1 milestone Jun 9, 2016

amyleeland added the ready label Jun 9, 2016

tpepper mentioned this issue Jun 22, 2016

workload launch queue #99

Open

markdryan added the in progress label Jul 6, 2016

markdryan mentioned this issue Jul 6, 2016

[Don't merge] ciao-launcher: Limit the number of parallel starts #339

Closed

amyleeland removed the ready label Jul 6, 2016

tpepper modified the milestones: Sprint 3, Sprint 1 Jul 19, 2016

amyleeland removed the in progress label Jul 29, 2016

kaccardi added a commit to kaccardi/ciao that referenced this issue Aug 11, 2016

test ciao-project#8

092b876

amyleeland removed this from the Sprint 3 milestone Sep 6, 2016
