Min:1 Max:1 - still multiple runners spawned #168
Comments
Hello @gc-nathanh, my apologies for the late response. Could you share an idea or example of what the runner pool configuration looks like on your end? If possible, describe the workflow scenario as well: what is happening and what you expect as an outcome. That might help me understand why you are having this concurrency issue. Regarding the issue on
Hi @tcarmet - no problem! A little more information on what we're trying to achieve: we're using the runner manager to stand up instances for our CI in our OpenStack cluster, but testing against specific physical hardware (not managed by the runner manager, but connected into the OpenStack project), of which we have only one per runner pool. If two runners both register to GitHub and run against the same physical hardware, one (or both) jobs will fail because the hardware doesn't support concurrency. We've observed that, although the pool is set to min: 1, max: 1, there are times when additional runners are created and register. Right now we've reduced the project quota so much that it can only support one runner at a time, but that means we've had to put each runner pool into its own OpenStack project.

I've forked the runner manager as I've had to make some specific changes to how VMs are created (we have some specific requirements for additional metadata and a particular network interface configuration), but none of this should affect the scheduling logic. I have also optimised our startup time by baking as much as I can into the runner image, so startup is as fast as possible.

An example of the pool config is:

runner_pool:
  - config:
      flavor: 'amdvcpu.small'
      image: 'ubuntu20.04-runner'
      availability_zone: ''
      rnic_network_name: "dmzvpod4-rnic"
      vipu_ipaddr: "10.3.3.189"
      partition_name: "dmzvpod4-4ipu"
      vipu_port: "8090"
    quantity:
      min: 1
      max: 1
    tags:
      - Ubuntu20.04
      - pod4
      - amd
      - public
      - M2000
      - dmzvpod4

You can see the extra params in there. I'll see if I can collect some logs that capture the problem.
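For illustration, here is a minimal sketch of how extra metadata and a specific network interface could be attached to an instance with openstacksdk. This is not the runner-manager's own code; the cloud name, server name, and all values are assumptions that simply mirror the config above.

```python
import openstack

# Assumed clouds.yaml entry; every name below is illustrative only.
conn = openstack.connect(cloud="ci-cloud")

rnic_net = conn.network.find_network("dmzvpod4-rnic")
image = conn.image.find_image("ubuntu20.04-runner")
flavor = conn.compute.find_flavor("amdvcpu.small")

server = conn.compute.create_server(
    name="runner-dmzvpod4-0",
    image_id=image.id,
    flavor_id=flavor.id,
    # Attach the runner to the RNIC network.
    networks=[{"uuid": rnic_net.id}],
    # Extra metadata the runner can read at boot via the metadata service.
    metadata={
        "vipu_ipaddr": "10.3.3.189",
        "partition_name": "dmzvpod4-4ipu",
        "vipu_port": "8090",
    },
)
conn.compute.wait_for_server(server)
```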
Thank you for providing additional context, it's very helpful! I'm glad you had the idea to pre-install some dependencies inside the runner image to optimize startup time 👌 I can see why you needed to fork the project; I believe there are also some leftovers in our code from our own OpenStack infra. And I can confirm that, as far as I know, modifications inside the OpenStack cloud backend shouldn't impact the scheduling. This project was initially built with the logic of pre-creating runners due to how self-hosted runners used to work in the past on GitHub. Before GitHub added the
I think we can agree that this second runner pre-creation should not happen if max is reached, even if a job is running. Are you able to confirm that this seems like the scenario you are facing?

PS: Sorry, I don't know how I accidentally edited your comment instead of answering it.
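To make the intended behaviour concrete, here is a small, self-contained sketch of such a guard. It is not the runner-manager implementation, only an illustration (with invented names) that a busy runner still occupies one of the max slots, so with min=1/max=1 no replacement is pre-created while a job is running.

```python
import itertools
from dataclasses import dataclass, field

_names = itertools.count()


@dataclass
class Runner:
    name: str
    busy: bool = False


@dataclass
class RunnerPool:
    minimum: int
    maximum: int
    runners: list[Runner] = field(default_factory=list)

    def scale_up_to_minimum(self) -> None:
        """Pre-create idle runners until `minimum` is reached."""
        while len(self.runners) < self.minimum:
            self._create_runner()

    def on_job_started(self, runner_name: str) -> None:
        """A job picked up one of our runners; maybe pre-create a replacement."""
        for runner in self.runners:
            if runner.name == runner_name:
                runner.busy = True
        # The guard discussed above: a busy runner still counts against
        # `maximum`, so with min=1/max=1 nothing new is created here.
        if len(self.runners) < self.maximum:
            self._create_runner()

    def on_runner_removed(self, runner_name: str) -> None:
        """The runner finished and was deleted; top the pool back up to min."""
        self.runners = [r for r in self.runners if r.name != runner_name]
        self.scale_up_to_minimum()

    def _create_runner(self) -> None:
        self.runners.append(Runner(name=f"runner-{next(_names)}"))
```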
I don't have any logs that can say for sure, but I think your supposition is correct. The alternative approach I'd been looking at is to put some logic into the runner registration, but I suspect that would be challenging.
With the runner in its current state, it might, yes. I can recommend you look at the point where the runner manager receives the webhook for a runner that started a job.

Worth noting that we do have plans to rework this part in Q2 to make it easier to test and maintain; however, we're still in planning, so I can't make any promises on whether we will be working on it or not. I can update you here if that ends up being the case.
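As a rough illustration of where such a check could hook in, the sketch below handles GitHub's `workflow_job` webhook (sent with action `in_progress` when a job starts on a runner and `completed` when it finishes). FastAPI, `MAX_RUNNERS`, and `create_runner` are assumptions for the example, not the runner-manager's actual API.

```python
from fastapi import FastAPI, Request

app = FastAPI()

MAX_RUNNERS = 1                  # mirrors quantity.max in the pool config
active_runners: set[str] = set()


def create_runner() -> None:
    """Hypothetical stand-in for the cloud backend's create call."""
    active_runners.add(f"runner-{len(active_runners)}")


@app.post("/webhook")
async def workflow_job_webhook(request: Request) -> dict:
    event = await request.json()
    job = event.get("workflow_job") or {}
    runner_name = job.get("runner_name")

    if event.get("action") == "in_progress" and runner_name:
        # A job just started: only pre-create a replacement if the pool is
        # still below its maximum (a busy runner keeps occupying a slot).
        if len(active_runners) < MAX_RUNNERS:
            create_runner()
    elif event.get("action") == "completed" and runner_name:
        # The job is done: the runner is torn down, freeing a slot.
        active_runners.discard(runner_name)

    return {"status": "ok"}
```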
Better late than never: thanks to #685 and the 1.x release of the runner-manager, this issue is now solved.
We have been testing runner-manager for our CI workloads, but for various reasons we cannot support any sort of concurrency for some of our runner configurations - there can be only a single runner available.
Despite setting min: 1, max: 1 we still sometimes hit the scenario where multiple runners are spawned.
Is this a known issue?