-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable job parallelism #29
Conversation
@@ -297,7 +298,9 @@ Runner.prototype = { | |||
// Keep around N most recent build directories. | |||
// Default is 0, ie wipe at start of each run. | |||
// Later, this can be configurable in the UI. | |||
keeper({baseDir: path.join(this.config.dataDir, "data"), count: 0}, function(err) { | |||
keeper({baseDir: path.join(this.config.dataDir, "data"), count: this.config.concurrentJobs - 1}, function(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized after I put this together that this is too simplistic; it assumes that, at most, the most recent n-1 data directories (possibly) belong to concurrent builds. If you have one long build and many short ones that happen before it completes, you can have many more data directories still in use.
I'll have to introduce a better way to track which data directories are in use and which are not.
Good start |
Good to see some work on this. The data dirs could be keyed by job ID or similar. Reaping them becomes a little tricky. Some folks may like to re-use build directories in some circumstances too (even though that's not always going to be safe). Just something to think about. |
I feel like concurrent jobs for the same project in the same branch don't make sense at the moment. Maybe that should be the limitation. |
Hmm! That's going to be tricky to manage - an async.queue won't cut it any more, because more commits can arrive on a branch corresponding to an existing job even while the queue is unsaturated. Concurrent builds on a single branch will be an issue if they trigger deployments, though.
I had been thinking about maintaining an in-memory structure of data directories that are in-use and excluding them from the culling somehow. I think the runner can get away with something in memory because simple-runner jobs don't persist across service restarts anyway - otherwise I'd need to either parse the job ID from the data directory name or use some kind of flag file to identify active data directories.
Re-using a data directory from one job to another? I'd thought that was what the cache was intended for. With the job ID in the data directory name, how does that work - or am I misunderstanding? |
So there are two concepts, the cache and the data dir (as far as I know) and the cache is used by certain plugins but the data dir is where the project is cloned into and built. Personally I always copy the contents from the data dir to |
5032921
to
61ca92a
Compare
This should only happen in unit tests.
Okay, I've revisited this to:
Here's what the data directory structure looks like:
I've also tried to be careful in extracting branch names from the job in a sane way because Job's ref field can hold arbitrary things. I'm testing for a few common keys ( |
Do you think this is ready for merge? Did you have any plans to write some basic tests for this |
Nvm, you have tests. |
I was just giving it a last look-over to see if there was anything obvious I'd missed. I don't think so 😄 |
So it's probably a good idea to expose the configuration in Strider UI. |
Yeah, agreed. The number of recent directories to keep should be simple enough to add. The level of concurrency is less clear to me. Is there a place for installation-wide (rather than build-specific) configuration for runner plugins? |
Hum, no but Strider-CD/strider#681 would help when I get more time to finish it. |
How is it configured now, an env var? Need to document in the mean time. |
Yeah, |
It's a little generic, but I wasn't sure how far to go down the |
No, this is good. I documented it in the strider repo and updated the version of simple-runner. |
I'm introducing a runner configuration option,
concurrentJobs
, to allow multiple jobs to be scheduled in parallel.