
Ability to run multiple workers without assigning jobs based on id partitioning #19

Open
jainpawan opened this issue Mar 7, 2016 · 2 comments

Comments

@jainpawan
Contributor

Hi. In the current implementation, worker *n* out of *total workers* picks jobs by partitioning the id space. If for whatever reason one of the workers gets stuck processing a job, all current and future jobs assigned to that worker will stall. Is there any plan (or are there ideas) to make this more fault tolerant, i.e. so that a worker can pick any job that is scheduled to run now, irrespective of the job's id? This would also make adding and removing workers much easier.
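For context, the partitioning scheme described above can be sketched roughly like this (a minimal illustration only; the function name and the modulo rule are assumptions, not the project's actual code):

```python
def owns_job(job_id: int, worker_index: int, total_workers: int) -> bool:
    """Hypothetical id-space partitioning: worker `worker_index` out of
    `total_workers` claims every job whose id falls in its slot."""
    return job_id % total_workers == worker_index

# Worker 1 of 3 owns ids 1, 4, 7, ...
# If that worker hangs on one job, its whole partition stalls with it.
stalled = [j for j in range(10) if owns_job(j, 1, 3)]
print(stalled)  # → [1, 4, 7]
```

The failure mode in the question falls straight out of this: no other worker will ever pick up ids in a stuck worker's partition.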

@KayEss
Member

KayEss commented Mar 8, 2016

This is indeed a weakness of the current implementation. We do have some solutions in mind, but they will likely complicate the infrastructure needed to execute the queues.

There are a few things we do about this that help. Keep jobs small: if there is a lot of processing, use a group to manage the jobs, and track progress across the group. Jobs that are failing should just throw an exception and let the worker retry them later, i.e. don't have the job itself do any retries.
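The retry-by-exception advice might look something like the loop below. This is a hedged sketch, not the project's worker code; the job shape and the `run_after` field are illustrative assumptions:

```python
import time

def run_pending(jobs, handler, retry_delay=60):
    """Hypothetical worker pass: a failing job raises, and is simply
    rescheduled for later rather than retrying inside the job itself."""
    still_pending = []
    for job in jobs:
        try:
            handler(job)
        except Exception:
            # Don't retry in-line; push the job back with a later run time
            # so the next worker pass (or another worker) picks it up.
            job["run_after"] = time.time() + retry_delay
            still_pending.append(job)
    return still_pending

def handler(job):
    if job["id"] == 2:
        raise RuntimeError("transient failure")

remaining = run_pending([{"id": 1}, {"id": 2}], handler)
print([j["id"] for j in remaining])  # → [2]
```

Keeping the retry decision in the worker rather than the job means a stuck or crashing job never blocks the loop itself.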

If you are using multiple workers it's really important that you think through the implications of different isolation levels -- I'd strongly recommend using SERIALIZABLE isolation. If your system won't run cleanly with that turned on then it may imply that you're actually getting some data corruption at lower isolation levels.
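One practical consequence of SERIALIZABLE isolation is that transactions can abort with serialization failures (SQLSTATE 40001 in Postgres) and must be retried. A generic retry wrapper might look like this; the `SerializationFailure` class here is a stand-in for the database driver's real error type (e.g. psycopg2 exposes one for SQLSTATE 40001):

```python
class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error
    (SQLSTATE 40001 under SERIALIZABLE isolation in Postgres)."""

def run_serializable(txn, attempts=5):
    """Run a transaction function, retrying on serialization failures,
    which SERIALIZABLE isolation makes routine rather than exceptional."""
    for attempt in range(attempts):
        try:
            return txn()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise

calls = {"n": 0}
def txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_serializable(txn))  # → committed, after two retried aborts
```

A system that retries cleanly like this under SERIALIZABLE is the one the comment above describes; one that can't usually has hidden anomalies at lower isolation levels.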

What we're looking at in the longer term is a process that uses the Postgres LISTEN/NOTIFY system to see new jobs and changes to jobs, and then uses that to launch individual jobs, or batches of jobs. We run many microservices, so this would allow us to reduce latency, increase parallel execution of jobs, and do it with fewer workers overall.
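The dispatcher shape that idea implies can be simulated in-process, with a queue standing in for the database's notification stream (a real implementation would block on the driver's LISTEN support instead; every name below is illustrative, not this project's code):

```python
import queue

notifications = queue.Queue()  # stands in for Postgres NOTIFY delivery

def notify(channel, payload):
    """Real code would run `NOTIFY channel, 'payload'` in Postgres;
    here we just enqueue the notification locally."""
    notifications.put((channel, payload))

def dispatch(launch):
    """Launch one job per notification as it arrives, instead of having
    fixed workers poll their id partitions. A real dispatcher would wait
    on the database connection's notification stream."""
    while True:
        channel, job_id = notifications.get()
        if job_id is None:  # sentinel: shut down
            break
        launch(job_id)

launched = []
for jid in (7, 8, 9):
    notify("new_job", jid)
notify("new_job", None)
dispatch(launched.append)
print(launched)  # → [7, 8, 9]
```

Because any available executor can react to any notification, no job is tied to a particular worker's id partition, which is exactly the fault-tolerance property the original question asks for.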

@KayEss
Member

KayEss commented Jun 3, 2017

For other projects we've been developing a tool that would allow this, the wright-exec-helper. It multiplexes jobs using a fairly simple mechanism, writing to and reading from the workers' stdin/stdout. The downside, from this project's perspective at least, is that it is native code.

The protocol is pretty simple though and it should be possible to implement something that performs the same function (albeit a bit more slowly) in Python.
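The actual wright-exec-helper protocol isn't documented in this thread, so the sketch below is only a generic line-per-job stdin/stdout scheme showing the shape such a Python reimplementation could take: a parent writes one JSON job per line to a child worker's stdin and reads one result line back per job. All of it is assumed, not the real wire format:

```python
import json, subprocess, sys

# A stand-in child worker: reads one JSON job per line on stdin and
# reports one JSON result line on stdout. NOT the wright-exec-helper
# protocol, just an illustration of the same multiplexing shape.
CHILD = r"""
import json, sys
for line in sys.stdin:
    job = json.loads(line)
    print(json.dumps({"id": job["id"], "status": "done"}), flush=True)
"""

def run_jobs(jobs):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    results = []
    for job in jobs:
        proc.stdin.write(json.dumps(job) + "\n")
        proc.stdin.flush()
        results.append(json.loads(proc.stdout.readline()))
    proc.stdin.close()
    proc.wait()
    return results

print(run_jobs([{"id": 1}, {"id": 2}]))
```

A real multiplexer would interleave many workers and handle worker death, but the pipe-per-worker, line-per-message structure is the part that ports straightforwardly to Python.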
