Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why poll when using LISTEN/NOTIFY? #90

Closed
sj26 opened this issue Aug 24, 2020 · 37 comments
Closed

Why poll when using LISTEN/NOTIFY? #90

sj26 opened this issue Aug 24, 2020 · 37 comments
Labels
question Further information is requested

Comments

@sj26
Copy link
Contributor

sj26 commented Aug 24, 2020

Absolutely loving good_job β€”Β good job! πŸ˜…

I'm curious why it polls, though, when it supports LISTEN/NOTIFY?

@bensheldon
Copy link
Owner

Thanks for the kind words πŸ™Œ

LISTEN/NOTIFY works for jobs scheduled immediately (ExampleJob.perform_later), but isn't used for future-scheduled jobs (ExampleJob.set(wait: 1.day).perform_later or retry with exponential backoff ). Polling is necessary to pick up those latter jobs. LISTEN/NOTIFY allows increasing the poll interval (On my applications I set it to 10 seconds), but some applications may want greater scheduling accuracy so the default is still 1 second. Though I'm not committed to any of these constants.

Just while I'm thinking of it. Currently the polling implementation is integrated with the Scheduler object, whereas it could also be integrated with the Notifier object. A process may have multiple schedulers but only a single notifier. For some configurations having the Notifier implement polling could be more efficient, but I'm not sure.

@bensheldon bensheldon added the question Further information is requested label Aug 24, 2020
@slimdave
Copy link

If the GoodJob process wasn't running when a job was enqueued, would the polling be responsible for picking that job up when the GoodJob process starts also?

@bensheldon
Copy link
Owner

If the GoodJob process wasn't running when a job was enqueued, would the polling be responsible for picking that job up when the GoodJob process starts also?

Yes πŸ‘

Polling is necessary for proper functioning, and LISTEN/NOTIFY is nice-to-have.

Though if you did want to disable polling, that's possible with GOOD_JOB_POLL_INTERVAL=-1. Though that would lead to jobs not being executed for all the previously stated reasons.

@slimdave
Copy link

Ah hah, clever stuff. If you had multiple processes monitoring the queue, then they would all get the NOTIFY but only one would be able to grab the advisory lock and the rest would continue listening and polling?

@sj26
Copy link
Contributor Author

sj26 commented Aug 24, 2020

Interesting!

What I'd imagine is that enqueuing new jobs, immediate or scheduled, would post a simple "changed" message via NOTIFY. Then there would be a supervisor process which LISTENs, see the "changed" message, and checks to see if there's new immediate work β€”Β and kick off workers β€”Β or if the next scheduled time has changed, and then go back to sleep until the next scheduled job or LISTEN message.

@bensheldon
Copy link
Owner

Having the Scheduler itself track when the next scheduled job is expected would be interesting. If you want to dive into the code you could make a proof of concept.

And additional execution accuracy does introduce a (potential) problem of this, which is correct:

[Schedulers] would all get the NOTIFY but only one would be able to grab the advisory lock and the rest would continue listening and polling

...meaning that any available Scheduler objects will all query the database at once trying to pick up that next job. Polling does (potentially) introduce a bit more randomness. But the challenge here is that different applications will have different kinds of workloads.

@sj26
Copy link
Contributor Author

sj26 commented Aug 25, 2020

I'll have a go. πŸ‘

Avoiding a thundering herd could be achieved by adding jitter, within the poll interval?

@slimdave
Copy link

... any available Scheduler objects will all query the database at once trying to pick up that next job.

This is the situation (and problem) that we have. In some circumstances we have a lot of small Heroku DelayedJob worker processes monitoring the queue (50+ or them) in order to parallelise the generation of 100's of thousands of fragments of XML. Each one takes a fraction of a second to generate, so scaling to 50 dynos lets us process them all in say 4 hours instead of many days.

Polling does not work well in this situation, and nor does locking records by updating them!

So moving to a system of slow polling and relying more on listen/notify would be great.

Does it sound like GoodJob is a candidate for that?

(Sorry, aware that this has gone a little off-topic).

@bensheldon
Copy link
Owner

@slimdave tough to say, tbh. Scaled/production replacements are a lot more difficult than growing with a particular solution πŸ˜„

It does though make me think that it would be helpful to unpack a couple of different scenarios, which went into my thinking for the GoodJob Scheduler (a scheduler has a single threadpool; a process may have multiple schedulers; there are potentially multiple processes running).

  • "Cold Start": No jobs are actively being worked by any thread, and then job(s) are added to the queue, and an execution thread is created. This is a situation where LISTEN/NOTIFY (push) or poll directly effects latency.
  • "Running hot": Execution thread(s) are actively working jobs and when one completes execution of a job, the execution thread immediately queries for the next job.
  • "Scaling Up": The mechanism by which additional execution threads are created/woken-up up to the max configuration. Currently this happens on both push or poll, when 1 additional thread is attempted to be created for each push/poll event.
  • "Fully scaled": The maximum number of threads are all engaged executing jobs. In this case, poll/push has no effect; the scheduler ignores them.
  • "Scaling down": When an execution thread queries for the next job, and there is none, the thread will go back to sleep. When all threads are sleeping we're back to the "cold start" scenario.

Depending on the application/workload each of these states may matter more or less.

@jrochkind
Copy link
Contributor

My question is the opposite -- why use LISTEN/NOTIFY, making more complex code and tying to postgres, if you have to poll anyway? Especially if you are polling every second by default -- I don't think any bg job system expects guaranteed latency of less than a second in a job getting picked up, it's a bg job system, some delay is expected.

Would it be simpler to only poll if you need to poll anyway?

Or maybe an option to disable listen/notify to use on non-postgres systems? Although that would still leave the complexity of the listen/notify code in the codebase to maintain. Well either way, that's for the future anyway, get it stable and mature on postgres first.

@bensheldon
Copy link
Owner

@jrochkind those are great questions. GoodJob also uses Advisory Locks which are Postgres only too.

I agree with you broadly. It's possible to make GoodJob be database agnostic, but right now my focus is on serving the needs of Postgres users and making GoodJob compelling, which is something I'm learning more about every day.

@bensheldon
Copy link
Owner

Having the Scheduler itself track when the next scheduled job is expected would be interesting.

This is released in v1.7.0 (#205). By default GoodJob caches 10k future-scheduled jobs (uses about 20MB of memory), though it's configurable. I still would recommend polling as a belt-and-suspenders in case the cache is exhausted, but possible to dial back the polling while maintaining low latency due to the cache.

@vtt
Copy link

vtt commented Mar 17, 2021

Polling is not necessary when you use LISTEN/NOTIFY; it would provide you all the schedules of the jobs; so there is no need to poll. I've created skiplock to demonstrate this point. Note that you can gain quite a bit of performance if you don't need to poll.

@bensheldon
Copy link
Owner

@vtt That's great! Thanks for sharing skiplock. I'm curious how you addressed some design challenges I've run into when thinking about how to entirely remove polling:

  • For enqueued, future-scheduled jobs:
    • Memory management. E.g. if there is 10M future scheduled jobs on the queue, each worker will need to store that schedule data in memory and that could become a problem if unmanaged. GoodJob currently sets a default 10k limit of the number of job schedules it will store in memory, and if that is exhausted, there needs to be a mechanism to refill it.
    • Worker startup. The worker will need to fetch from the database all of the future-scheduled jobs and load them into memory. GoodJob currently does this, but that's potentially a lot of data over the wire.
  • Thundering herd of multiple processes all trying to fetch-and-lock a job at the same time. Seems possible to solve with jitter, but curious if you have addressed it.

I'm curious about the performance benefits you're focused on too. Thanks again for hopping in here.

@vtt
Copy link

vtt commented Mar 17, 2021

  • the 10M future scheduled jobs will create 10M database job records; PG will notify each record including the scheduled time; the job queue will listen and received all 10M notifications; but it only cares about the next earliest scheduled time.... once this scheduled_time is reached, then job is dispatched.
  • similarly at startup, it would first execute all the jobs with scheduled time met (or no schedule time); then eventually it runs out and comes up to the next earliest scheduled time... only need to monitor this time; once the time comes, dispatch the job, and cycle repeats
  • there is no thundering herd issues... at startup, each thread only sends one SQL request to check if there's any available jobs... if you have 5 workers, then that's only 5 threads at startup sending 5 SQL queries (note that new threads are only created if there are jobs to be executed). The first 5 queries from the 5 initial threads would determine that there's no job and return the next scheduled_at time.
  • there would be no locking issues because of using SKIP LOCKED; the database would lock the first executable job and the other threads won't be able to see them. There won't be any waiting time either; the locked job would be just skipped over from the query list.

Have a look at Skiplock::Job.dispatch function; it's quite short and might be easier to understand

@vtt
Copy link

vtt commented Mar 17, 2021

@bensheldon, On my machine using, your queue-shootout fork, good_job is able to hit 9k jobs/sec but skiplock is steady at 26k jobs/sec - almost 3x fold. I've sent a pull request for queue-shootout and you can try to run it on your machine to compare. The method used to benchmark in queue-shootout is measuring "burst" speed only; so it's not really reflecting the production benchmark.

@jrochkind
Copy link
Contributor

Realistically, how many have a job load that can be executed at more than 9K jobs/sec anyway?

But skiplock looks awesome! It already supports forking multiple worker processes out of the box?? Is skiplock just a demo proof of concept at this point, not really ready for prime time? Very curious for the contents of currently empty "Retry" and "Notifications" docs systems. (Also it doesn't support multiple queues at present, i guess?)

@vtt
Copy link

vtt commented Mar 17, 2021

I just published the skiplock gem last night and have not caught up on the documentations; I will document the rest of the features in the next few days. The Skiplock Retry system is fully working; I haven't commit the Notifications feature yet and that will be done soon as well.

Multiple workers are supported ; as for queues, I'm still debating the design; it will be implemented once it's finalized.

FYI, I created skiplock just 4 days ago; it is quite small but it packs a powerful punch

@bensheldon
Copy link
Owner

@vtt thanks! I'll review the queue-shootout PR. And also try to get back in the context of why I was sweating those design challenges.

One behavioral difference I noticed is that skiplock wraps job execution in a transaction. Avoiding that was a goal of GoodJob (I want to support long-running jobs), which is why GoodJob uses session-level advisory locks. I'm not surprised that skiplock's SKIP LOCKED is faster than GoodJob's CTE-based advisory-locking query. There are other strategies like Sidekiq's super fetch to avoid a wrapping transaction, but with increased complexity.

@vtt
Copy link

vtt commented Mar 17, 2021

@bensheldon, why would you want to avoid the transaction? The downside is that each thread uses a dedicated database connection during dispatching; but the upside is guaranteed reliability and performance; if you have 5 workers each with 5 threads; then it will maximum use 25 connections in the pool; these connections cannot be used by Rails application; but in return, you get the safety and reliabily provided by PostgreSQL

Note that once the job dispatches are completed, the used connections are checked back into pool.

@vtt
Copy link

vtt commented Mar 17, 2021

From your comment, I've just caught a design flaw in skiplock when a job is taking too long and the application loses connection to the database. I've committed the fix which requires a change in database migration; this, in turn, changes the transaction block to be almost instant by marking the job to be in progress then commit.

@bensheldon
Copy link
Owner

@vtt In my experience, database connection limits can become a bottleneck. For example, Heroku Hobby databases only allow 20 connections. I've also seen a lot of interest in PgBouncer for GoodJob (#52) so I think database connection management is on people's minds when choosing a backend... and PgBouncer compatibility is definitely on my own mind when thinking about how to evolve GoodJob.

@vtt
Copy link

vtt commented Mar 17, 2021

I would assume that if the server limits the database connections then it would limit the CPU/memory resources as well; so for this case, you would use 1 additional worker at most for background jobs (using only 5 connections) or better off using async mode by setting workers = 0

@bensheldon
Copy link
Owner

Every workload is different. I hit a lot of (slow) 3rd party APIs in one project and it's possible to stack up quite a bit of concurrency that's not CPU/Memory limited.

I don't think there is a perfect implementation tbh, just different tradeoffs and targeted use-cases.

@jrochkind
Copy link
Contributor

jrochkind commented Mar 17, 2021

Typical setup in Rails is going to hold the DB connection until the job is done anyway though.

ActiveRecord connection handling is weird and messy. But if the domain-specific job code does any AR activities, that will check out a connection which will ordinarily remain checked out until end of the job

In fact it's up the job wrapper to make sure it gets checked back in at all -- generally with a Rails executor/reloader. Sidekiq uses the Rails reloader.

I'm not sure what good_job does -- if you are doing nothing, I think you are going to be leaking connections from any job that uses AR, which is not good. (correction maybe not, maybe ActiveJob takes care of it for you as in discussion at #97)

I understand it's still worthwhile to not add to the problem -- after all, not every job uses AR at all, and there are ways to try to work around it with AR if you really want to.

But for reasons of AR design, it's not easy to avoid holding onto a connection for duration of job execution if a job is using AR.

@bensheldon
Copy link
Owner

@jrochkind you're right that GoodJob currently does implicitly hold onto the AR connection... but my brain is already on to #52 and solving for PgBouncer and using shared db-connections. GoodJob shares the same AR connection with the job, so 1-thread only uses 1 connection, whereas if I'm understanding it, skiplock uses potentially 2 (one holding a transaction for the job record, potentially a second for AR).

@bensheldon
Copy link
Owner

Apologies if this thread becomes now a skiplock one, but this is following along some of the things on my mind with GoodJob.

@vtt I noticed this change vtt/skiplock@73a1d0e you introduced a running [boolean] column. This introduces a problem of jobs potentially being orphaned if the process is SIGKILLed. In #222 there's a link to a Sidekiq issue that describes their strategy for cleaning up jobs locked by dead processes.

I'm gonna step away from this Issue, but happy to see you working in this space!

@vtt
Copy link

vtt commented Mar 17, 2021

@bensheldon , this was previously the case; but the recent change no longer requires this restriction; now the connection can be released back into the pool to be used by AR because the TRANSACTION lock can be released after marking the Job in progress.

However, this would incur performance cost if the application handles a lot of background jobs. If there are 10000 jobs to be run, then it would requires 10000 checkins and 10000 checkouts for marking the jobs in progress and release the connections back to the pool. The benefit is the connection is available to be used by AR.

Instead, if the working thread holds on to the dedicated connection, then it only check out once, finishes 10000 jobs, then release back the connection. The downside is that during the running of 10000 jobs, the connection is reserved and not available for AR use.

Perhaps, I'd introduce a configurable option so the user can select which mode to use: performance or connection resource

@vtt
Copy link

vtt commented Mar 17, 2021

@bensheldon I apogolize for hijacking your thread; but have you address the problem with adivsory locks approach in good_job when a job takes 2 days to process; half way the database server crash and restarted; the job is still running in background.

The other job workers/threads reconnect to the freshly restarted database with no locks in memory and found the job "available". It dispatches another thread/worker to work on the job and now the job is being executed twice....

@vtt
Copy link

vtt commented Mar 17, 2021

Regarding the orphaned job issue with the introduction of running, this would be treated like a failed job and they will automatically retry at next application restart (see e41a80d)

@bensheldon
Copy link
Owner

@vtt no worries about where this thread goes; I just wanted to recognize it if someone else arrives (including myself at some future time).

A problem I see with vtt/skiplock@e41a80d is that someone starts a 2nd skiplock process (e.g. spins up another worker container), it will unlock the jobs on the first skiplock process.

GoodJob currently uses a session-level Advisory Lock, which has the benefit of not requiring a transaction, and being implicitly unlocked if worker disconnects.

But I'm also considering a SKIP LOCKED approach instead of Advisory Locks to better support PgBouncer. The strategy I'm considering on GoodJob (if I do move to a SKIP LOCKED approach) involves having a worker-heartbeat table and if a worker does not check-in in a certain amount of time, any jobs locked by that worker (also tracked on the job) are unlocked.

@jrochkind
Copy link
Contributor

One ancillary benefit of a worker heartbeat table is it could perhaps also support an admin interface which shows how many workers are active, and perhaps what they are doing. Resque admin gives you that information; I think sidekiq does too but not certain.

@vtt
Copy link

vtt commented Mar 18, 2021

@bensheldon please correct me if I get this wrong, but the session-level advisory lock is held in PostgreSQL memory (not on Rails side); and if the PG server crashes then restarts, all previous session-level advisory locks are cleared; so if you had a running worker processing a long job (eg. calculating some hypothetical bank payments), and the restart of database server is not known to the worker as it's working the job code. How do you prevent other workers/threads from fetching the same unfinished job that is now no longer locked by the session-level advisory lock due to restart of DB server... Wouldn't you then have two workers running the same job (ie: doing it more than once)?

As for someone deliberating starting a 2nd worker process manually; this is sabotage; they may as well go wipe the data from the database. Even so, you can add a few lines in the startup script to prevent more than one instance of the master process running at the same time (eg. by checking the PID file to see if process is still running then exit).

@bensheldon
Copy link
Owner

@vtt oh! the DB crashing is a really good situation I haven't considered. I'd assume that's pretty unexpected, but I'm not quite sure if AR would crash out or not in that instance.

Regarding multiple workers, that's something GoodJob explicitly focused on to support containerized environments and horizontal scaling.

Also, because writing on the Internet is hard, I am really glad you're in this space, and I'm not trying to poke holes in skiplock. Mostly just hoping you can contribute some ideas to GoodJob (this Issue itself is a lot of "why is this harder than it looks?" even before you arrived) and talking through this all is helpful for me 😊

@vtt
Copy link

vtt commented Mar 18, 2021

@bensheldon PG restarting is not rare at all; it happens every now and then even on top paid services like AWS RDMBS; it happens because of scheduled maintenance, upgrades or even just a pure crash. From Rails and AR side, it will report exceptions (lost connection to database) and keep trying to reconnect and continue on as usual. This usually lasts for 10-30 seconds (typical restart of PostgreSQL sever process); so you have to design good_job to withstand these scenarios.

In this situation, the only way to guarantee atomic execution of a job is to mark the Job record as "in progress "; so that no other process or thread can touch it. This can lead to orphaned jobs if the worker thread/process crashes or gets killed; but then these orphaned jobs should be treated just like any failed job (since it wasn't completed successfully) for retry on next master process restart.

If a worker has died abnormally by crashing or killed then exceptions be generated and reported; and a pending restart of the master process is needed to resume normal operation; so the orphaned jobs will be retried then.

@vtt
Copy link

vtt commented Mar 18, 2021

If you are supporting horizontal scaling (ie: running multiple master processes on multiple machines), then you'd likely need to mark the jobs in progress tied to the host (eg. hostname or IP address). Instead of using a boolean for running, a string can be used to identify the host and only allow that host to reset orphaned jobs belonging to that host.... Just an idea...

@sandstrom
Copy link

If this isn't an issue, maybe we could close or move to discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

6 participants