Some jobs don't start immediately after scheduled - LISTEN/NOTIFY issue? #1419
That's interesting and unexpected! Could you also share the created_at of the jobs in your table? I want to see whether these were scheduled to run in the future or not. Also, could you query
Just for some background: when in async, the process that enqueues the job will first try to execute it locally. If there isn't local capacity (no available execution threads), it will then emit a NOTIFY to other LISTENing processes. Then, if the job is scheduled in the future, those processes will put the future-scheduled time in a cache (this is a Concurrent Ruby thing), and the process will awaken at (or around) that time and try to execute a job (it checks for the next available job rather than specifically that job).
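To make the flow described above easier to follow, here is a toy model in plain Ruby. It is only a paraphrase of the explanation in this comment, not GoodJob's actual code: the method names, arguments, and returned symbols are all hypothetical, and the enqueuer side assumes the job is due to run right away.

```ruby
# Toy model of the async dispatch flow described above.
# NOT GoodJob's implementation; every name here is hypothetical.

# What the enqueuing process does with a job that is due now.
def enqueuer_action(free_threads:)
  if free_threads.positive?
    :execute_locally        # a local execution thread is available
  else
    :notify_listeners       # no local capacity, so emit NOTIFY
  end
end

# What a LISTENing process does when it receives the NOTIFY.
def listener_action(scheduled_at:, now:)
  if scheduled_at > now
    :cache_wakeup_time      # remember the time and wake up around then
  else
    :execute_next_available_job # picks the next available job, not this specific one
  end
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
puts enqueuer_action(free_threads: 0)
puts listener_action(scheduled_at: now + 60, now: now)
```

Under this model, a job that is due now and enqueued on a process with a free thread should never need NOTIFY at all.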
Thank you for the quick response! Yup, none of the jobs were scheduled in the future, and I've confirmed that created_at is the same as scheduled_at (first column).
This is the result for GoodJob::Execution (looks exactly the same)
So if I'm understanding your explanation correctly, there shouldn't actually be any NOTIFY, given that no other job is running and there should be capacity to run it locally? Is there anything else I can check to see whether there should be capacity, or is it just the GoodJob::Execution table? Also, I'm on GJ 3.30 (I know you made some changes recently!). If it did start, we'd see executions_count be greater than 1, right? (It is 1, see below.) So does that mean it didn't execute locally at all and was then picked up by the poll interval? I've also confirmed that I don't see any errors during that time on Bugsnag (unless it's something so bad that it doesn't even notify).
Also, if I understand correctly, a workaround would be to run it as "external", so that it uses NOTIFY instead of running locally on the web server, and hope that works in case the issue is the "running locally" part?
Thanks for sharing those. That implies that yes, for some unknown reason jobs are not being run in a timely manner. I would say the quickest option for you to reduce queue latency would be to try reducing your poll interval to 5 seconds, and see if that changes the timing you see. For GoodJob generally: I'm... not sure. I'm assuming:
Assuming you've applied all the migrations, GoodJob v3.30 is pretty equivalent to 4.0.0. For 3.30 you want to be using
But I'm stumped tbh.
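For reference, the poll-interval suggestion in this comment could look roughly like the following in a Rails app. This is a sketch that assumes GoodJob is configured through `config.good_job` in an environment file; the file path is only an example, and the value 5 is the one suggested above.

```ruby
# config/environments/production.rb (example location; adjust to your app)
Rails.application.configure do
  config.good_job.execution_mode = :async
  config.good_job.poll_interval = 5 # seconds; the setup in this thread was using 10
end
```

A shorter interval only lowers the worst-case delay for jobs that fall through to polling. It does not explain why those jobs were not started immediately.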
I have been testing GJ for a queue of infrequent jobs, each job taking ~20 seconds to complete.
Most jobs start immediately, but sometimes there's a delay of a few seconds, which I assume means that LISTEN/NOTIFY failed to notify a worker and the job was picked up by the polling interval instead.
Queue has at most 5 jobs running concurrently (most of the time just 1 job running concurrently)
Attached below is an example of 9 jobs in a span of 13 minutes:
Jobs 1-4, 6-8 started immediately
Jobs 5,9 started ~5 seconds after scheduled, and there are no other jobs running at these times
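One way to spot jobs like 5 and 9 is to compute the gap between when each job was scheduled and when it actually started. The sketch below uses plain Ruby with made-up rows rather than a real query: `JobRow`, the labels, the timestamps, and the 1-second threshold are all hypothetical, and it assumes a start timestamp (called `performed_at` here) is available next to `scheduled_at`.

```ruby
# Hypothetical rows standing in for job records; the timestamps are illustrative only.
JobRow = Struct.new(:label, :scheduled_at, :performed_at)

# Seconds between the scheduled time and the actual start.
def start_delay(row)
  row.performed_at - row.scheduled_at
end

# Jobs whose start lagged behind scheduled_at by more than the threshold.
def delayed_jobs(rows, threshold_seconds: 1.0)
  rows.select { |row| start_delay(row) > threshold_seconds }
end

t0 = Time.utc(2024, 1, 1, 12, 0, 0)
rows = [
  JobRow.new("a", t0, t0 + 0.2),           # started almost immediately
  JobRow.new("b", t0 + 180, t0 + 180 + 5)  # started about 5 seconds late
]

delayed_jobs(rows).each do |row|
  puts "#{row.label}: started #{start_delay(row).round(1)}s after scheduled_at"
end
```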
Other info
GoodJob is configured as async with poll_interval of 10 (so ~5 seconds delay is suspiciously in the middle!), every other parameter is default.
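The "suspiciously in the middle" observation can be checked with a little arithmetic. If a job misses immediate execution and waits for the next poll, and its enqueue time is spread evenly across the polling interval, the average wait should come out near half the interval. The sketch below assumes polls fire exactly every `poll_interval` seconds with no jitter, which may not match GoodJob's real timing; `average_poll_wait` is a hypothetical helper, not part of any library.

```ruby
# Average wait until the next poll tick, for enqueue times spread evenly
# across one polling interval. Pure arithmetic; no GoodJob code involved.
def average_poll_wait(poll_interval, samples: 1000)
  step = poll_interval.to_f / samples
  # Sample the midpoint of each slice of the interval and measure time to the next tick.
  waits = Array.new(samples) { |i| poll_interval - (i + 0.5) * step }
  waits.sum / samples
end

puts average_poll_wait(10)
```

For a 10-second interval the result is approximately 5 seconds, which is consistent with the delayed jobs having been picked up by polling rather than by NOTIFY.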
Running on a Heroku performance dyno with Puma and WEB_CONCURRENCY=4
I've changed puma.rb according to
https://github.com/bensheldon/good_job?tab=readme-ov-file#execute-jobs-async--in-process
There wasn't any meaningful load on the web server at the time.
Any idea what can be the issue causing the jobs to not start immediately?
Thank you ❤️