Idea / feature request - deferring execution of batch jobs while still writing them to the database #1319
Comments
Thanks for opening this issue. I think I like the general idea of it. Here's an idea that I think could accomplish this without hugely changing things: what if "pausing" a job was accomplished by setting its `scheduled_at` to `nil`?

It was waaaaay back at GoodJob v0.7.0 when GoodJob started always assigning a `scheduled_at` value, even for jobs intended to run immediately. So the implication would be that a job that is not "scheduled" is not intended to run.

I think the only wrinkle would be asking: when the job is intended to be undeferred/unpaused, we'd want to do something like:

```ruby
job.scheduled_at = serialized_params["scheduled_at"] && serialized_params["scheduled_at"] > Time.current ? serialized_params["scheduled_at"] : Time.current
```

...which is a little gnarly but probably ok. What do you think?
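For illustration, here's a rough sketch of that idea wrapped into hypothetical `pause!`/`unpause!` helpers; neither method exists in GoodJob today, and the coercion of the serialized timestamp is an assumption:

```ruby
# Hypothetical helpers sketching the scheduled_at-based pause mechanism.
class GoodJob::Job
  def pause!
    # A job with a NULL scheduled_at would never be picked up by the
    # dequeue query, so clearing it effectively defers the job.
    update!(scheduled_at: nil)
  end

  def unpause!
    # serialized_params stores the timestamp in serialized form, so it
    # presumably needs coercion before comparing (assumed ISO 8601 here).
    original = serialized_params["scheduled_at"]&.to_time
    # Restore the original schedule if still in the future; else run now.
    update!(scheduled_at: original && original > Time.current ? original : Time.current)
  end
end
```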
I'd translate that logic into a SQL update we could run within the database (unless we need to load all the records in Rails for the notifiers to work?), but other than that it sounds great.
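As a sketch of what that translation could look like, assuming Postgres and that `serialized_params` stores `scheduled_at` as an ISO 8601 string (both assumptions that would need checking against the actual serialization):

```ruby
# Hypothetical in-database version of the unpause logic, scoped to one
# batch's paused jobs. GREATEST/COALESCE reproduce the Ruby ternary:
# keep the original schedule if it is still in the future, else run now.
GoodJob::Job.where(batch_id: batch.id, scheduled_at: nil).update_all(<<~SQL)
  scheduled_at = GREATEST(
    COALESCE((serialized_params ->> 'scheduled_at')::timestamptz, NOW()),
    NOW()
  )
SQL
```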
Thinking about this some more: you can add jobs to a batch, load the batch elsewhere, and the final callbacks won't be triggered until the batch is enqueued. E.g.

```ruby
batch = GoodJob::Batch.new
batch.add { MyJob.perform_later }

same_batch = GoodJob::Batch.find(batch.id)
same_batch.add { OtherJob.perform_later }
same_batch.enqueue # <= now the finish callback is enabled
```

That might not totally work for your needs, but I hadn't considered before that "enqueued" is a batch status that's slightly similar.
I think this would be useful for queueing huge volumes of jobs: combined with concurrency limits, it would allow the process queueing the jobs to fail and restart without duplicate jobs being queued. The same mechanism would also let multiple processes add jobs to the same batch while avoiding duplicates (for some hypothetical case where you have multiple datasets containing duplicates). Our specific use case is queueing the generation/delivery of large email blasts, where we don't want to send duplicate emails to the same user if the job-spawner process dies for some reason.
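A rough sketch of how that could look from the caller's side, assuming the feature existed (the `deferred:` option and `defer!` method are hypothetical, as are `recipients` and `BlastMailer`):

```ruby
# Hypothetical usage of the proposed feature: jobs are written to the
# database up front but held back from execution until explicitly released.
batch = GoodJob::Batch.new(deferred: true) # hypothetical option
recipients.find_each do |user|
  # GoodJob's existing concurrency controls would let a restarted spawner
  # skip jobs it already inserted on a previous attempt.
  batch.add { BlastMailer.with(user: user).announcement.deliver_later }
end
batch.enqueue
batch.defer!(false) # hypothetical: flips every job in the batch to runnable
```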
For implementation, I think this could be accomplished by adding a boolean `deferred` column to either the jobs or the batches table. Adding it to the jobs table would require a ton of record updates when marking a batch to run (something like `UPDATE good_jobs SET deferred = false WHERE batch_id = xyz`), but avoids complicating the jobs-to-run query with a join or subquery. Adding it to the batches table instead would make the updates much easier, but would require the aforementioned join/subquery. I think adding the column to the jobs table is likely the better solution here; a rough sketch is below.

Is this something you'd be interested in adding to the project, and/or do you have any thoughts/recommendations/requests? This seems straightforward enough that I'll likely take a crack at implementing it myself.
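To make the jobs-table variant concrete, here's a minimal sketch; the migration, class name, and Rails version are placeholders, and nothing below exists in GoodJob yet:

```ruby
# Hypothetical migration: a boolean flag on good_jobs, defaulting to false
# so existing jobs and the normal enqueue path are unaffected.
class AddDeferredToGoodJobs < ActiveRecord::Migration[7.1]
  def change
    add_column :good_jobs, :deferred, :boolean, default: false, null: false
  end
end
```

The jobs-to-run query would then additionally filter on `deferred = false`, and marking a batch runnable stays the single `UPDATE` shown above.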
This is potentially related to #919; the pause state could be implemented by flipping `deferred` back to `true` for all pending jobs.
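With the same hypothetical column, pausing could then be a single bulk update (assuming `finished_at IS NULL` identifies pending jobs):

```ruby
# Hypothetical pause built on the same deferred column: re-defer any job
# in the batch that has not finished yet.
GoodJob::Job.where(batch_id: batch.id, finished_at: nil).update_all(deferred: true)
```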