Replies: 2 comments 6 replies
-
@til Thanks for asking and also for digging into the code. My assumption with GoodJob is that it runs in a containerized environment whose orchestration works along the lines of what you laid out with Fargate.
It's also possible for GoodJob to abort active threads after receiving a TERM, which can be configured with the shutdown timeout (the shutdown timeout is disabled by default). On Heroku, for example, I might set that timeout to 25 seconds so that GoodJob aborts its threads and exits before being killed by the supervisor.

Can you say more about the kinds of jobs you're running for which 2 minutes wouldn't be sufficient? I'm not aware of other job systems that have a shutdown mechanism other than signals; please share if you're inspired by another example.

I'm wondering if exposing a cancellation check would be sufficient for you to exit long-running jobs, e.g.:

```ruby
class ReallyBigJob < ApplicationJob
  def perform
    lots_of_items.each do |item|
      break unless GoodJob.running? # or: break if GoodJob.shutting_down? || GoodJob.shutdown?
      # do some processing
    end
  end
end
```
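As an aside, here is a minimal sketch of setting that shutdown timeout in a Rails initializer; the option name used below, `config.good_job.shutdown_timeout`, assumes the usual Rails configuration style (see the README for the authoritative name, plus the `GOOD_JOB_SHUTDOWN_TIMEOUT` environment variable and `--shutdown-timeout` CLI equivalents):

```ruby
# config/initializers/good_job.rb
# Minimal sketch; check the README for the authoritative option name and default.
Rails.application.configure do
  # After TERM/INT, wait up to 25 seconds for in-progress jobs, then abort
  # their threads so the process exits before the supervisor's hard kill.
  config.good_job.shutdown_timeout = 25
end
```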
-
Thanks for your feedback (and for good_job of course)! The particular use case I need to support is a system where users can request the creation of data exports for a date range. The exports can become quite large, up to a few hundred megabytes, and involve long-running database queries, scanning and aggregating data from AWS S3, and requests to external APIs. The end result is a large single file on S3 that the user can then download, e.g. a gzip-compressed CSV. The processing itself is implemented in a streaming fashion (roughly sketched below), so it doesn't exhaust memory; the runtime simply scales linearly with the size. The users are OK with a certain wait time depending on the expected size, and there is feedback while the job runs to inform them about the progress.

Changing this particular feature so that it can be interrupted and resumed would make it significantly more complicated than it is now, I think. Deployment compatibility would also be an issue: it's desirable that even a long-running job is processed only with the version of the code it started with, instead of potentially changing its behavior mid-way. It is certainly a rare use case, and I agree that in most cases having many smaller jobs is the better approach. However, I could imagine there are a few other similar use cases where processing of large files is involved, e.g. video encoding.

Sidekiq Pro has a feature called quiet: https://rewind.com/blog/controlling-sidekiq-workers-on-aws-with-fargate-and-ssm-commands/, which I found while searching for how to send signals on Fargate. GoodJob's behavior sounds even more useful to me. If I understand correctly, when the timeout is disabled or very long, sending the good_job CLI process a TERM or INT signal should achieve exactly the graceful shutdown described in my question: finish the jobs that are currently running, start no new ones, then exit. Then the only problem that remains is how to get the signal to the process in an environment that doesn't support sending signals directly. I therefore don't think exposing a cancellation check would help, or maybe I didn't understand you correctly.
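For illustration, the export job looks roughly like the sketch below; `each_export_row`, `upload_to_s3` and the `rows_processed` column are hypothetical stand-ins for our actual query, S3 upload and progress-reporting code, not anything provided by GoodJob:

```ruby
require "csv"
require "tempfile"
require "zlib"

class DataExportJob < ApplicationJob
  queue_as :exports

  def perform(export)
    Tempfile.create(["export", ".csv.gz"]) do |file|
      gzip = Zlib::GzipWriter.new(file)
      rows = 0
      each_export_row(export) do |row|
        gzip.write(CSV.generate_line(row)) # one row at a time, memory stays flat
        rows += 1
        export.update!(rows_processed: rows) if (rows % 10_000).zero? # progress feedback
      end
      gzip.finish                          # flush the gzip footer without closing `file`
      upload_to_s3(export, file.path)      # the single large file the user downloads
    end
  end

  private

  # Hypothetical stand-ins for the real streaming database query / S3 scanning
  # and the final S3 upload.
  def each_export_row(export, &block)
    # e.g. export.relevant_records.find_each(&block)
  end

  def upload_to_s3(export, path)
    # e.g. Aws::S3::Resource.new.bucket(bucket).object(key).upload_file(path)
  end
end
```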
-
Hi there,
When running good_job with `execution_mode = :external` and `bundle exec good_job start` commands in docker containers, what would be a recommended approach to gracefully shut down the worker process, so that threads that are currently processing jobs finish those jobs (even if they potentially still take a long time to finish), but no new jobs are started, and once all threads are finished, the process terminates automatically?

It looks to me like sending an INT or TERM signal to the process would achieve this. There is a `trap` call (here: https://github.com/bensheldon/good_job/blob/main/lib/good_job/cli.rb#L106-L108) that sets `@stop_good_job_executable`. I haven't found any explicit mention of this in the documentation though, so I'm wondering if this is the intended way.

If so, I'd still need to find a way to do this in our particular environment, AWS Fargate, which as far as I can see doesn't allow sending signals to running tasks. One can only stop running tasks, which will send a TERM signal, wait for a maximum of 2 minutes, and then forcibly shut them down, which is way too short unfortunately. I think having the process send a signal to itself from an ActiveJob callback that checks some custom condition might work, something like the sketch below.
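Concretely, I'm imagining something along these lines, where `ShutdownRequest.pending?` is a hypothetical check we would implement and toggle ourselves (e.g. a database flag, or an SSM parameter polled here); it is not part of GoodJob or ActiveJob:

```ruby
class ApplicationJob < ActiveJob::Base
  # Sketch only: `ShutdownRequest.pending?` is a hypothetical, application-owned check.
  after_perform do |_job|
    # Send TERM to this worker process itself, the same as if the
    # orchestrator had been able to deliver the signal directly.
    Process.kill("TERM", Process.pid) if ShutdownRequest.pending?
  end
end
```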
Are there any other approaches? E.g. would it theoretically be possible to publish such a shutdown request with NOTIFY? There is some code mentioning shutdown in https://github.com/bensheldon/good_job/blob/main/lib/good_job/notifier.rb, but I don't understand yet what it does: is it maybe only relevant for `execution_mode = :async`?

Thanks in advance