Concurrency's enqueue_limit should exclude performing jobs from count #317

Closed
vfonic opened this issue Aug 1, 2021 · 9 comments · Fixed by #342
Labels
enhancement New feature or request

Comments

@vfonic

vfonic commented Aug 1, 2021

Hey Ben!

Thanks for the great gem! I just discovered it and it seems to be an improvement on top of DelayedJob which I'm using.

Currently, I have some custom duplication-prevention code that prevents enqueuing a job if a job with the same arguments already exists in the queue and is not currently running. (If a job with the same arguments is currently running, we still allow enqueuing a new one.)

Is there any such solution for GoodJob? How would I go about adding support for this myself?

Thanks!

@bensheldon
Owner

Yep! GoodJob's ActiveJob Concurrency extension can do this with enqueue_limit. But it does not cover the "and is not currently running" part, because enqueue_limit currently includes executing jobs in its count.

It's technically possible to exclude executing jobs. Could you make that argument? I'm open to changing it.

If you wanted to monkeypatch this for your own use instead, chain the .advisory_unlocked scope into this line:

enqueue_concurrency = GoodJob::Job.unscoped.where(concurrency_key: key).unfinished.count
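With the .advisory_unlocked scope chained in, that line might look roughly like this (a sketch; advisory_unlocked filters out jobs currently holding an advisory lock, i.e. executing jobs):

enqueue_concurrency = GoodJob::Job.unscoped.where(concurrency_key: key).unfinished.advisory_unlocked.count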

And then:

class MyJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  # Allow at most one job with the same concurrency key to be enqueued at a time
  good_job_control_concurrency_with enqueue_limit: 1, key: -> { "Unique-#{arguments.first}" }

  def perform(first_name)
    # do work
  end
end

FYI, you can test the resulting generated key like this (I should add this to the Readme):

job = MyJob.perform_later("Something")
job.good_job_concurrency_key #=> "Unique-Something"

@bensheldon
Owner

By the way, I tried to follow the semantics of the upstream Rails proposal in rails/rails#40337. I'd want to stay in alignment with that proposal, but I don't think it specifies this behavior.

@vfonic
Author

vfonic commented Aug 2, 2021

Ah, I missed the "concurrency" section in the Readme. I searched for "duplicate" and "double" and concluded that nothing there covered duplicate jobs. It's interesting that Shopify/Rails also chose the word "concurrency", so I guess there's nothing to change in this gem. Perhaps we could just add a hint that this can be used to prevent enqueuing the same job twice.

It's technically possible to exclude executing jobs. Could you make that argument? I'm open to changing it.

This is a race condition.
Imagine you have a job that takes a couple of seconds to run. It's used to save the record JSON to a third-party API after every update.

  1. First, it fetches a record from the database
  2. Then it pushes the record JSON to a slow third-party API

While this is happening, after 1) and before/during 2), the user makes changes to the record and saves them.

Expected:

The new record changes get synced to the third-party API.

Actual:

The concurrency check finds the currently executing job with the same arguments and skips enqueuing the new job. The new record changes don't get synced.
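To sketch it out (SyncRecordJob, Record, and ThirdPartyApi are hypothetical names here):

class SyncRecordJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  # Only allow one sync job per record to be enqueued at a time
  good_job_control_concurrency_with enqueue_limit: 1, key: -> { "SyncRecord-#{arguments.first}" }

  def perform(record_id)
    record = Record.find(record_id)     # 1) fetch the record from the database
    ThirdPartyApi.push(record.as_json)  # 2) push the record JSON to the slow third-party API
    # If the user saves new changes between 1) and 2), the newly enqueued job is
    # skipped because this still-executing job counts toward enqueue_limit,
    # so the new changes never reach the API.
  end
end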

I guess I should give it a try and let you know how it all works.

@bensheldon
Owner

This is a race condition.

That's a compelling reason.

And I remembered why I initially counted executing jobs: retrying on errors. Example: job1 is executing, job2 is enqueued, and the enqueue_limit: 1 constraint is satisfied. If job1 errors and is retried, it will hit the constraint, which prevents it from being re-enqueued. In this situation, what should happen?

  • job1 is re-enqueued, ignoring the constraint (2 matching jobs are now enqueued)
  • job1 is not re-enqueued (but then how do we preserve/indicate why, when inspecting the job or the Dashboard?)

I lean towards the first case (re-enqueue and ignore the constraint), but I'm curious about your perspective.

@bensheldon
Owner

FYI, I updated the Readme with additional keywords (duplicate, double) and the testing example 🎉

@vfonic
Author

vfonic commented Aug 2, 2021

If job1 errors, I'd assume it goes back to the end of the queue. So job2 will start executing before job1.

For "sync" purposes, it's important that at least one job starts after the record has been last saved.

There might be some other cases where it's better to try to remove the duplication. I'd argue that delivering the same message twice is better than skipping executing a job because there's "the same job" already running.

This is an edge case where I'd probably go for simplicity and use the code that's already there. If job1 errors, ideally it would be removed rather than retried when job2 has already started running after job1 errored. But for the sake of simplicity, I'd allow job1 back into the queue whenever the duplicate check finds no duplicates (job2 has the same arguments, but it's currently executing, so it wouldn't count).

@bensheldon
Owner

There might be some other cases where it's better to try to remove the duplication. I'd argue that delivering the same message twice is better than skipping executing a job because there's "the same job" already running.

I agree with this. I think that's saying "running a job incompletely is more problematic than running a 2nd job with duplicate conditions."

To move forward, I'm thinking:

  • change enqueue_limit's count to exclude executing jobs
  • allow the enqueue_limit constraint to be ignored when retrying jobs, which can lead to multiple enqueued jobs with the same conditions
  • update the Readme to note this, and recommend that anyone using enqueue_limit: 1 also set perform_limit: 1 to guard against a retried job executing at the same time as another job with duplicate conditions (see the sketch below)
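A minimal sketch of that combination (MyUniqueJob and its key are made-up examples):

class MyUniqueJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  # enqueue_limit: 1 prevents duplicate enqueues; perform_limit: 1 guards against
  # a retried job running alongside another job with the same key
  good_job_control_concurrency_with enqueue_limit: 1, perform_limit: 1, key: -> { "Unique-#{arguments.first}" }

  def perform(first_name)
    # do work
  end
end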

@vfonic
Author

vfonic commented Aug 3, 2021

Sounds perfect! Thanks @bensheldon! :)

I'm curious to see what the Rails team will do with their concurrency solution.

@codyrobbins
Contributor

When I first started playing around with the concurrency extension I was mildly surprised to realize that enqueue_limit includes performing jobs, but I don’t have a strong preference either way. You can make an argument for either option, since the same functionality is achievable both ways, and the current behavior makes sense once you understand what’s going on. Excluding performing jobs does have the advantage of not violating the principle of least surprise, though.

For our use case we run jobs that sync user data against external APIs in response to webhooks. We want to ensure that only one instance of the same sync job is ever running at a time, so that simultaneous syncs don’t interfere with each other, and so we have perform_limit set to 1. If a change happens in the midst of an already running sync, we want to run another sync immediately after the first one completes in order to pick up those new changes. But even if multiple changes occur during a running sync, we never need to enqueue more than one additional sync job, because that one job will pick up all subsequent changes when it runs after the current one. So we have enqueue_limit set to 2 (see the sketch below).

This effectively serves as a kind of trailing-edge debounce: it ensures we always pick up new changes when they happen without running multiple unnecessary syncs, but it doesn’t operate across a time window. I can imagine that with heavier load in the future we may need a time-based throttle like the one discussed in #315.
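For reference, a sketch of the setup described above (UserSyncJob and its key are illustrative names):

class UserSyncJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  # At most one sync per user runs at a time (perform_limit: 1), and at most one
  # additional sync can wait in the queue to pick up changes that arrive mid-run
  # (enqueue_limit: 2 counts the running job plus one enqueued job)
  good_job_control_concurrency_with perform_limit: 1, enqueue_limit: 2, key: -> { "UserSync-#{arguments.first}" }

  def perform(user_id)
    # sync the user's data against the external API
  end
end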

@bensheldon changed the title from "Is there any support from preventing adding duplicate job to queue?" to "Concurrency's enqueue_limit should exclude performing jobs from count" on Aug 16, 2021
@bensheldon added the "enhancement" (New feature or request) label on Aug 16, 2021