Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cron double-enqueue because delay close to 0.01 and possibly clock-drift #1543

Merged
merged 4 commits into from
Nov 21, 2024

Conversation

ccouton
Copy link
Contributor

@ccouton ccouton commented Nov 15, 2024

Fixes one problem in #1536

Case of clock drift and delay calculation result close to 0.001 so task is executed immediately which causes an unnecessary duplicate key value violates unique constraint-error even on single-process installations.

@bensheldon bensheldon changed the title fix(good_job-1536): case of clock drift and delay close to 0.001 Fix cron double-enqueue because of clock drift and delay close to 0.001 Nov 15, 2024
@ccouton
Copy link
Contributor Author

ccouton commented Nov 15, 2024

@bensheldon thanks for your help.
If we have 0.02 it will be the same no? it will execute immediately after ?

@bensheldon bensheldon changed the title Fix cron double-enqueue because of clock drift and delay close to 0.001 Fix cron double-enqueue because of clock drift and delay close to 0.01 Nov 15, 2024
@bensheldon bensheldon changed the title Fix cron double-enqueue because of clock drift and delay close to 0.01 Fix cron double-enqueue because delay close to 0.01 and possibly clock-drift Nov 15, 2024
@bensheldon
Copy link
Owner

@ccouton doublecheck my math, but I believe this line should prevent Concurrent Ruby from ever immediately running the task:

https://github.com/bensheldon/good_job/pull/1543/files#diff-66b4fe9e81a6b387c63fb9fbea11ffd29405422be569fee3e4e125e647b820e1R92-R94

@ccouton
Copy link
Contributor Author

ccouton commented Nov 15, 2024

Yes but it will be scheludled 0.02s after so a risk to be before the next at and so at the same of the previous at

@bensheldon
Copy link
Owner

oh right! we do want to run it immediately if it is in fact in the past. I'll tweak that.

@ccouton
Copy link
Contributor Author

ccouton commented Nov 15, 2024

Sorry i'm not clear enough, we have the case when clock drift cause an execution just before the real next scheduled at. Here i think this issue is not fixed. And maybe not easy to fix. We have ti execute it immeditaly if it's the first execution but if it's an other if cron_at is leather than current for me there is no reason to execute it?

@bensheldon
Copy link
Owner

bensheldon commented Nov 15, 2024

@ccouton this should (theoretically, assuming I did it right) prevent the duplicate key value violates unique constraint error you shared in the previous example. The duplicate key value/column uses the scheduled/intended time (so on 60.0) rather than on the actual time (maybe 60.0000023). So if it is scheduled to run at 60.0 but it actually runs at 61.000035 and the next run is at 61.0 you will not get a duplicate key error.

I don't have a strategy for preventing the case where some delay causes a previously scheduled time to run at about the same time as the next scheduled time. I would say that if your schedule is at such a cadence that you frequently run into that (like every second), cron is maybe the wrong tool for that and it would be better to manage your own sleep loop.

@ccouton
Copy link
Contributor Author

ccouton commented Nov 15, 2024

maybe we can compare at the entry of the scheduled task that the thr_cron_at and previously_at are not equal ? and if it's the case we don't enqueue and reschedule the task again? wdyt?

@ccouton
Copy link
Contributor Author

ccouton commented Nov 15, 2024

I tested it locally in my test.rb and it solves the issue

  def create_task(previous_cron_at = nil)
    cron_at = @fugit.next_time.to_t
    puts Time.now.to_f
    delay = [(cron_at - Time.now).to_f, 0].max
    puts "delay = #{delay}"

    future = Concurrent::ScheduledTask.new(delay, args: [self, cron_at, previous_cron_at]) do |tf, cron_at, previous_cron_at|
      puts "previous_cron_at / cron_at (#{previous_cron_at} / #{cron_at})"
      if previous_cron_at == cron_at
        puts "!!!! previous_cron_at == cron_at !!!! (#{previous_cron_at} #{cron_at})"
        tf.create_task(previous_cron_at)
      else
        tf.create_task(cron_at)
        puts "Exec task #{Time.now.to_f}"
      end
    end

    future.execute
  end

@bensheldon
Copy link
Owner

I don't like guarding against previous_cron_at == cron_at because that's not the same thing as accounting for Concurrent Ruby's < 0.1 second execution behavior or for clock drift. And it's a behavior change: it's possible for someone to use a Proc instead of Fugit and while I think it's wild to provide the same time twice, I don't want to change the contract around what runs.

I wanted to ask, do you not like what I've proposed:

  • Because it doesn't work?
  • Or because it's too complex/ugly/confusing?

This is the version of your script with my strategy (though I have still haven't been able to reproduce the behavior you're seeing when I run the script):

require 'concurrent-ruby'
require 'time'
require 'fugit'

class TestFugit
  def initialize
    @fugit = Fugit.parse('*/1 * * * *')
  end

  def create_task(at: nil)
    cron_at = at || @fugit.next_time.to_t

    puts Time.now.to_f
    delay = cron_at <= Time.now ? 0.0 : [(cron_at - Time.now).to_f, 0.02].max
    puts "delay = #{delay}"

    future = Concurrent::ScheduledTask.new(delay, args: [self, cron_at]) do |tf, thr_cron_at|
      if thr_cron_at > Time.now
        puts "Too soon, rescheduling"
        tf.create_task(at: thr_cron_at)
      else
        tf.create_task
        puts "Exec task #{Time.now.to_f}"
      end
    end

    future.execute
  end
end
TestFugit.new.create_task

sleep 1000

@ccouton
Copy link
Contributor Author

ccouton commented Nov 18, 2024

seems working ok for that solution

@ccouton
Copy link
Contributor Author

ccouton commented Nov 20, 2024

when you will merge it?

@bensheldon bensheldon added the bug Something isn't working label Nov 21, 2024
@bensheldon bensheldon merged commit c102011 into bensheldon:main Nov 21, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

Successfully merging this pull request may close these issues.

2 participants