[RFC] Allowance for Retry Jitter #11

Open
erikreedstrom opened this issue Apr 26, 2019 · 2 comments

Comments

@erikreedstrom

Fixed retry periods can lead to a thundering herd problem: jobs that fail at the same time are all retried at the same time.

Rather than holding a fixed list of delays, it would be useful to allow a function that calculates the delay, so that jitter can be built in. Something along the lines of:

defmodule Retry do
  # Module name is illustrative; the constants and retry_interval/1 are the proposal.
  @initial_retry_interval 15_000
  @max_retry_interval 900_000

  def retry_interval(failed_count) do
    # Apply exponential backoff with `@initial_retry_interval` and the
    # number of attempts so far as inputs. Do not allow the result to exceed
    # `@max_retry_interval`.
    sleep_millis = min(@initial_retry_interval * :math.pow(2, failed_count - 1), @max_retry_interval)

    # Apply some jitter by randomizing the value in the range `sleep_millis / 2` to `sleep_millis`
    # (:rand.uniform/0 returns a float in [0.0, 1.0)).
    # For instance, if sleep_millis is 60_000, we will actually retry at some time between 30s and 60s.
    sleep_millis = sleep_millis * (0.5 * (1 + :rand.uniform()))

    # But never sleep less than `@initial_retry_interval`. Note that this floor
    # cancels the jitter on the first retry, which always waits exactly 15s.
    max(@initial_retry_interval, round(sleep_millis))
  end
end
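The following sketch shows which delays each attempt can actually produce. It uses the same calculation as the proposal, but the random draw `u` is passed in as an argument instead of being read from `:rand`, so the bounds can be checked deterministically. The module name and the two-argument variant are hypothetical.

```elixir
defmodule JitterSketch do
  # Hypothetical module: same constants and formula as the proposal above,
  # with the uniform draw `u` (expected in [0.0, 1.0)) passed in explicitly.
  @initial_retry_interval 15_000
  @max_retry_interval 900_000

  def retry_interval(failed_count, u) when u >= 0.0 and u < 1.0 do
    # Exponential backoff, capped at the maximum interval.
    base = min(@initial_retry_interval * :math.pow(2, failed_count - 1), @max_retry_interval)
    # Jitter into the range base / 2 .. base.
    jittered = base * (0.5 * (1 + u))
    # Floor at the initial interval, as in the proposal.
    max(@initial_retry_interval, round(jittered))
  end

  # Production variant: :rand.uniform/0 returns a float with 0.0 =< u < 1.0.
  def retry_interval(failed_count), do: retry_interval(failed_count, :rand.uniform())
end

# Attempt 1 is always 15_000 because the floor cancels the jitter:
JitterSketch.retry_interval(1, 0.0)
JitterSketch.retry_interval(1, 0.999)
# Attempt 3 falls between 30_000 and 60_000:
JitterSketch.retry_interval(3, 0.5)
```

With these constants, the floor only removes the jitter from the first retry. From the second retry on, the full `base / 2 .. base` range survives. This is worth weighing, because the first retry is arguably where a batch of jobs that failed together would benefit most from being spread out.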

Thoughts?

@erikreedstrom
Author

Perhaps this doesn't align with the current design of one queue per delay level? What is the benefit of a delay-specific queue over using an x-delay header?
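For comparison with the per-delay queues, here is a minimal sketch of what the x-delay alternative might look like. It assumes the RabbitMQ delayed message exchange plugin (`rabbitmq_delayed_message_exchange`) is installed. That plugin reads the delay, in milliseconds, from the `x-delay` header of messages published to an exchange of type `x-delayed-message`. The module, function, and exchange names are hypothetical.

```elixir
defmodule DelayedPublishSketch do
  # Hypothetical helper: build the extra publish options for the delayed
  # message exchange plugin, which reads the delay (in milliseconds) from the
  # `x-delay` header. Headers use the AMQP table format {name, type, value}.
  def delay_opts(delay_ms) when is_integer(delay_ms) and delay_ms >= 0 do
    [headers: [{"x-delay", :signedint, delay_ms}]]
  end
end

# Usage, assuming `channel` is open and the hypothetical exchange
# "roger.delayed" was declared with:
#
#   AMQP.Exchange.declare(channel, "roger.delayed", :"x-delayed-message",
#     arguments: [{"x-delayed-type", :longstr, "direct"}])
#
#   AMQP.Basic.publish(channel, "roger.delayed", queue, payload,
#     DelayedPublishSketch.delay_opts(30_000))
```

With this approach, each message can carry its own jittered delay without declaring extra queues. The trade-off is a dependency on a plugin being installed on the broker, whereas per-message expiration on ordinary queues is a core RabbitMQ feature.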

@erikreedstrom
Author

erikreedstrom commented Apr 26, 2019

Given the rigidity of the current architecture, perhaps a jitter factor could be worked into the message expiration instead?

# config/config.exs
config :roger,
  retry_levels: [15, 30, 60, 120, 240, 480, 960],
  jitter?: true

def retry(channel, partition, job) do
  {queue, expiration} = setup_retry_queue(channel, partition, job)

  expiration =
    if is_number(expiration) and Application.get_env(:roger, :jitter?, false) do
      # Apply some jitter by randomizing the value in the range `expiration / 2` to `expiration`.
      # For instance, if expiration is 15 (seconds), the job will actually be retried at some time between 7.5s and 15s.
      # Buried jobs (`:buried`) have no expiration, so they are left untouched.
      expiration * (0.5 * (1 + :rand.uniform()))
    else
      expiration
    end

  payload = Job.encode(%Job{job | retry_count: job.retry_count + 1})

  opts_extra =
    case expiration do
      :buried -> []
      # AMQP per-message expiration is a string holding milliseconds
      _ -> [expiration: Integer.to_string(round(expiration * 1000))]
    end

  AMQP.Basic.publish(channel, "", queue, payload, Job.publish_opts(job, partition) ++ opts_extra)
  {:ok, expiration}
end
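The jitter step could also be pulled out of `retry/3` into pure functions, so that it can be tested without a broker. The following is a minimal sketch. It assumes `expiration` is either a number of seconds or `:buried`, as in the snippet above. The module and function names are hypothetical, and the random draw is passed in as `u`.

```elixir
defmodule ExpirationJitterSketch do
  # Hypothetical helper: scale a retry expiration (in seconds) into the range
  # expiration / 2 .. expiration. `u` is a uniform draw in [0.0, 1.0), e.g.
  # from :rand.uniform/0. Buried jobs have no expiration and pass through.
  def jitter(:buried, _u), do: :buried
  def jitter(expiration, u) when is_number(expiration), do: expiration * (0.5 * (1 + u))

  # Build the extra publish options: AMQP per-message expiration is a string
  # holding a number of milliseconds.
  def expiration_opts(:buried), do: []
  def expiration_opts(expiration), do: [expiration: Integer.to_string(round(expiration * 1000))]
end

# A 15s level jittered with u = 0.0 becomes 7.5s, published as "7500" ms:
15 |> ExpirationJitterSketch.jitter(0.0) |> ExpirationJitterSketch.expiration_opts()
```

Before adopting this, check how RabbitMQ handles mixed expirations in a single queue. It only expires (and dead-letters) messages with a per-message TTL once they reach the head of the queue. Within one per-level queue, a job with a short jittered expiration may therefore wait behind a job published earlier with a longer one.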

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant