Add exponential backoff to websocket client #429

Open
bgins opened this issue Nov 7, 2024 · 0 comments · May be fixed by #450
bgins commented Nov 7, 2024

General Description

Implement exponential backoff for the websocket client.

Which system(s) or functionality does this affect

This change affects:

  • Job Creation
  • Resource Provider
  • Mediation
  • Solver

Describe the changes, and how this affects/interacts with each system.

Our existing websocket client has a retry strategy where it attempts to connect every two seconds:

connectFactory := func() *websocket.Conn {
	for {
		log.Debug().Msgf("WebSocket connection connecting: %s", url)
		conn, _, err := websocket.DefaultDialer.Dial(url, nil)
		if err != nil {
			log.Error().Msgf("WebSocket connection failed: %s\nReconnecting in 2 seconds...", err)
			time.Sleep(2 * time.Second)
			continue
		}
		conn.SetPongHandler(nil)
		return conn
	}
}

At scale, these fixed-interval retries from many clients can flood the solver with connection attempts.

Let's implement exponential backoff when making the connection. We could start with a base interval of one second and an exponent of 1.5; an exponential backoff calculator shows how the wait grows with each attempt.

In addition, we should set a maximum number of attempts before giving up and exiting the program. We should exit with a clear error message so the user can report the connection problem to us.


The job creator, resource provider, and mediator connect to the solver over a websocket. This feature impacts the communication between each client and the solver.
