Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify deadline passing #2222

Merged
merged 6 commits into from
Dec 30, 2023
Merged

Simplify deadline passing #2222

merged 6 commits into from
Dec 30, 2023

Conversation

fleupold
Copy link
Contributor

Description

There seems to be an issue with the deadline we are passing down to solvers leading to timeouts while still having significant time left for postprocessing. Take Laertes in barn auction 8222857 for instance:

2023-12-29T09:37:47.818Z DEBUG request{id="2710"}:/solve{solver=laertes-solve auction_id=8222857}: driver::infra::observe: computed deadline deadline=Deadline { driver: 2023-12-29T09:38:02.090509218Z, solvers: 2023-12-29T09:38:00.663289792Z } timeouts=Timeouts { http_delay: Duration { secs: 0, nanos: 500000000 }, solving_share_of_deadline: Percent(0.9) }
2023-12-29T09:37:49.476Z TRACE request{id="2710"}:/solve: solvers::api::routes::solve: auction=Auction { id: Solve(8222857), ... deadline: Deadline(2023-12-29T09:37:58.702187297Z}
...
2023-12-29T09:37:58.738Z WARN request{id="2710"}:/solve:auction{id=8222857}: solvers::domain::solver::legacy: failed to solve auction err=Timeout
2023-12-29T09:37:58.739Z DEBUG request{id="2710"}:/solve{solver=laertes-solve auction_id=8222857}: driver::infra::observe: postprocessing solutions solutions=0 remaining=3.351353404s

Here we aborted solver computation at 09:37:58 despite the original solver deadline (first log) being almost two seconds later (09:38:00). We can see that the deadline that is received by the solver engine is already much smaller than what we computed in the driver. Looking at the code we expect a reduction of http_delay (0.5s) but not 2s.

One thing to note is that the way we pass down solver deadlines is surprisingly 🤡. We convert it into a duration to later on convert it into a timestamp again. My hunch is that this is causing us to lose precision and thus time. This PR simplifies this logic and hopes that this will resolve the 2s time loss.

Changes

  • Remove SolverTimeout type and argument being passed around in favour of the timeouts that are already part of the auciton
  • Return DateTime instead of durations for solver/driver deadlines
  • Move remaining helper method into an extension trait to allow it being used where needed
  • Remove the 500ms http-delay reduction in the request to the solver.

We already have a buffer for postprocessing in the driver, and really it should be the consumer (solver engine in this case) who adjusts the deadline to take network latency into account. We do the same for the autopilot<>driver deadline (the driver attempts to send their response 500ms before the deadline) and in fact also already account for another 1s buffer inside the legacy solver-engine (code)

How to test

Existing tests

Related Issues

Fixes #2211

@fleupold fleupold requested a review from a team as a code owner December 29, 2023 10:34
Copy link
Contributor

@sunce86 sunce86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you removed timeout.reduce(self.config.timeouts.http_delay) I think you need to implement that on solvers side otherwise baseline and dex solvers will hit the deadline every time.

@fleupold
Copy link
Contributor Author

I think you need to implement that on solvers side otherwise baseline and dex solvers will hit the deadline every time.

Good point! Opened https://github.com/cowprotocol/solvers/pull/8 and will add it for baseline here.

Copy link
Contributor

@sunce86 sunce86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though we have 1s reduce time for external solvers in http_solver, I believe we should lower the timeout for them also as we did for baseline/dex, because if external solvers do not respect time_limit it means that the timeout error will not get properly propagated back to the autopilot, both driver and autopilot will time out at the same time - while autopilot expects no solutions in this case.

Each timeout error received by autopilot should mean we have a bug.

Edit: I think I am wrong, we will have leftover time in driver to process the response back. All good.

Copy link
Contributor

@sunce86 sunce86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change. Really happy you killed the SolverTimeout.

@fleupold fleupold enabled auto-merge (squash) December 30, 2023 08:12
@fleupold fleupold merged commit 9147f3e into main Dec 30, 2023
8 checks passed
@fleupold fleupold deleted the simplify_deadline_passing branch December 30, 2023 08:15
@github-actions github-actions bot locked and limited conversation to collaborators Dec 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: SeaSolver timeout
2 participants