Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis waitgroup for optimistic block sync #538

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

austonst
Copy link
Contributor

@austonst austonst commented Oct 5, 2023

📝 Summary

This change implements a counter in redis which monitors the number of builder API processes currently handling validation of optimistic blocks for a given slot. Each slot has a separate key in redis, though as before only one slot is optimistically processed at a time. When a builder process begins optimistic validation, it INCR's the value, and once it is complete it DECR's the value. These are atomic redis operations so no mutex is needed. Functions can block by waiting on this value to reach 0, at which point all optimistic blocks for the slot have been validated.

This has been tested on Aestus's goerli relay, using builders marked for optimistic submissions and intentionally triggering demotions by disabling the block priority queue (causing all validation calls to timeout after the default 10s).

Caveats:

  • These redis reads/writes occur whether or not the relay has optimistic submissions enabled (but they are not in the critical path)
  • Blocking works by querying redis once every 50 ms. Slightly awkward, could be exposed as a configurable parameter, but not aware of a better solution.

⛱ Motivation and Context

See #534 . This is a revised version of the code originally linked there.

📚 References

Test log entries from a, Aestus Goerli block proposal, reverse chronological, newest at the top. Shows three builder API processes working on validations (each which take 10s). There's a lot going on here, but it does demonstrate correct behavior and I'm happy to walk through it elsewhere.
sync.txt


✅ I have run these commands

  • [✅] make lint
  • [✅] make test-race
  • [✅] go mod tidy
  • [✅] I have seen and agree to CONTRIBUTING.md

@austonst austonst force-pushed the sync-optimistic-redis branch from 7467e52 to 2128d05 Compare February 7, 2024 18:10
@austonst austonst force-pushed the sync-optimistic-redis branch from 2128d05 to 8396708 Compare February 7, 2024 18:26
@codecov-commenter
Copy link

codecov-commenter commented Feb 7, 2024

Codecov Report

Attention: Patch coverage is 14.45783% with 71 lines in your changes are missing coverage. Please review.

Project coverage is 30.19%. Comparing base (d8a0d7b) to head (d668875).
Report is 2 commits behind head on main.

Files Patch % Lines
datastore/redis.go 4.08% 47 Missing ⚠️
services/api/service.go 29.41% 21 Missing and 3 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
- Coverage   36.22%   30.19%   -6.03%     
==========================================
  Files          24       41      +17     
  Lines        5267     6345    +1078     
==========================================
+ Hits         1908     1916       +8     
- Misses       3112     4179    +1067     
- Partials      247      250       +3     
Flag Coverage Δ
unittests 30.19% <14.45%> (-6.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@metachris metachris requested a review from avalonche March 19, 2024 08:10
err = api.redis.BeginProcessingSlot(context.Background(), headSlot+1)
if err != nil {
api.log.WithError(err).Error("failed to lock redis optimistic processing slot")
api.optimisticSlot.Store(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the optimistic slot 0 if redis fails to process the slot instead of the previous head?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place api.optimisticSlot is used is in deciding whether or not to process a submission optimistically:

submission.BidTrace.Slot == api.optimisticSlot.Load() {

We only reach this BeginProcessingSlot call once we've received a head event for the previous slot and all optimistic blocks for that slot have have finished processing:

api.processNewSlot(headEvent.Slot)

So the only valid block submissions we receive at this point are those with submission.BidTrace.Slot == headSlot+1. Whether we run api.optimisticSlot.Store(0) or api.optimisticSlot.Store(prevHeadSlot), neither will match the submission.BidTrace.Slot of new incoming blocks. Either way, the end result is that if the redis connection fails, the process will not allow optimistic submissions for the upcoming slot.

So while there are a few options for code flow, the current design sets to headSlot + 1 by default, and resets to 0 as a general "disable all optimistic processing" in case of error.

// WaitForSlotComplete waits for a slot to be completed by all builder processes
func (r *RedisCache) WaitForSlotComplete(ctx context.Context, slot uint64) (err error) {
keyProcessingSlot := r.keyProcessingSlot(slot)
for {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if there should be a timeout if EndProcessingSlot fails to decrement the processing slot and the loop is stuck waiting for the slot to complete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current failsafe is that the key is slot-specific and has a fairly short expiry. The loop will break on a value of 0, key expiry, or any other redis error. But it probably wouldn't hurt to have an explicit timeout on the mev-boost-relay side. I'll whip something up.

Also can consider keyspace events (https://redis.io/docs/manual/keyspace-notifications/) as a cleaner way to accomplish this, but might require some redis server config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants