-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redis waitgroup for optimistic block sync #538
base: main
Are you sure you want to change the base?
Conversation
7467e52
to
2128d05
Compare
2128d05
to
8396708
Compare
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #538 +/- ##
==========================================
- Coverage 36.22% 30.19% -6.03%
==========================================
Files 24 41 +17
Lines 5267 6345 +1078
==========================================
+ Hits 1908 1916 +8
- Misses 3112 4179 +1067
- Partials 247 250 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
err = api.redis.BeginProcessingSlot(context.Background(), headSlot+1) | ||
if err != nil { | ||
api.log.WithError(err).Error("failed to lock redis optimistic processing slot") | ||
api.optimisticSlot.Store(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the optimistic slot 0 if redis fails to process the slot instead of the previous head?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place api.optimisticSlot is used is in deciding whether or not to process a submission optimistically:
mev-boost-relay/services/api/service.go
Line 2128 in d668875
submission.BidTrace.Slot == api.optimisticSlot.Load() { |
We only reach this BeginProcessingSlot
call once we've received a head event for the previous slot and all optimistic blocks for that slot have have finished processing:
mev-boost-relay/services/api/service.go
Line 477 in d668875
api.processNewSlot(headEvent.Slot) |
So the only valid block submissions we receive at this point are those with submission.BidTrace.Slot == headSlot+1
. Whether we run api.optimisticSlot.Store(0)
or api.optimisticSlot.Store(prevHeadSlot)
, neither will match the submission.BidTrace.Slot
of new incoming blocks. Either way, the end result is that if the redis connection fails, the process will not allow optimistic submissions for the upcoming slot.
So while there are a few options for code flow, the current design sets to headSlot + 1
by default, and resets to 0
as a general "disable all optimistic processing" in case of error.
// WaitForSlotComplete waits for a slot to be completed by all builder processes | ||
func (r *RedisCache) WaitForSlotComplete(ctx context.Context, slot uint64) (err error) { | ||
keyProcessingSlot := r.keyProcessingSlot(slot) | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if there should be a timeout if EndProcessingSlot
fails to decrement the processing slot and the loop is stuck waiting for the slot to complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current failsafe is that the key is slot-specific and has a fairly short expiry. The loop will break on a value of 0, key expiry, or any other redis error. But it probably wouldn't hurt to have an explicit timeout on the mev-boost-relay side. I'll whip something up.
Also can consider keyspace events (https://redis.io/docs/manual/keyspace-notifications/) as a cleaner way to accomplish this, but might require some redis server config.
📝 Summary
This change implements a counter in redis which monitors the number of builder API processes currently handling validation of optimistic blocks for a given slot. Each slot has a separate key in redis, though as before only one slot is optimistically processed at a time. When a builder process begins optimistic validation, it INCR's the value, and once it is complete it DECR's the value. These are atomic redis operations so no mutex is needed. Functions can block by waiting on this value to reach 0, at which point all optimistic blocks for the slot have been validated.
This has been tested on Aestus's goerli relay, using builders marked for optimistic submissions and intentionally triggering demotions by disabling the block priority queue (causing all validation calls to timeout after the default 10s).
Caveats:
⛱ Motivation and Context
See #534 . This is a revised version of the code originally linked there.
📚 References
Test log entries from a, Aestus Goerli block proposal, reverse chronological, newest at the top. Shows three builder API processes working on validations (each which take 10s). There's a lot going on here, but it does demonstrate correct behavior and I'm happy to walk through it elsewhere.
sync.txt
✅ I have run these commands
make lint
make test-race
go mod tidy
CONTRIBUTING.md