Sync error handling #133

Zacholme7 · 2025-02-11T15:28:02Z

Issue Addressed

Proposed Changes

This PR introduces a way to handle RPC errors and signal when sync is stalled.

If a websocket goes down or an RPC endpoint is having issues, OPERATIONAL_STATUS is set to false. The rest of the application can check this value to determine whether the execution layer is having sync issues.

If there is an RPC error, there is nothing we can do until the endpoint is operational again. The simplest way to check for this is to continuously poll for a block number with exponential backoff. There is no longer a set number of retries; polling continues until the endpoint responds successfully again.
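For illustration, the polling described above could look roughly like the sketch below. It is not the code in this PR: `wait_until_operational`, `next_backoff`, and the `MAX_BACKOFF` cap are hypothetical names, the block-number call is passed in as a closure instead of a real RPC client, and it uses blocking `std::thread::sleep` where the real code would presumably use an async sleep.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical upper bound so the delay does not grow forever.
const MAX_BACKOFF: Duration = Duration::from_secs(60);

/// Doubles the delay, saturating at `MAX_BACKOFF`.
fn next_backoff(current: Duration) -> Duration {
    current.saturating_mul(2).min(MAX_BACKOFF)
}

/// Polls `get_block_number` until it succeeds. There is no retry limit.
/// `operational` is false while the endpoint is failing and is set back
/// to true once a poll succeeds.
fn wait_until_operational<F, E>(
    operational: &AtomicBool,
    initial_delay: Duration,
    mut get_block_number: F,
) -> u64
where
    F: FnMut() -> Result<u64, E>,
{
    operational.store(false, Ordering::SeqCst);
    let mut delay = initial_delay;
    loop {
        match get_block_number() {
            Ok(block) => {
                operational.store(true, Ordering::SeqCst);
                return block;
            }
            Err(_) => {
                // Endpoint still unhealthy: wait, then retry with a longer delay.
                sleep(delay);
                delay = next_backoff(delay);
            }
        }
    }
}
```

Other components would only need to load the flag to decide whether to proceed.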

anchor/eth/src/sync.rs

dknopik · 2025-02-14T14:25:31Z

I retested it locally. Could not get it to crash now. Nice!

But I noticed that right now, this error condition does not properly apply any backoff:

anchor/anchor/eth/src/sync.rs

Lines 413 to 414 in b5ffff4

    
    // If we get here, the stream ended (likely due to disconnect)
    error!("WebSocket stream ended, reconnecting...");

This got me thinking once more about the approach. How do you feel about, instead of having to apply troubleshoot_rpc at multiple appropriate locations throughout the file, just returning an error in the live and historical functions and doing the reconnect logic at the top level? This also handles cases where we want to fall back to historical sync logic after being offline for a considerable amount of time, and helps us assert that we really do not crash out anymore if something fails. There might be some disadvantages I am not thinking of right now though.
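To make the suggestion concrete, here is a minimal sketch of what a top-level reconnect loop might look like. Everything in it is an assumption for illustration: `SyncError`, `run_with_reconnect`, and the delay constants are hypothetical, the historical and live sync steps are plain closures returning `Result`, and the sleep function is injected so the loop can be exercised without waiting.

```rust
use std::time::Duration;

/// Hypothetical error type returned by the sync steps.
#[derive(Debug, PartialEq)]
enum SyncError {
    Rpc(String),
    StreamEnded,
}

const INITIAL_DELAY: Duration = Duration::from_secs(1);
const MAX_DELAY: Duration = Duration::from_secs(60);

/// Runs historical sync followed by live sync. Any error from either step
/// is handled here: back off, then start over from historical sync.
/// Returns only when live sync finishes without an error.
fn run_with_reconnect<H, L, S>(mut historical: H, mut live: L, mut sleep: S)
where
    H: FnMut() -> Result<(), SyncError>,
    L: FnMut() -> Result<(), SyncError>,
    S: FnMut(Duration),
{
    let mut delay = INITIAL_DELAY;
    loop {
        // Live sync only starts once historical sync has caught up.
        let result = historical().and_then(|()| live());
        match result {
            Ok(()) => return,
            Err(err) => {
                eprintln!("sync failed: {err:?}, retrying in {delay:?}");
                sleep(delay);
                delay = delay.saturating_mul(2).min(MAX_DELAY);
            }
        }
    }
}
```

Because every retry restarts from the historical step, a long outage is covered without extra logic. A fuller version would probably reset the delay after a period of healthy live sync.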

Zacholme7 · 2025-02-14T15:56:39Z

@dknopik Threw together a quick POC. I think that approach works very nicely. Take a look when you have a sec, and if that's along the lines of what you were thinking, I'll clean it up and make sure it works.

dknopik · 2025-02-14T16:31:31Z

anchor/eth/execution.rs

-        .await
-        .expect("Failed to construct event syncer");
+    let mut event_syncer =
+        SsvEventSyncer::new(db.clone(), config, Arc::new(AtomicBool::new(false)))


I feel like it would be a bit cleaner to create the Arc within the new method.
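Something along these lines, as a rough sketch: `EventSyncer` is a simplified stand-in for the real syncer (the `db` and `config` arguments are left out), and the accessor and setter names are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Simplified stand-in for the real syncer type.
struct EventSyncer {
    operational_status: Arc<AtomicBool>,
}

impl EventSyncer {
    /// The constructor owns creation of the flag, so callers do not have
    /// to build and pass in an `Arc<AtomicBool>` themselves.
    fn new() -> Self {
        Self {
            operational_status: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Hands out a shared handle for other components to observe.
    fn operational_status(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.operational_status)
    }

    /// Updates the flag; every handle sees the new value.
    fn set_operational(&self, value: bool) {
        self.operational_status.store(value, Ordering::SeqCst);
    }
}
```

Callers that need the flag would ask the syncer for a handle after construction.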

dknopik · 2025-02-14T16:36:43Z

Threw together a quick POC. I think that approach works very nicely. Take a look when you have a sec, and if that's along the lines of what you were thinking, I'll clean it up and make sure it works.

Yeah, seems good! This is what I meant. :)

Zacholme7 added 7 commits February 10, 2025 22:54

naive backoff

282e051

conn troubleshooting

b26a93a

handle error

1a3e0b9

get rid of max retry

c48c84b

fix comment

d9f00b2

make operational status pub

a369afb

Merge branch 'unstable' into sync-error-handling

c8b0b63

Zacholme7 added the enhancement and execution layer labels Feb 11, 2025

Zacholme7 added 2 commits February 11, 2025 15:32

sort

e3e2cba

Merge branch 'unstable' into sync-error-handling

a8310dc

dknopik reviewed Feb 12, 2025

anchor/eth/src/sync.rs (outdated)

Zacholme7 added 3 commits February 12, 2025 15:00

Merge branch 'unstable' into sync-error-handling

ef123d3

move to atomic bool

b5ffff4

Merge branch 'unstable' into sync-error-handling

2e4d758

Zacholme7 marked this pull request as ready for review February 14, 2025 13:22

move retry to top layer

ca93eff

fmt

9c84266

dknopik reviewed Feb 14, 2025


Merge branch 'unstable' into sync-error-handling

33ca981
