
Adds StartingSnapshotStorages to AccountsHashVerifier #58

Merged

Conversation

@brooksprumo commented Mar 4, 2024

Problem

When starting up from fastboot, if shrink runs before the next bank snapshot is taken, the snapshot storages can change. If the node then restarts before taking that next bank snapshot, startup may fail because the snapshot storages are now wrong.

Please see solana-labs#35376 for more information.

Summary of Changes

At startup, capture the storages that the bank was loaded from (for fastboot) and pass them into AccountsHashVerifier, since AHV is in charge of holding the fastboot storages to prevent early cleanup.
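
As a rough sketch of that shape (the type and constructor names below are illustrative stand-ins, not the exact signatures in this PR): the startup storages are wrapped in Arcs and handed to the verifier, which keeps them alive until they are superseded.

use std::sync::Arc;

// Toy stand-ins for the real accounts-db types; illustrative only.
struct AccountStorageEntry { /* slot, append vec, ... */ }

// How the node started: from a snapshot archive, or fastboot from the
// storages already on disk.
enum StartingSnapshotStorages {
    Archive,
    Fastboot(Vec<Arc<AccountStorageEntry>>),
}

struct AccountsHashVerifier {
    // Held to keep the startup storages alive (not recycled, not removed)
    // until the next bank snapshot POST supersedes them.
    fastboot_storages: Option<Vec<Arc<AccountStorageEntry>>>,
}

impl AccountsHashVerifier {
    fn new(starting_snapshot_storages: StartingSnapshotStorages) -> Self {
        let fastboot_storages = match starting_snapshot_storages {
            StartingSnapshotStorages::Archive => None,
            StartingSnapshotStorages::Fastboot(storages) => Some(storages),
        };
        Self { fastboot_storages }
    }
}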

I tested this with a node on mnb.

  1. I started a node and let it run for a while, so it took a bunch of snapshots
  2. I removed the code to purge all bank snapshots at startup with fastboot
  3. I restarted the node, and it started with fastboot correctly
  4. I inserted code to crash the node before the bank snapshot POST was reserialized (this ensures shrink runs)
  5. the node crashed, as expected
  6. I restarted the node, and it used the same snapshot for fastboot as in step (3)

And it worked! Previously, this would crash.

Fixes solana-labs#35376

@brooksprumo brooksprumo self-assigned this Mar 4, 2024
@brooksprumo brooksprumo marked this pull request as ready for review March 4, 2024 18:43
@jeffwashington

I haven't taken the time to look deeply at every code change. I did skim it and I don't think this is addressed.
Is the issue with shrink due to recycling?
Recycling should only be possible if Arc::strong_count is 0.
Assuming recycling is the issue, could we:

  1. get rid of it?
  2. hold an Arc (keeping its strong_count up) on boot from fast boot and drop it when we create the first new fast_boot folder?

@brooksprumo (Author) commented Mar 4, 2024

Is the issue with shrink due to recycling? Recycling should only be possible if Arc::strong_count is 0.

Yeah, it's from recycling. This was an issue for all of fastboot initially. The fix is with fastboot_storages:

// To support fastboot, we must ensure the storages used in the latest POST snapshot are
// not recycled nor removed early. Hold an Arc of their AppendVecs to prevent them from
// expiring.
let mut fastboot_storages = None;

Assuming recycling is the issue, could we:

  1. get rid of it?
  2. hold an Arc (keeping its strong_count up) on boot from fast boot and drop it when we create the first new fast_boot folder?

I don't know enough about recycling to do (1). Is it safe? Why was it added initially? Code spelunking seems to indicate that recreating the append vecs was an issue (maybe the underlying mmap?). Is that not an issue anymore?

And for (2), that's what this PR does. (Except you cannot drop the storages until the subsequent bank snapshot POST is made.)
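
A minimal sketch of that hold-then-replace pattern (field and function names here are hypothetical): the startup storages stay held until the next bank snapshot POST has its own storages, and only then are the old Arcs dropped.

use std::sync::Arc;

struct AccountStorageEntry; // stand-in for the real storage type

// Illustrative only: `fastboot_storages` starts as the set loaded at boot and
// is overwritten once a newer bank snapshot POST exists. Dropping the old
// value releases the Arcs, and only then may those append vecs be recycled.
fn on_bank_snapshot_post(
    fastboot_storages: &mut Option<Vec<Arc<AccountStorageEntry>>>,
    post_snapshot_storages: Vec<Arc<AccountStorageEntry>>,
) {
    *fastboot_storages = Some(post_snapshot_storages);
}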

@jeffwashington

I don't know enough about recycling to do (1). Is it safe? Why was it added initially? Code spelunking seems to indicate that recreating the append vecs was an issue (maybe the underlying mmap?). Is that not an issue anymore?

It is a mysterious area of the code. We could probably use some more expertise in it. Originally, append vecs were the way to store written accounts from tx processing. This model isn't used anymore. We would also use multiple append vecs per slot; also not used anymore. Maybe there is still a place for recycling. I do know it is so often under suspicion, and sometimes IS the reason some operation has a hole (like this one). It definitely adds complexity.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 42.85714%, with 32 lines in your changes missing coverage. Please review.

Project coverage is 81.8%. Comparing base (3f9a7a5) to head (ee760bc).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #58   +/-   ##
=======================================
  Coverage    81.8%    81.8%           
=======================================
  Files         837      838    +1     
  Lines      225922   225949   +27     
=======================================
+ Hits       184955   184980   +25     
- Misses      40967    40969    +2     

@brooksprumo (Author)

It is a mysterious area of the code. We could probably use some more expertise in it. Originally, append vecs were the way to store written accounts from tx processing. This model isn't used anymore. We would also use multiple append vecs per slot; also not used anymore. Maybe there is still a place for recycling. I do know it is so often under suspicion, and sometimes IS the reason some operation has a hole (like this one). It definitely adds complexity.

OK, I'll take a look into this. And this PR may be the genesis of recycling: solana-labs#12885

@@ -54,7 +56,11 @@ impl AccountsHashVerifier {
 // To support fastboot, we must ensure the storages used in the latest POST snapshot are
 // not recycled nor removed early. Hold an Arc of their AppendVecs to prevent them from
 // expiring.
-let mut fastboot_storages = None;
+let mut fastboot_storages = match starting_snapshot_storages {


ok, fastboot_storages already existed and is overwritten later in the existing code.

@jeffwashington left a comment


lgtm

@apfitzge left a comment


Changes seem reasonable to me: we cache an Arc of the storages we used to load our bank from, which prevents them from being shrunk for as long as we hold the Arc. We had already done this for snapshots we were creating, and this PR makes startup more similar to the other code.

We have some hidden argument for skipping (or not) shrinking/cleaning. It seems we will now skip shrinking unless we are loading from archives - do we need to add any sort of conflict with these args & fastboot?

@brooksprumo (Author)

We have some hidden argument for skipping (or not) shrinking/cleaning. It seems we will now skip shrinking unless we are loading from archives - do we need to add any sort of conflict with these args & fastboot?

I think we'll be OK. By holding these Arcs, we prevent them from getting recycled. When/if shrink runs (startup or otherwise), I believe it'll create a new append vec for the shrunk results, instead of recycling one of these append vecs. @jeffwashington does that sound right?

@jeffwashington

it'll create a new append vec for the shrunk results, instead of recycling one of these append vecs.

This is correct. This change prevents recycling the append vecs used in fastboot.
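
As a toy illustration of that guard (the real recycler in accounts-db is more involved; the names below are made up): a pooled append vec is only reused when nothing else holds its Arc, so while fastboot's Arcs are alive, shrink falls back to allocating a fresh one.

use std::sync::Arc;

struct AppendVec; // stand-in for the mmap-backed account storage

// Try to reuse a pooled append vec; otherwise allocate a new one.
fn storage_for_shrink(recycle_pool: &mut Vec<Arc<AppendVec>>) -> Arc<AppendVec> {
    if let Some(pos) = recycle_pool
        .iter()
        .position(|av| Arc::strong_count(av) == 1)
    {
        // Only the pool references this append vec, so it is safe to reuse.
        recycle_pool.swap_remove(pos)
    } else {
        // Every pooled append vec is still referenced (e.g. by the fastboot
        // storages held in AccountsHashVerifier), so create a new one.
        Arc::new(AppendVec)
    }
}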

@jeffwashington

We have some hidden argument for skipping (or not) shrinking/cleaning.

This CLI arg causes the validator to start first and then run the initial clean and shrink in the background. The previous behavior was to wait to start the validator (replay, turbine) until the initial clean and shrink completed.
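
A rough sketch of that ordering, with hypothetical function names for the startup work:

use std::thread;

fn start_validator(clean_and_shrink_in_background: bool) {
    if clean_and_shrink_in_background {
        // With the flag: start replay/turbine immediately; the initial clean
        // and shrink catch up on a background thread.
        thread::spawn(|| {
            initial_clean();
            initial_shrink();
        });
        start_replay_and_turbine();
    } else {
        // Previous behavior: block startup until clean and shrink complete.
        initial_clean();
        initial_shrink();
        start_replay_and_turbine();
    }
}

// Hypothetical stand-ins for the real startup steps.
fn initial_clean() {}
fn initial_shrink() {}
fn start_replay_and_turbine() {}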

@brooksprumo brooksprumo merged commit 93f5b51 into anza-xyz:master Mar 4, 2024
35 checks passed
@brooksprumo brooksprumo deleted the fastboot/starting-snapshot-storages branch March 4, 2024 21:32
codebender828 pushed a commit to codebender828/agave that referenced this pull request Oct 3, 2024
Successfully merging this pull request may close these issues.

Fastboot fails if loading from the same snapshot multiple times