-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checks if bank snapshot is loadable before fastbooting #343
Checks if bank snapshot is loadable before fastbooting #343
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #343 +/- ##
========================================
Coverage 81.9% 81.9%
========================================
Files 837 837
Lines 226830 226979 +149
========================================
+ Hits 185856 186010 +154
+ Misses 40974 40969 -5 |
core/src/accounts_hash_verifier.rs
Outdated
_ => accounts_package.slot, | ||
}; | ||
snapshot_utils::write_fastboot_file(&snapshot_info.bank_snapshot_dir, fastboot_slot) | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unwrap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, missed this one. Let me fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Fixed in 3c2e77a.
ledger/src/bank_forks_utils.rs
Outdated
let Some(bank_snapshot) = latest_bank_snapshot else { | ||
// If we do *not* have a local snapshot to fastboot, then it must be because there was | ||
// *not* a loadable local snapshot AND we shall *not* startup from a snapshot archive. | ||
assert_eq!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, a little hairy.
If we get inside this else
of the let-else
, then we are in a condition where (1) we're supposed to fastboot, and (2) there is no local state (i.e. a bank snapshot) to fastboot from. This should only happen if we've specified to never use a snapshot archive. So we assert that here.
If instead we used either when-newest
or always
for when to load from snapshot archives, then we should not be in this else
of the outer if-else
. Instead we should've set will_startup_from_snapshot_archives
to true
, since latest_bank_snapshot
would've been None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jeff here, it seems odd to me. We are basically assigning a bool, then having an assert we're assigning it correctly?
Line 250:
let will_startup_from_snapshot_archives = match process_options.use_snapshot_archives_at_startup
{
UseSnapshotArchivesAtStartup::Always => true,
UseSnapshotArchivesAtStartup::Never => false,
UseSnapshotArchivesAtStartup::WhenNewest => latest_bank_snapshot
.as_ref()
.map(|bank_snapshot| latest_snapshot_archive_slot > bank_snapshot.slot)
.unwrap_or(true),
};
let bank = if will_startup_from_snapshot_archives {
// ...stuff
} else {
// fastboot from local state
let Some(bank_snapshot) = latest_bank_snapshot else {
// If we do *not* have a local snapshot to fastboot, then it must be because there was
// *not* a loadable local snapshot AND we shall *not* startup from a snapshot archive.
assert_eq!(
process_options.use_snapshot_archives_at_startup,
UseSnapshotArchivesAtStartup::Never,
);
return Err(BankForksUtilsError::NoBankSnapshotDirectory {
flag: use_snapshot_archives_at_startup::cli::LONG_ARG.to_string(),
value: UseSnapshotArchivesAtStartup::Never.to_string(),
});
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively yes 😅 The assert is there to ensure we don't refactor something and cause the returned error message to be wrong/misleading.
(I do like your suggestion here #343 (review) too, so likely will go in that direction.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been refactored. Please refer to #343 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense but some of the naming could be more explicit.
I think we might do a small refactor to make some of this more clear. We have this odd pattern where we are getting highest loadable snapshot, doing match on how we start up, then an if based on the result of the previous match.
Can we do something like this?
let fastboot_snapshot = match process_options.use_snapshot_archives_at_startup {
UseSnapshotArchivesAtStartup::Always => None,
UseSnapshotArchivesAtStartup::Never => {
let fastboot_snapshot =
snapshot_utils::get_highest_loadable_bank_snapshot(snapshot_config);
if fastboot_snapshot.is_none() {
return Err(BankForksUtilsError::NoBankSnapshotDirectory {
flag: use_snapshot_archives_at_startup::cli::LONG_ARG.to_string(),
value: UseSnapshotArchivesAtStartup::Never.to_string(),
});
}
fastboot_snapshot
}
UseSnapshotArchivesAtStartup::WhenNewest => {
snapshot_utils::get_highest_loadable_bank_snapshot(snapshot_config)
.filter(|bank_snapshot| bank_snapshot.slot >= latest_snapshot_archive_slot)
}
};
if let Some(fastboot_snapshot) = fastboot_snapshot {
// fast boot code
} else {
// slow boot code
}
The get_highest_loadable_bank_snapshot
call is in multiple places, but we treat it differently because the behavior is different for those cases.
The error handling is more upfront if we cannot find it for Never
case, and don't have this (match, if, match) thing going on.
At least to me something like this is more clear, wdyt?
core/src/accounts_hash_verifier.rs
Outdated
@@ -303,6 +311,20 @@ impl AccountsHashVerifier { | |||
&accounts_hash_for_reserialize, | |||
bank_incremental_snapshot_persistence.as_ref(), | |||
); | |||
|
|||
// now write the fastboot file after reserializing so this bank snapshot is loadable | |||
let fastboot_slot = match accounts_package.package_kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename this? It's more like "fastboot_most_recent_fullsnapshot_slot" or "fastboot_base_slot".
Same with the "fastboot_file". It seems like it should be "fastboot_base_slot_file".
Current naming made me think it contained the slot we were fastbooting from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion; will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the fastboot file, do you want the function renamed, the comment reworded, or the file itself renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming fastboot_slot
has been fixed in 2e351f3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not being more clear. I was hoping to rename the file and whole "concept" of this file.
We're using it for fastboot, sure, but the file itself contains the slot of the full snapshot this snapshot is based on. Seems like that's potentially useful info even without fastboot, so giving it a more clear name everywhere (file name, function names, variable names) would make it more obvious what information is available there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Yeah, that makes sense. The file could be "full_snapshot_archive_slot" instead. I was purposely going the opposite way; being ambiguous about what's inside the file, but indicating that it is used for fastboot. Renaming the file would be clear about the contents of the file, but ambiguous about usage.
I think for the sake of any forward compatibility/breaking changes, I'll go with the "full_snapshot_archive_slot" idea. This is unlikely to change. If we need more/different information later, we'll create another file.
Does that sound ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks @brooksprumo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Renamed in 796f5d1.
ledger/src/bank_forks_utils.rs
Outdated
let Some(bank_snapshot) = latest_bank_snapshot else { | ||
// If we do *not* have a local snapshot to fastboot, then it must be because there was | ||
// *not* a loadable local snapshot AND we shall *not* startup from a snapshot archive. | ||
assert_eq!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jeff here, it seems odd to me. We are basically assigning a bool, then having an assert we're assigning it correctly?
Line 250:
let will_startup_from_snapshot_archives = match process_options.use_snapshot_archives_at_startup
{
UseSnapshotArchivesAtStartup::Always => true,
UseSnapshotArchivesAtStartup::Never => false,
UseSnapshotArchivesAtStartup::WhenNewest => latest_bank_snapshot
.as_ref()
.map(|bank_snapshot| latest_snapshot_archive_slot > bank_snapshot.slot)
.unwrap_or(true),
};
let bank = if will_startup_from_snapshot_archives {
// ...stuff
} else {
// fastboot from local state
let Some(bank_snapshot) = latest_bank_snapshot else {
// If we do *not* have a local snapshot to fastboot, then it must be because there was
// *not* a loadable local snapshot AND we shall *not* startup from a snapshot archive.
assert_eq!(
process_options.use_snapshot_archives_at_startup,
UseSnapshotArchivesAtStartup::Never,
);
return Err(BankForksUtilsError::NoBankSnapshotDirectory {
flag: use_snapshot_archives_at_startup::cli::LONG_ARG.to_string(),
value: UseSnapshotArchivesAtStartup::Never.to_string(),
});
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think the refactor made it a lot clearer, thanks! A couple more comments/questions
core/src/accounts_hash_verifier.rs
Outdated
@@ -303,6 +311,20 @@ impl AccountsHashVerifier { | |||
&accounts_hash_for_reserialize, | |||
bank_incremental_snapshot_persistence.as_ref(), | |||
); | |||
|
|||
// now write the fastboot file after reserializing so this bank snapshot is loadable | |||
let fastboot_slot = match accounts_package.package_kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not being more clear. I was hoping to rename the file and whole "concept" of this file.
We're using it for fastboot, sure, but the file itself contains the slot of the full snapshot this snapshot is based on. Seems like that's potentially useful info even without fastboot, so giving it a more clear name everywhere (file name, function names, variable names) would make it more obvious what information is available there.
/// To be loadable, the bank snapshot must be a BankSnapshotKind::Post. | ||
/// And if we're generating snapshots (e.g. running a normal validator), then | ||
/// the full snapshot file's slot must match the highest full snapshot archive's. | ||
pub fn get_highest_loadable_bank_snapshot( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a separate discussion with @brooksprumo about this function.
There is an edge-case that is not handled as efficiently as we'd like, but we decided to leave it as a follow-up since this PR covers the majority of cases and already has a lot going on.
The edge case is when we have a higher full snapshot archive than our highest snapshot. This can happen if ledger-tool is used to create a full snapshot archive - this does not create a snapshot.
This function could likely be improved so that we only check the existence of the full snapshot archive matching the full_snapshot_file_slot
, even if it not the latest full snapshot archive. However, it requires changes elsewhere in ABS (others?) to gate cleaning properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find! I'll handle this in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with changes as they are, as mentioned in my comment, it covers most of the cases and we are leaving an edge-case as follow-up work.
Problem
Please refer to solana-labs#35367
Summary of Changes
Before loading a bank snapshot for fastboot, ensure it can be used! This means that if we do fastboot the bank snapshot, it has the correct other stuff on disk (aka the right full snapshot archive) so that subsequent snapshots will succeed, and we won't hit the error described in the problem.
Fixes solana-labs#35367