Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Archival bucketlist #4403

Merged
merged 5 commits into from
Nov 23, 2024
Merged

Conversation

SirTyson
Copy link
Contributor

@SirTyson SirTyson commented Jul 30, 2024

Description

Resolves #4394

Depends on XDR changes in stellar/stellar-xdr#200.

This change refactors the BucketList so that we can have multiple Bucket Lists simultaneously. This is necessary for State Archival, where validators will maintain a live BucketList, hot archive BucketList, and cold archive BucketList. This change does not yet support writing the new archival BucketLists to history,

Each BucketList has small differences wrt to the entry types it stores, merge logic, and how lookups are done, but the overall BucketList level logic is the same. Because of this, it seemed easiest to template the Bucket related classes and specialize a few functions. The difference are as follows (I'll be updating the Archival CAP with this info soon)

Entry Types:
Live BucketList: BucketEntry
Hot Archive BucketList: HotArchiveBucketEntry.

This change was necessary due to how LedgerKeys are stored. In the Live BucketList, only DEADENTRY store plain LedgerKey. DEADENTRY is equivalent to null, so these LedgerKeys are dropped at the bottom level bucket.

Hot Archives require two types that store LedgerKeys: HA_LIVE and HA_DELETED. HA_LIVE is equivalent to the "null" type in the Live BucketList, and is dropped in the bottom bucket. However, HA_DELETED needs to be persisted.

Merging:
In the hot archive, LedgerEntry are never updated. They may be overwritten by another HotArchiveBucketEntry type, but the LedgerEntry contents itself never change. This means that the INITENTRY, LIVEENTRY, and DEADENTRY abstraction doesn't really make sense. This makes the merge logic for Hot Archive buckets very simple, the newer entry always overwrites the older entry.

This PR adds the concept of a "Tombstone" entry. A tombstone entry essentially represents null in the given BucketList and can be dropped once it reaches the bottom bucket. On the live BucketList, DEADENTRY is the tombstone type. On the Hot Archive BucketList, HA_LIVE is the tombstone.

BucketListDB lookup:
The live BucketList only ever returns LedgerEntry types when queried. Any DEADENTRY LedgerKey is simply omitted from the return, as this is the "null" tombstone type.

Hot archive lookups must return both LedgerKey and LedgerEntry types. HA_LIVE entries are of type LedgerKey and can be omitted from the return since they are the tombstone type. However, HA_ARCHIVED entries are not tombstone entries and must be returned when found. Do to this, Hot Archive lookups return HotArchiveBucketEntry instead of LedgerEntry when queried.

Future updates will add more functionality to Hot Archives, further distinguishing it from the live BucketList, and add a third BucketList type called the cold archive. This cold archive will add a third BucketEntry type and have different merge logic as well, so I think the templated Bucket classes, while verbose, are warranted.

This is currently a draft until the XDR changes are merged.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@SirTyson SirTyson marked this pull request as ready for review August 13, 2024 04:46
Cargo.toml Outdated Show resolved Hide resolved
src/main/main.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketManagerImpl.cpp Show resolved Hide resolved
src/bucket/BucketManagerImpl.cpp Show resolved Hide resolved
src/bucket/BucketSnapshot.cpp Show resolved Hide resolved
src/bucket/BucketSnapshot.h Outdated Show resolved Hide resolved
src/bucket/LedgerCmp.h Outdated Show resolved Hide resolved
src/util/ProtocolVersion.h Outdated Show resolved Hide resolved
@marta-lokhova
Copy link
Contributor

Note: changes in this PR impact BucketListDB quite a bit, I think we need to resolve #4433 to make sure we have good coverage. Wdyt?

@SirTyson SirTyson force-pushed the archival-bucketlist branch 6 times, most recently from 6ca5887 to 344647f Compare September 19, 2024 17:45
Copy link
Contributor

@dmkozh dmkozh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on the surface level. The code looks fine for now. It would be nicer if this PR was smaller, but I'm not sure if there is a way to stage this change. Rebasing should at least help getting rid of meta diffs.

src/bucket/Bucket.h Outdated Show resolved Hide resolved
Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for slow review. Mostly all makes sense and looks .. well, a little gnarly, but good! Especially in light of recent chat.

Requested changes fall into 3 groups:

  1. A handful of "definite" changes (like I think one or two little typo-sized bugs)
  2. A handful of "maybe you could refactor/dedupe this new duplication" cases
  3. An even-more-general / open-ended "is this really the best way to do the type parameterization" questions.

I'd like to at least see cases 1 fixed and 2 attempted (or explained why not to, if you have already tried and the result is worse). Case 3 is more a latent request for discussion / exploration of the alternative parameterization patterns, like maybe it all relates to the impl pattern or something? I'm curious but won't block on the issue either way. I can also try some sketching alternative parameterization patterns on my own.

I guess I would like, if nothing else, for all the function bodies that contain a branch on "if-constexpr type-A-or-B" to also contain the "static-assert-type-is-either-A-or-B" call (and for it to be factored out into a named helper). It seems to be true in many cases, but I'd like it to at least be true in all cases where the if-constexpr thing is employed. Just like so we can get a static list of places to fix when A-or-B becomes A-or-B-or-C in the future.

src/bucket/Bucket.cpp Outdated Show resolved Hide resolved
src/bucket/Bucket.cpp Outdated Show resolved Hide resolved
src/bucket/BucketInputIterator.h Outdated Show resolved Hide resolved
src/bucket/BucketInputIterator.cpp Outdated Show resolved Hide resolved
src/catchup/IndexBucketsWork.cpp Show resolved Hide resolved
src/bucket/BucketManagerImpl.cpp Outdated Show resolved Hide resolved
src/bucket/BucketManagerImpl.cpp Outdated Show resolved Hide resolved
src/bucket/BucketListSnapshot.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/test/BucketListTests.cpp Show resolved Hide resolved
@SirTyson
Copy link
Contributor Author

SirTyson commented Nov 5, 2024

I should have addressed all the comments. I was able to refactor out almost all the constexpr branching, but had to use an ugly templated static interface for some BucketManager functions due to the impl construct. While it's not particularly pretty, this is probably the least worst option available given template constraints on virtual functions.

Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkpointing here, as I'm still reviewing the change. Overall, really cool to see us getting closer to state archival! Thanks for putting it together. I left a few comments, mostly echoing @graydon's concerns about potential maintenance burden, and code readability. I think it's important we agree on the interface before we land this change.

src/bucket/BucketOutputIterator.cpp Show resolved Hide resolved
src/bucket/BucketOutputIterator.cpp Show resolved Hide resolved
src/main/Config.cpp Show resolved Hide resolved
src/bucket/BucketList.h Outdated Show resolved Hide resolved
src/bucket/BucketList.h Show resolved Hide resolved
src/bucket/BucketManagerImpl.cpp Outdated Show resolved Hide resolved
src/bucket/FutureBucket.cpp Outdated Show resolved Hide resolved
src/bucket/BucketManagerImpl.cpp Outdated Show resolved Hide resolved
src/bucket/BucketManagerImpl.cpp Outdated Show resolved Hide resolved
@SirTyson SirTyson force-pushed the archival-bucketlist branch 2 times, most recently from 36fc351 to 3f365af Compare November 11, 2024 09:19
@SirTyson SirTyson mentioned this pull request Nov 14, 2024
6 tasks
@SirTyson SirTyson mentioned this pull request Nov 20, 2024
6 tasks
@marta-lokhova marta-lokhova added this pull request to the merge queue Nov 23, 2024
Merged via the queue into stellar:master with commit e2ec789 Nov 23, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor BucketList to allow multiple BucketList
4 participants