Pathfinder state management #713
Closed · Mirko-von-Leipzig started this conversation in Ideas · Replies: 1 comment
Or maybe having some central controller / business logic which drives a bunch of async state machines. This central controller is also responsible for driving the database writes. This means it can launch concurrent state machines, but restrict potentially conflict-inducing state machines to run by themselves: e.g. drive block processing and class hash processing to completion, run any data deletion process to completion only once there are no other tasks remaining, and once that completes, start new tasks again.
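A minimal sketch of that idea, assuming tokio and using placeholder task names (none of this is pathfinder code): non-conflicting state machines run concurrently, and the conflict-inducing deletion only runs once everything else has drained.

```rust
use tokio::task::JoinSet;

async fn process_blocks() { /* drive block processing to completion */ }
async fn process_class_hashes() { /* drive class hash processing to completion */ }
async fn prune_old_data() { /* conflict-inducing: must run with nothing else active */ }

async fn controller_tick() {
    // Phase 1: launch the state machines that can safely run concurrently.
    let mut tasks = JoinSet::new();
    tasks.spawn(process_blocks());
    tasks.spawn(process_class_hashes());

    // Wait until every concurrent task has finished.
    while tasks.join_next().await.is_some() {}

    // Phase 2: with no other tasks remaining, data deletion is safe to run alone.
    prune_old_data().await;

    // Phase 3: the controller would now start new tasks again.
}
```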
State & Sync management in Pathfinder
Thoughts on a general architecture for state and sync management in Pathfinder.
Motivation
Our current "architecture" consists of two separate processes, each driving the acquisition of data from a different source - L1 and L2. This has worked well in the sense that its easy to understand, and since each process is purely sequential, there is little risk of state corruption. It has some downsides though; since everything is sequential (and one-block-at-a-time) it becomes impossible to do anything creative ito sync or live repair etc. It's quite inflexible - any experimentation requires both extensive understanding of the pipeline, as well as making sure the implicit invariants are upheld. For example, it would be a pita (or impossible) to skip some blocks at the start of a sync, with the intent of backfilling the gaps later.
Problematic features
Here are some cool things we probably want to tackle, which will be difficult to implement with the current design.
Gossipsub
New blocks now arrive via p2p. It is also entirely possible (and likely) that our node will miss a new block, and that there will be a gap that requires repairing. It would of course be nice if we could verify (insofar as possible) and store this data even though we don't have the preceding blocks. This would be possible now that we have the concept of canonical blocks, however it requires extra thinking and some changes again.
Variants
This is one of the bigger motivations I have for more flexibility. Currently our node is an archive node and cannot be configured in any other way. This has made sense, and will continue to be a requirement for p2p.
However, most direct users of pathfinder don't really want or need this. What they care about is mostly the latest data, and in some cases only a very tiny fraction of the state. In other ecosystems this leads to completely separate node implementations: one for an archive-node, one for a full-node, one for a light-node, one for a beacon-node, etc. This makes sense for networks in which nodes both hold state and participate in consensus - it would be dangerous to have state management cause corruption and then impact consensus. And, purely for historical reasons, things aren't very configurable or flexible, e.g. Ethereum.
It's probably obvious that we can make node variants which are either an archive node (the current model), or a full-node which stores only the latest `N` blocks. However, even a full-node will probably be too heavy for, say, a mobile device.

Mobile wallet example
Imagine the use case of a mobile wallet app.
(This architecture is designed around how pathfinder works today, and it makes sense in that paradigm.)
It would be great if this wallet simply ran its own pathfinder instance. This is of course impossible with an archive-node. Imagine however a node that only stores the data it actually cares about. This app cares about L1 security, but only about very specific contracts -- namely the user's account contract and some other wallet-specific API contracts. Instead of downloading and applying full state diffs, it requests merkle proofs for the state it is interested in. It may or may not be interested in class definitions (and possibly only a specific subset of classes). Similarly for transactions - only those interacting with specific contracts. Block headers are maybe only verified but not stored, i.e. we only store the hash plus the fact that it was verified by L1.
I don't actually know if this would solve their mobile wallet problem; the point is more that if we are flexible, we can probably build something that suits the needs of many apps, without creating security issues or risking state corruption.
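Purely as an illustration of what such flexibility could look like (these types are hypothetical, not an existing pathfinder API), a node variant might be described by a storage policy:

```rust
/// Placeholder for pathfinder's real contract address type.
struct ContractAddress([u8; 32]);

/// Hypothetical description of which data a node variant persists.
struct StoragePolicy {
    /// Contracts the node tracks state for; `None` means all contracts,
    /// as an archive or full node would use.
    tracked_contracts: Option<Vec<ContractAddress>>,
    /// Whether full class definitions are stored, or only their hashes.
    store_class_definitions: bool,
    /// How many recent blocks to keep full data for; `None` keeps everything.
    block_retention: Option<u64>,
}

/// The mobile wallet described above: track one account contract,
/// skip class definitions, keep only the latest block.
fn mobile_wallet_policy(account: ContractAddress) -> StoragePolicy {
    StoragePolicy {
        tracked_contracts: Some(vec![account]),
        store_class_definitions: false,
        block_retention: Some(1),
    }
}
```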
Sync
There are many ways (with p2p) in which we can achieve sync. The current approach could be sped up by downloading many blocks at once and, instead of ingesting them one at a time, handling them in parallel, with a separate system performing the canonical checks. We could also have block headers be verified independently from state diffs (i.e. concurrently).
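As a rough sketch of the concurrent-download part (assuming the `futures` crate and a placeholder `fetch_block` function), the canonical checks would then run as a separate step over the collected blocks:

```rust
use futures::stream::{self, StreamExt};

struct Block {
    number: u64,
}

async fn fetch_block(number: u64) -> Block {
    // Placeholder: would fetch the header and body via p2p or the gateway.
    Block { number }
}

/// Download a range of blocks with up to 16 requests in flight,
/// preserving block order in the output.
async fn fetch_range_concurrently(start: u64, end: u64) -> Vec<Block> {
    stream::iter(start..=end)
        .map(fetch_block)
        .buffered(16)
        .collect()
        .await
}
```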
We could have a sync which immediately jumps to block `N`, gets the state at block `N` via p2p and verifies it against L1 at that block. We request only the chain of block hashes from `genesis -> N` (as a work-around for L1 not having the block hash), and immediately start serving the latest data on the chain. We could then have a separate process to backfill `genesis:N-1` if we are an archive node; if we are not an archive node, this backfilling isn't required.

We could serve the raw block and state diff data and simply apply it locally (we've discussed this often).

There are many such ideas and combinations one could dream up and try out. It would be great if these could simply be added and experimented with, without requiring an architecture adjustment each time.
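To make the jump-to-`N` idea concrete, here is a rough sketch of its shape only; all helper functions are hypothetical stand-ins, and the real verification and backfill logic is elided:

```rust
struct ChainState;

async fn fetch_state_at(_n: u64) -> ChainState { ChainState }
async fn verify_against_l1(_state: &ChainState, _n: u64) -> bool { true }
async fn fetch_block_hash_chain(_from: u64, _to: u64) -> Vec<[u8; 32]> { Vec::new() }
async fn backfill(_from: u64, _to: u64) { /* fill genesis..N-1, archive nodes only */ }

async fn checkpoint_sync(n: u64) {
    // 1. Fetch the state at block N via p2p and check it against L1.
    let state = fetch_state_at(n).await;
    assert!(verify_against_l1(&state, n).await, "state at block N does not match L1");

    // 2. Fetch only the chain of block hashes genesis -> N, as a work-around
    //    for L1 not exposing the block hash.
    let _hashes = fetch_block_hash_chain(0, n).await;

    // 3. The node can already serve the latest data; history is backfilled by
    //    a separate background task.
    tokio::spawn(backfill(0, n.saturating_sub(1)));
}
```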
Indexers
A lot of the major projects running pathfinder essentially have custom indexers written around it. It would make sense to support this use case more directly. Some ways of achieving this:
What do we want? (from this architecture)
As a recap of (maybe not so obvious) things that are going on:

- sync may begin at some block `N`, and ends once all historic data is backfilled.

Proposed architecture
Essentially an event-driven architecture with many very specific, single-purpose subsystems.
Events are distributed using a single broadcast event bus, i.e. every subsystem receives every event and can react accordingly. The alternative is having specific streams for certain events, but I don't think this buys us anything, and it makes connecting subsystems together difficult to manage and configure. If we find that we get specific clusters of subsystems (i.e. a bigger system), then we can of course isolate those and treat them as a subsystem in aggregate again.
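A minimal sketch of such a bus, assuming tokio's broadcast channel; the event names follow the examples below but are illustrative only, not an actual pathfinder API:

```rust
use std::time::Duration;
use tokio::sync::broadcast;

#[derive(Clone, Debug)]
enum Event {
    NewLatestHeader(u64), // stand-in for Box<Header>
    NewDiff(u64),         // stand-in for Box<StateDiff>
}

#[tokio::main]
async fn main() {
    // Every subsystem subscribes to the same sender, so it sees every event.
    let (bus, _) = broadcast::channel::<Event>(1024);

    // Example subsystem: reacts to new headers by fetching the corresponding
    // state diff and publishing a follow-up event.
    let mut rx = bus.subscribe();
    let tx = bus.clone();
    tokio::spawn(async move {
        while let Ok(event) = rx.recv().await {
            if let Event::NewLatestHeader(n) = event {
                // ... fetch and verify the state diff for block `n` here ...
                let _ = tx.send(Event::NewDiff(n));
            }
        }
    });

    // Some source (e.g. gossipsub) publishes a new header onto the bus.
    let _ = bus.send(Event::NewLatestHeader(100));

    // Give the example subsystem a moment to react before exiting.
    tokio::time::sleep(Duration::from_millis(50)).await;
}
```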
Let us consider receiving a new block header event, and what its impact could be, with some example subsystems:

- A new header for block `N` is received via gossipsub.
- A `NewLatestHeader(Box<Header>)` event is submitted to the global bus.
- A class subsystem downloads any missing class definitions and emits `NewClass<Box<Class>>` events.
- A transaction subsystem downloads the block's transactions and receipts and emits `NewTx<(Box<Transaction>, Box<Receipt>)>` events.
- A subsystem checks whether block `N` connects to the current tip; if not, it fetches header `tip+1` and emits `NewHeader(Box<Header[tip+1]>)` (this would be repeated until it connects).
- A subsystem fetches the state diff for block `N` and emits a `NewDiff<Box<StateDiff>>` event.
- The state diff is applied to block `N` if root `N-1` exists and matches.

I imagine all subsystems will operate more or less in this fashion:
For example, the canonical block tracker might cache the current canonical block tip on startup, as well as the latest L1 root (and block hash in future). It checks `NewHeader` events to see if it can extend the canonical tip; if yes, it also checks any data requirements for that block. It also tracks new L1 updates and reorgs to determine that the existing canonical chain is intact.
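A simplified sketch of such a tracker's reaction to a header event (types are toy placeholders, and the data-requirement and L1/reorg handling is elided):

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
struct BlockHash([u8; 32]);

struct Header {
    number: u64,
    hash: BlockHash,
    parent_hash: BlockHash,
}

struct CanonicalTracker {
    tip_number: u64,
    tip_hash: BlockHash,
}

impl CanonicalTracker {
    /// React to a `NewHeader` event: extend the canonical tip if the header connects.
    fn on_new_header(&mut self, header: &Header) -> bool {
        let connects =
            header.number == self.tip_number + 1 && header.parent_hash == self.tip_hash;
        if connects {
            // The real subsystem would also check the block's data requirements
            // (transactions, state diff, classes) before advancing the tip.
            self.tip_number = header.number;
            self.tip_hash = header.hash;
        }
        connects
    }
}
```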
Problems / difficulties

Maintaining data integrity remains a non-trivial task. The issue is still that database writes and reads are difficult to co-ordinate. If some subsystem wants to drive some data deletion, it might break the invariants that other subsystems want to uphold, e.g. some block is deleted just after another subsystem checked for its existence. Essentially, we are missing Rust's guarantees within the database itself.
Subsystems themselves might still be difficult to maintain even though they have a single purpose. Our current design of never (or rarely) deleting data and going block-by-block means that each part has some invariants it can rely on. If we remove such guarantees, it can make reasoning about things more difficult.
Database writing must still take place in a single thread somehow. So this implies that there should be a `DatabaseWriteEventXXX` for everything as well? Maybe an alternative is that only `DatabaseWriteEventXXX` events get put on the bus, and that subsystems should only react to things that have already been written? Or some combination thereof.
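One possible shape of the single-writer idea, sketched with illustrative names (there is no such API in pathfinder today): subsystems queue write requests to one dedicated task, and only the corresponding "written" events go back onto the bus, so everyone else reacts only to persisted data.

```rust
use tokio::sync::{broadcast, mpsc};

#[derive(Clone, Debug)]
enum DatabaseWritten {
    Header(u64),
    StateDiff(u64),
}

enum WriteRequest {
    Header(u64),
    StateDiff(u64),
}

/// Spawn the single writer task and return the handle subsystems use to queue writes.
fn spawn_writer(bus: broadcast::Sender<DatabaseWritten>) -> mpsc::Sender<WriteRequest> {
    let (tx, mut rx) = mpsc::channel::<WriteRequest>(256);
    tokio::spawn(async move {
        // All writes happen in this one task, so ordering and invariants are
        // enforced in a single place.
        while let Some(request) = rx.recv().await {
            match request {
                WriteRequest::Header(n) => {
                    // ... perform the actual database write here ...
                    let _ = bus.send(DatabaseWritten::Header(n));
                }
                WriteRequest::StateDiff(n) => {
                    // ... perform the actual database write here ...
                    let _ = bus.send(DatabaseWritten::StateDiff(n));
                }
            }
        }
    });
    tx
}
```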
Alternatives

Instead of many independent systems, just create the functionality for all these different subsystems and drive them from a central logical core. This core then defines all the interactions. So if you want a new node variant, you need to write new business logic, but can hopefully re-use many existing functions.