Hybrid archival mode #67
iFrostizz
started this conversation in
Brainstorming
First notes
This ACP is a work-in-progress (WIP) and may be subject to change. Please consider it as non-final.
Abstract
This ACP proposes a new hybrid mode aimed at alleviating the resource requirements for running an archival node while maintaining a reasonable level of data accuracy and consistency.
Motivation
1. The cost of running an archival node is high, which makes it infeasible on commodity computers, primarily because of the demanding storage requirements.
As of early 2024, the database for an archival node is approximately 5 TB, which is expensive to provision, even when configured as a RAID of smaller drives.
2. Bootstrapping an archival node is painfully time-intensive.
The initial fetch of the full state takes about one week, while its execution requires an estimated 3 weeks,
and the bootstrapping time keeps growing as blocks get heavier.
It is crucial to have people running archival nodes since they are responsible for maintaining the complete state of the blockchain.
Without them, only "full nodes" remain, which keep track of and verify only recent blocks, including accounts, logs, nonces, etc.
This limitation impacts applications built on top of this archival state, which often resort to using third-party services due to points 1 and 2.
Specification
Suggested changes
1. Add a new config flag, available when the node is not run in archival mode, called something like "hybrid-mode" or "light-archival-node". This flag can only be true when pruning-enabled is true. The flag may be enabled at any time and doesn't require the node to be bootstrapped from scratch again.
Bootstrapping the node with pruning enabled but without the hybrid mode, and then activating the hybrid mode, would simply make the hybrid mode start from scratch, meaning that no historical data would initially be stored on disk.
2. Only store a part of the historical network state. This is where some concerns are raised, and it may be challenging to implement correctly.
A good starting point would be to explore what the Portal network would solve. More on the portal network.
This hybrid mode would be very similar to the Portal network, the difference being that the node would be less dependent on other nodes when it comes to historical data. This would further optimize data retrieval when the data is cached on disk and give greater guarantees about the persistence of the chain's data.
3. Attach a scarcity score to batches of data. The score would be agreed on by the whole network through the introduction of a new consensus message, to prevent sybil nodes from lying in order to prioritize the retrieval of some data.
4. Node maintainers can configure the maximum size of the historical data they are willing to store, as well as the scarcity threshold that they want their hybrid node to operate on.
If the configured limit for historical data is reached, the batches of data with the lowest scarcity are deleted.
5. If data required by the node is missing, instead of returning the "missing trie node" error, try to fetch the data while still keeping point 3 in mind. Whether to return the error or fetch the data could also be configurable, though such an option may be of little use.
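The suggested changes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the `HybridConfig`, `Batch`, and `Store` types, their field names, and the `FetchOnMiss` switch are all hypothetical, the store is kept in memory, and the scarcity score is taken as given rather than agreed on through consensus. It is also assumed here that batches scoring below the configured threshold are simply not stored.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// HybridConfig is hypothetical; none of these field names come from a real node config.
type HybridConfig struct {
	PruningEnabled     bool    // stands in for the existing pruning-enabled flag
	HybridMode         bool    // the proposed flag (name not final)
	MaxHistoricalBytes uint64  // cap on historical data kept on disk
	ScarcityThreshold  float64 // assumed: batches scoring below this are not stored
	FetchOnMiss        bool    // assumed: fetch missing data instead of erroring
}

// Validate rejects hybrid mode unless pruning is enabled.
func (c HybridConfig) Validate() error {
	if c.HybridMode && !c.PruningEnabled {
		return errors.New("hybrid mode requires pruning-enabled to be true")
	}
	return nil
}

// Batch is a hypothetical unit of historical data with its agreed scarcity score.
type Batch struct {
	ID       string
	Size     uint64
	Scarcity float64
}

// ErrMissingTrieNode stands in for the "missing trie node" error.
var ErrMissingTrieNode = errors.New("missing trie node")

// Store holds batches in memory for illustration only.
type Store struct {
	cfg     HybridConfig
	batches []Batch
	used    uint64
}

// Add keeps a batch that meets the threshold, then evicts the lowest-scarcity
// batches until the size cap is respected. It returns the evicted IDs.
func (s *Store) Add(b Batch) []string {
	if b.Scarcity < s.cfg.ScarcityThreshold {
		return nil
	}
	s.batches = append(s.batches, b)
	s.used += b.Size
	// Ascending order puts the least scarce batch first.
	sort.SliceStable(s.batches, func(i, j int) bool {
		return s.batches[i].Scarcity < s.batches[j].Scarcity
	})
	var evicted []string
	for s.used > s.cfg.MaxHistoricalBytes && len(s.batches) > 0 {
		evicted = append(evicted, s.batches[0].ID)
		s.used -= s.batches[0].Size
		s.batches = s.batches[1:]
	}
	return evicted
}

// Get returns a local batch, or falls back to fetch (a hypothetical peer lookup).
func (s *Store) Get(id string, fetch func(string) (Batch, error)) (Batch, error) {
	for _, b := range s.batches {
		if b.ID == id {
			return b, nil
		}
	}
	if !s.cfg.FetchOnMiss || fetch == nil {
		return Batch{}, ErrMissingTrieNode
	}
	b, err := fetch(id)
	if err != nil {
		return Batch{}, err
	}
	s.Add(b) // the fetched batch is still subject to the threshold and the cap
	return b, nil
}

func main() {
	cfg := HybridConfig{PruningEnabled: true, HybridMode: true, MaxHistoricalBytes: 100, ScarcityThreshold: 0.2}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	s := &Store{cfg: cfg}
	s.Add(Batch{ID: "a", Size: 60, Scarcity: 0.9})
	s.Add(Batch{ID: "b", Size: 30, Scarcity: 0.3})
	fmt.Println("evicted:", s.Add(Batch{ID: "c", Size: 40, Scarcity: 0.5}))
}
```

With these made-up numbers, adding batch c pushes usage to 130 bytes against a cap of 100, so the least scarce batch, b, is evicted. Note that a newly added or freshly fetched batch can itself be evicted right away if it happens to be the least scarce one, which is one of the trade-offs a real design would need to settle.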
Concerns