rskip | title | description | status | purpose | author | layer | complexity | created |
---|---|---|---|---|---|---|---|---|
194 | Bloom filter compression | | Draft | Sca | SDL (@sergiodemianlerner) | Core | 2 | 2020-11 |
RSKIP | 194 |
---|---|
Title | Bloom filter compression |
Created | NOV-2020 |
Author | SDL |
Purpose | Sca |
Layer | Core |
Complexity | 1 |
Status | Draft |
discussions-to | https://research.rsk.dev/t/rskip-194-bloom-filter-compression/73 |
In this RSKIP we propose that the bloom filter data is replaced by the hash of the same data when computing the block hash. By doing so, systems that verify only cumulative proof of work can download the block headers without the bloom filter data.
The RSK header has a field called logsBloom that contains 256 bytes representing a bloom filter updated with the contracts and accounts used in the block. This field is seldom used because, as more and more transactions are processed in a block, the content becomes more prone to false positives. There are more efficient techniques to obtain the events generated by a contract, such as linked events, proposed when the TXINDEX opcode was introduced in RSK and re-introduced in RSKIP192.
The size of the bloom filter is problematic in two contexts:
- Light clients must download the data of the bloom filter even if they do not require it.
- The PowHSM must receive and skip all information regarding the bloom filter for every header processed, which includes blocks in the main chain and block headers of referenced uncles.
This RSKIP solves both problems. PowHSMs can receive a compressed encoding of the block header. Light clients can avoid downloading the bloom filter (using a new network command), or download it and discard it later.
Another motivation for this RSKIP is that the RSK block lacks a simple versioning system. Every new change requires using the number of fields to detect which type of block it is. As the number of block header modifications performed in hard forks increases, the changes can be mixed in different ways to create an unnecessarily high number of block types. This increases the complexity of the validation code and makes it very hard for unit tests to cover all cases.
This RSKIP adds to the block header a blockVersion field that allows clients to make decisions based on the blockVersion instead of trying to infer it.
To detect this new field, the size of the bloom filter field is used, but hopefully after this change no other attributes of the fields will be used to detect the block version.
The RSK block header is a list of the following elements:
parentHash, unclesHash, coinbase, stateRoot, txTrieRoot, receiptTrieRoot, logsBloom, difficulty, number, gasLimit, gasUsed, timestamp, extraData, paidFees, minimumGasPrice, uncleCount, ummRoot [, mergeMiningFields ]
The mergeMiningFields are the following:
bitcoinMergedMiningHeader,
bitcoinMergedMiningMerkleProof,
bitcoinMergedMiningCoinbaseTransaction
All blocks prior to the hard-fork block number are considered version 0. Afterward, new blocks must be version 1. For version 1 block headers, there are two separate serialization methods: one for the transfer of the block header information and the other for the computation of the block hash. When serializing a version 1 block header to obtain the block hash, the field logsBloom is replaced by an RLP list of:
- The blockVersion field (integer) (1 byte)
- The Keccak hash of the bloom filter data (32 bytes).
When serializing the block header for transmission or storage, the logsBloom field is:
- The blockVersion field (integer) (1 byte)
- The bloom filter data (256 bytes).
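The two serializations of the logsBloom field can be sketched as follows. This is a minimal illustration under stated assumptions, not the node implementation: the RLP encoder is a hand-written subset, the helper names (`rlp_encode`, `logs_bloom_for_hash`, `logs_bloom_for_transfer`) are hypothetical, and `hashlib.sha3_256` is only a stand-in for Keccak-256, which the Python standard library does not provide (the two use different padding, so the digests here are not consensus-compatible).

```python
import hashlib

BLOOM_SIZE = 256    # bytes of bloom filter data carried in logsBloom
BLOCK_VERSION = 1   # version required after the hard fork


def _length_prefix(length: int, offset: int) -> bytes:
    """RLP length prefix; offset is 0x80 for strings and 0xC0 for lists."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a byte string, or a (nested) list of byte strings, as RLP."""
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        if len(data) == 1 and data[0] < 0x80:
            return data  # a single byte below 0x80 is its own encoding
        return _length_prefix(len(data), 0x80) + data
    payload = b"".join(rlp_encode(element) for element in item)
    return _length_prefix(len(payload), 0xC0) + payload


def stand_in_hash(data: bytes) -> bytes:
    """Placeholder 32-byte digest; a real node would use Keccak-256."""
    return hashlib.sha3_256(data).digest()


def logs_bloom_for_hash(bloom: bytes, hash_fn=stand_in_hash) -> list:
    """logsBloom element used when computing the block hash."""
    if len(bloom) != BLOOM_SIZE:
        raise ValueError("bloom filter data must be 256 bytes")
    return [bytes([BLOCK_VERSION]), hash_fn(bloom)]


def logs_bloom_for_transfer(bloom: bytes) -> list:
    """logsBloom element used for transmission or storage."""
    if len(bloom) != BLOOM_SIZE:
        raise ValueError("bloom filter data must be 256 bytes")
    return [bytes([BLOCK_VERSION]), bloom]


bloom = bytes(BLOOM_SIZE)                            # empty filter, for illustration
legacy = rlp_encode(bloom)                           # version 0 encoding of the field
compressed = rlp_encode(logs_bloom_for_hash(bloom))  # hash serialization
full = rlp_encode(logs_bloom_for_transfer(bloom))    # transfer serialization
print(len(legacy), len(compressed), len(full))       # prints "259 35 263"
```

Under these assumptions the encoded field shrinks from 259 bytes to 35 bytes in the hash serialization, while the transfer serialization grows by 4 bytes because of the list prefix and the version byte.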
Consensus must validate that the blockVersion is exactly 1 for blocks after the hard fork, until this number is incremented by another consensus change.
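The version rule can be illustrated with a short check. This is only a sketch: the function names and the fork height `FORK_BLOCK_NUMBER` are hypothetical, the fork block itself is assumed to count as version 1, and the version is detected from the shape of the already-decoded logsBloom element (a plain 256-byte string for version 0, a two-element list for later versions), which simplifies the size-based detection described above.

```python
BLOOM_SIZE = 256
EXPECTED_VERSION_AFTER_FORK = 1
FORK_BLOCK_NUMBER = 1000  # hypothetical activation height, for illustration only


def detect_block_version(logs_bloom_field) -> int:
    """Infer the block version from a decoded logsBloom element."""
    if isinstance(logs_bloom_field, (bytes, bytearray)):
        # Legacy headers carry the raw 256-byte bloom filter.
        if len(logs_bloom_field) != BLOOM_SIZE:
            raise ValueError("legacy logsBloom must be exactly 256 bytes")
        return 0
    if len(logs_bloom_field) != 2:
        raise ValueError("versioned logsBloom must be [blockVersion, data]")
    version_bytes, _data = logs_bloom_field
    return int.from_bytes(version_bytes, "big")


def is_version_valid(block_number: int, logs_bloom_field) -> bool:
    """Version 0 before the fork, exactly version 1 from the fork onward."""
    version = detect_block_version(logs_bloom_field)
    if block_number < FORK_BLOCK_NUMBER:
        return version == 0
    return version == EXPECTED_VERSION_AFTER_FORK


print(is_version_valid(999, bytes(BLOOM_SIZE)))                # True
print(is_version_valid(1000, [b"\x01", bytes(BLOOM_SIZE)]))    # True
print(is_version_valid(1000, bytes(BLOOM_SIZE)))               # False
```

A later consensus change that increments the version would only need to update the expected value for blocks after its own activation height.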
Generally, the blockVersion is stored as the first field of an object. While this is entirely possible for RSK block headers, PowHSMs cannot parse any other block header format. However, a PowHSM can verify without problems a block header with shorter data stored in the logsBloom field. To avoid forcing a costly firmware upgrade for all PowHSMs, the blockVersion is stored inside the logsBloom field, which is not checked by the hardware.
While computing the hash without the full bloom filter data may incentivize nodes to remove this data permanently, the overhead of the bloom filter data is very low for blocks with many transactions. Therefore this incentive is weak.
An improved version of this RSKIP would also hash the fields coinbase, stateRoot, txTrieRoot, receiptTrieRoot, gasLimit, gasUsed, extraData, paidFees, minimumGasPrice, uncleCount and ummRoot into the same inner hash, saving another 145 bytes. This change would convert the RSK header into a mini-header, resembling the old mini-blockchain proposal (updated here). However, it requires upgrading the PowHSM firmware, so it should be considered if the community agrees on a PowHSM firmware upgrade. This technique can later be combined with a coinprune-like method to reduce the blockchain size even further.
This change is a hard-fork and therefore all full nodes must be updated.
Block explorers do not need to be updated, as the RPC interface will return the serialization block header that corresponds with the expanded version of the block header.
SPV light clients won't be able to take advantage of all the benefits of this change until they can request the compressed format of block headers from peers. PowHSMs do not need to be upgraded to support this change.
TBD
TBD
No new security issue was identified related to this RSKIP.
Copyright and related rights waived via CC0.