SIMD-0064: Transaction Receipts #64

Merged
merged 65 commits into from
Oct 20, 2023


anoushk1234
Member

Goal

Introduce the Receipt and Receipt Tree into the core protocol to allow validating the status of a confirmed transaction without trusting the RPC.

Notes

We've drafted this SIMD in collaboration with @ripatel-fd and have already discussed it with the Jump Firedancer team and @aeyakovenko. We would appreciate any feedback given to help move this SIMD forward.

@harsh4786
Contributor

cc @samkim-crypto we would appreciate your review on this :)

@ripatel-fd
Contributor

As requested, here are some thoughts on the implementation change:

The detailed design part of this document does not fully specify the commitment scheme introduced.
Ideally, Labs and Firedancer would be able to implement functionally equivalent code for this SIMD just by looking at the doc itself. It doesn't quite seem to be there yet. (This is probably partially due to not having an agreement on the design yet.)

In its final form, the PR should minimally include:

  • Pseudocode for root-only construction function (Validator nodes will run this)
  • Pseudocode for inclusion proof generation (RPC nodes will run this)
  • Where and how often this code is expected to run
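To illustrate what the two requested routines could look like, here is a minimal Python sketch. It is not the scheme from the SIMD: the one-byte `LEAF`/`BRANCH` prefixes, the rule that a lone node is promoted unchanged, and all function names are assumptions made for this example, and empty-tree handling is left out.

```python
import hashlib

LEAF, BRANCH = b"\x00", b"\x01"  # assumed domain-separation prefixes


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    nxt = []
    for i in range(0, len(level), 2):
        if i + 1 < len(level):
            nxt.append(_h(BRANCH + level[i] + level[i + 1]))
        else:
            nxt.append(level[i])  # assumption: a lone node is promoted unchanged
    return nxt


def merkle_root(leaves):
    """Root-only construction (the part a validator would run)."""
    if not leaves:
        raise ValueError("empty-tree handling is not covered by this sketch")
    level = [_h(LEAF + leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def inclusion_proof(leaves, index):
    """Proof generation (the part an RPC node would run).

    Returns a list of (sibling_hash, sibling_is_left) pairs, bottom-up.
    """
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    level = [_h(LEAF + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):  # a promoted lone node has no sibling here
            proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof."""
    node = _h(LEAF + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(BRANCH + pair)
    return node == root
```

A light client would call `verify(receipt, proof, root)` with a root it obtained from a source it trusts; only `merkle_root` needs to run on the validator's critical path.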

I would further be interested in the following design considerations. These are part of a larger opportunity to optimize and standardize uses of hash trees in the protocol.

  • Why not Blake3? (Already part of the protocol and should offer better performance for small blocks)
  • Why not a larger branch width? (Could benefit batched/vectorized implementations)
  • Performance overhead of sorting tree leaves
  • The current scheme claims to defend against second preimage attacks via length extension, even though the underlying construction is vulnerable. Is there a formal proof that it is impossible to generate two non-identical inclusion proofs?
  • You already use an 8-bit signing prefix, with the LSB specifying whether you have a leaf node or a branch node. Why not use the remaining 7 bits to fix the length extension vulnerability? e.g.
    • 0x00: leaf node
    • 0x01: branch node with 1 child
    • 0x02: branch node with 2 children
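A minimal Python sketch of the prefix idea above, assuming the prefix byte is simply prepended to the SHA-256 preimage. The constant and function names are hypothetical, and the real node layout may differ.

```python
import hashlib

LEAF = 0x00      # leaf node
BRANCH_1 = 0x01  # branch node with 1 child
BRANCH_2 = 0x02  # branch node with 2 children


def hash_leaf(data: bytes) -> bytes:
    return hashlib.sha256(bytes([LEAF]) + data).digest()


def hash_branch(children) -> bytes:
    """Hash a branch; the prefix byte commits to the number of children."""
    if len(children) not in (BRANCH_1, BRANCH_2):
        raise ValueError("a branch has exactly 1 or 2 children")
    if any(len(c) != 32 for c in children):
        raise ValueError("children must be 32-byte hashes")
    return hashlib.sha256(bytes([len(children)]) + b"".join(children)).digest()


def tree_root(leaves) -> bytes:
    if not leaves:
        raise ValueError("empty-tree handling is not covered by this sketch")
    level = [hash_leaf(leaf) for leaf in leaves]
    while len(level) > 1:
        # The last group may hold a single node; it gets the 0x01 prefix.
        level = [hash_branch(level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]
```

Because the child count is part of the preimage, a branch over one child `a` cannot hash to the same value as a branch over `a, a`, and a leaf whose data happens to be two concatenated hashes cannot be confused with a branch.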

A further possibility for optimization is the hash function construction: SHA-256 has an internal block size of 64 bytes. By prefixing with a single byte, a branch node will have a 65 byte preimage, breaking message alignment across two blocks. It would be worth using a 64 byte prefix instead.
This way you have two cleanly aligned SHA blocks: the first block identifies the node type, the second block contains node data (up to 64 bytes). The standard SHA padding (which would go in the third block) can be omitted, as the first block already protects against length extension.
Finally, the initial state should be permuted to prevent collision with standard SHA-256.

A fast implementation would cache the SHA state after applying the first block, skipping one block of SHA hashing every branch. As performance is currently mostly bound by SHA-256 and sorting, this should result in a noticeable speedup.
Similar optimizations are applicable to Blake3.
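The caching idea can be sketched with Python's `hashlib`, whose hash objects support `copy()`. The 64-byte prefix value below is hypothetical. This sketch still uses the standard SHA-256 initial state and standard padding, since changing either would need a custom SHA-256 implementation; it only shows reuse of the state after the prefix block.

```python
import hashlib

BLOCK_SIZE = 64  # SHA-256 internal block size in bytes

# Hypothetical 64-byte branch prefix: the node-type byte padded with zeros.
BRANCH_PREFIX = b"\x01".ljust(BLOCK_SIZE, b"\x00")

# Absorb the prefix block once and keep the resulting hasher around.
_BRANCH_STATE = hashlib.sha256(BRANCH_PREFIX)


def hash_branch(left: bytes, right: bytes) -> bytes:
    """Hash a two-child branch, reusing the state cached after the prefix."""
    if len(left) != 32 or len(right) != 32:
        raise ValueError("children must be 32-byte hashes")
    state = _BRANCH_STATE.copy()  # skip re-hashing the prefix block
    state.update(left + right)    # exactly one aligned 64-byte data block
    return state.digest()
```

The result is identical to hashing `BRANCH_PREFIX + left + right` from scratch. How much time the copy actually saves depends on the underlying SHA-256 implementation.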

Instead of mentioning design properties in the security consideration section, I suggest structuring it into pairs of problem/solution. Here's a possible structure (Not complete, just for the sake of example)

  • threat model: forged transaction receipts
  • risk areas:
    • merkle tree malleability resulting in double spends
      • consideration: use merkle commitment scheme where each leaf only has one path and inclusion proof?
    • second pre-image attacks against merkle leaves
      • consideration: is our hash function strong enough?

When copying text, I suggest using quote blocks and including a link to the source.

Before merging, it would also be nice to fix grammar/spelling (e.g. lowercase "rust" => "Rust")

Finally, to reflect on design a bit: I would very much rather improve the status quo than repeat past mistakes (such as issues with the bank and PoH hash trees). However, the Firedancer team is currently fairly busy with work on other parts of the runtime. Once we can dedicate some time to this, I intend to write up a specification and implementation including the above performance improvements and security fixes.

@jacobcreech added the standard (SIMD in the Standard category) and core (Standard SIMD with type Core) labels on Aug 16, 2023
@samkim-crypto
Contributor

Terribly sorry for the delay in looking at this 🙏 .

In my opinion, the proposal makes a lot of sense overall 👍 . @ripatel-fd has some really nice suggestions, so I think some of his points can be incorporated into the proposal.

A popular way to prevent a length extension attack in this type of scenario where the length of the leaves is not fixed (and perhaps you want to leave signatures untouched) is to hash the root hash one more time with the length of the leaves.

          H*
         /   \
        /     \
       Hγ    N(receipts)
      /  \
     /    \
   Hα      Hβ
  / \     / \
 /   \   /   \
L0   L1 L2   L3

where N(receipts) is the total number of leaves. Here, Hγ is the previous root of the tree and H* is an extra layer on top that would act as the final root. The inclusion proof would then be the standard Merkle tree proof including N(receipts). This fixes the number of leaf elements to be interpreted by the verifier of the proof, so we don't have to worry about length any more.
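A minimal Python sketch of this extra layer, assuming the leaf count is appended to the inner root as an 8-byte little-endian integer. The encoding, the plain unprefixed inner tree, and the function names are all assumptions for illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def encode_count(n: int) -> bytes:
    return n.to_bytes(8, "little")  # assumed encoding of N(receipts)


def inner_root(leaves) -> bytes:
    """Plain binary Merkle root (Hγ in the diagram)."""
    if not leaves:
        raise ValueError("empty-tree handling is not covered by this sketch")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [
            sha256(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
    return level[0]


def final_root(leaves) -> bytes:
    """H* = H(Hγ || N): the root that also commits to the leaf count."""
    return sha256(inner_root(leaves) + encode_count(len(leaves)))


def verify(leaf: bytes, path, n: int, root: bytes) -> bool:
    """Check a standard Merkle path, then the extra layer holding N."""
    node = sha256(leaf)
    for sibling, sibling_is_left in path:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return sha256(node + encode_count(n)) == root


# Usage with the four-leaf tree from the diagram: prove L2.
leaves = [b"L0", b"L1", b"L2", b"L3"]
hashes = [sha256(leaf) for leaf in leaves]
h_alpha = sha256(hashes[0] + hashes[1])
path_l2 = [(hashes[3], False), (h_alpha, True)]
ok = verify(b"L2", path_l2, 4, final_root(leaves))
```

A proof that claims a different leaf count fails the final comparison, which is what pins down the number of leaves for the verifier.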

Some other comments:

  • In terms of the branch width, increasing it would increase proof size while batching is a little hard to take advantage of if there is no guarantee in the order of the leaves, so I think two is natural here (I am open to other suggestions!).
  • If you want help with a formal proof here, I can help out with that (however you guys end up handling length extension). The proof should be a pretty standard reduction from a CRHF.
  • I think the question of whether to use Blake3 or take advantage of internal block sizes for SHA-256 is going deep into the micro-optimization territory, but it is definitely good to fully consider and optimize everything 👍 .

@ripatel-fd
Contributor

ripatel-fd commented Aug 29, 2023

For now, my preference would be to stay with SHA256, and only change hash functions if the change is made everywhere in the protocol.
Hardware optimization (SHA-NI, FPGA) exists for SHA256, but not for Blake3.

Contributor

@t-nelson t-nelson left a comment


last one and we're good by my eye

Contributor

@t-nelson t-nelson left a comment


:shipit:

@t-nelson
Contributor

@jacobcreech ball's in your court ⛹️‍♂️

Contributor

@ripatel-fd ripatel-fd left a comment


Requesting changes due to issue in the hash tree pseudocode.

Contributor

@ripatel-fd ripatel-fd left a comment


Approving on behalf of Firedancer

Contributor

@jacobcreech jacobcreech left a comment


Thanks for all the work @anoushk1234 @harsh4786 @ripatel-fd @t-nelson ! Good to see this come to consensus

@jacobcreech jacobcreech merged commit f34a090 into solana-foundation:main Oct 20, 2023
2 checks passed
buffalojoec pushed a commit to buffalojoec/solana-improvement-documents that referenced this pull request Dec 15, 2023
* create transaction receipt simd

* update

* Update 0063-transaction-receipt.md

* Update 0063-transaction-receipt.md

* fix name

* tree spec wip

* receipt tree spec wip

* fixup

* remove logs from receipt

* update

* update

* update

* fix lint

* fix bankhash

* Update 0064-transaction-receipt.md

* Update 0064-transaction-receipt.md

* Update 0064-transaction-receipt.md

* update tree spec

* change receipt structure to use Message hash instead of a signature.

* bench: add benchmarks on receipt tree with message hashes instead of signatures

Removed signatures and added message hashes for our benchmarks.

* minor fixes

* minor fixes

* Update 0064-transaction-receipt.md

* fix: add len of receipts to tree

* fix: add byte ordering for length suffix

* fix: lint

* change Receipt to TransactionReceipt

Co-authored-by: Trent Nelson <[email protected]>

* change slot to block

Co-authored-by: Trent Nelson <[email protected]>

* optimisations and clean up

* remove redundant comment

Co-authored-by: ripatel-fd <[email protected]>

* remove redundant comment for version

Co-authored-by: ripatel-fd <[email protected]>

* add root for empty set.

Co-authored-by: Richard Patel <[email protected]>

* fix typo

Co-authored-by: Richard Patel <[email protected]>

* append justification for sha256

Co-authored-by: Richard Patel <[email protected]>

* change terminology for receipts

Co-authored-by: Richard Patel <[email protected]>

* fix receipt terminology for tree

Co-authored-by: Richard Patel <[email protected]>
Co-authored-by: Trent Nelson <[email protected]>

* Minor clean up

* fix

* Update 0064-transaction-receipt.md

* grammar

* clarify

* typo

* clarify

* company name

* precision

* add layout and fix lint

Co-authored-by: Richard Patel <[email protected]>

* fix typo in transaction

Co-authored-by: lheeger-jump <[email protected]>

* make layout section more explicit

Co-authored-by: Richard Patel <[email protected]>

* nit: mention theoretical perf in hash function choice

Co-authored-by: ripatel-fd <[email protected]>

* change should to must - rf2119

Co-authored-by: Trent Nelson <[email protected]>

* update should to must - rfc2119

Co-authored-by: Trent Nelson <[email protected]>

* replace "fixed" with "avoided" to be more clear

Co-authored-by: Trent Nelson <[email protected]>

* More details in terminology section

Co-authored-by: Trent Nelson <[email protected]>

* fix lint and add missing node in tree spec

Co-authored-by: Trent Nelson <[email protected]>

* Clarify impact of receipts

Co-authored-by: Trent Nelson <[email protected]>

* fix lint

* add intermediate_node when leaf count is zero

Co-authored-by: Trent Nelson <[email protected]>

* Author list

* Fix hash notation

Co-authored-by: Trent Nelson <[email protected]>

* add empty intermediate root for empty vector illustration

* Empty tree

---------

Co-authored-by: harsh4786 <[email protected]>
Co-authored-by: Trent Nelson <[email protected]>
Co-authored-by: ripatel-fd <[email protected]>
Co-authored-by: Richard Patel <[email protected]>
Co-authored-by: lheeger-jump <[email protected]>
Co-authored-by: Trent Nelson <[email protected]>
@0xSol

0xSol commented Mar 14, 2024

As Solana is moving towards accounts hashing, will light clients be supported by this Merkle-tree-based design? If not, what are the alternatives?

Labels
core Standard SIMD with type Core standard SIMD in the Standard category
Projects
Status: SIMDs

9 participants