
Improved light client requests networking protocol #9

Merged
merged 5 commits from the rfc-9 branch into polkadot-fellows:main on Dec 10, 2024

Conversation

@tomaka (Contributor) commented Jul 19, 2023

Rendered

This is a continuation of w3f/PPPs#10

Compared to w3f/PPPs#10, I've changed my mind about truncated proofs. I think that it's better if the client can fully verify the proofs that are returned even when the response size limit would be exceeded. This also means that the proof no longer needs to be deterministic, and I've removed the paragraph about this.

cc @cheme

@tomaka (Contributor, Author) commented Aug 22, 2023

@cheme Do you have any opinion? I think that you and I are pretty much the only people who have opinions/interest on this topic.

Implementers of the replier side should be careful to detect early on when a reply would exceed the maximum reply size, rather than unconditionally generating a reply, as this could consume a very large amount of CPU, disk I/O, and memory. Existing implementations might currently be accidentally protected from such an attack thanks to the fact that requests have a maximum size, and thus that the list of keys in the query is bounded. After this proposal, this accidental protection would no longer exist.
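As a rough illustration of this precaution (not the actual Substrate implementation; `ProofBuilder` and its methods are hypothetical stand-ins), a replier can track the encoded size incrementally and stop before building an oversized reply:

```rust
// Hypothetical sketch: a real replier would gather nodes from its trie
// database. The point is only that the size check happens before more work
// is committed, not after the full reply has already been generated.
const MAX_RESPONSE_SIZE: usize = 16 * 1024 * 1024; // 16 MiB

struct ProofBuilder {
    encoded_size: usize,
    nodes: Vec<Vec<u8>>,
}

impl ProofBuilder {
    fn new() -> Self {
        ProofBuilder { encoded_size: 0, nodes: Vec::new() }
    }

    /// Try to add the trie nodes proving one requested item. Returns `false`
    /// if doing so would push the response over the limit (unless this is the
    /// first item: a response must contain at least one requested item).
    fn try_add_item(&mut self, item_nodes: Vec<Vec<u8>>) -> bool {
        let added: usize = item_nodes.iter().map(|n| n.len()).sum();
        if !self.nodes.is_empty() && self.encoded_size + added > MAX_RESPONSE_SIZE {
            return false;
        }
        self.encoded_size += added;
        self.nodes.extend(item_nodes);
        true
    }
}

fn main() {
    let mut builder = ProofBuilder::new();
    // Stand-in for the decoded request and for trie lookups.
    let requested_items: Vec<Vec<Vec<u8>>> = vec![vec![vec![0u8; 1024]; 8]; 4];
    for (i, nodes) in requested_items.into_iter().enumerate() {
        if !builder.try_add_item(nodes) {
            println!("truncating reply at item {i}; the rest is left for a follow-up request");
            break;
        }
    }
}
```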

Malicious server nodes might truncate Merkle proofs even when they don't strictly need to, and it is not possible for the client to (easily) detect this situation. However, malicious server nodes can already do undesirable things such as throttle down their upload bandwidth or simply not respond. There is no need to handle unnecessarily truncated Merkle proofs any differently than a server simply not answering the request.


Agree with the malicious remark.
One thing that the client can see as malicious is restarting a truncated proof at the wrong node level (e.g. if the proof is nodes a-b-c and the first reply is a-b, a second reply of b-c is malicious).


This protocol keeps the same maximum response size limit as currently exists (16 MiB). It is not possible for the querier to know in advance whether its query will lead to a reply that exceeds the maximum size. If the reply is too large, the replier should send back only a limited number (but at least one) of requested items in the proof. The querier should then send additional requests for the rest of the items. A response containing none of the requested items is invalid.

The server is allowed to silently discard some keys of the request if it judges that the number of requested keys is too high. This is in line with the fact that the server might truncate the response.

In my implementation I do not allow restarting a proof with a wrong trie node. E.g. we query k1, k2 and k3; the proof for all of them is nodes a-b-c-d-e, while the proof for only k1 and k3 is a-b-d-e. If the first reply is a-b we get k1, but if the second reply is d-e we don't have k2 and verification fails.
So the only valid second reply is one starting with node c, followed by any useful nodes.
So I'm not really sure whether the intention was to allow discarding keys in the middle of a query, but I think it would be simpler to word it as "allowed to discard all following keys of the request and terminate the reply".


Yeah, I think this makes sense. Basically the server is allowed to drop keys, but we don't allow dropping random keys, so that keysAfter will not later include keys that were already sent.
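To illustrate the querier side of this truncation rule, here is a hedged sketch in Rust; the request/response types and `send_request` are hypothetical stand-ins, and only the retry logic follows the text quoted above (for descendant-range queries, `onlyKeysAfter` would presumably play the role of resuming after the last covered key):

```rust
// Hedged sketch of the querier's continuation loop.
struct StorageRequest {
    keys: Vec<Vec<u8>>,
}

struct ProvenResponse {
    /// Keys that the returned (verified) proof actually covers.
    proven_keys: Vec<Vec<u8>>,
}

// Placeholder for the real networking call plus proof verification.
fn send_request(req: &StorageRequest) -> ProvenResponse {
    ProvenResponse { proven_keys: req.keys.clone() }
}

fn fetch_all(keys: Vec<Vec<u8>>) -> Result<Vec<Vec<u8>>, &'static str> {
    let mut remaining = keys;
    let mut proven: Vec<Vec<u8>> = Vec::new();
    while !remaining.is_empty() {
        let resp = send_request(&StorageRequest { keys: remaining.clone() });
        // A response that covers none of the requested items is invalid.
        if resp.proven_keys.is_empty() {
            return Err("invalid response: no requested item covered");
        }
        // Ask again only for the keys that were not covered yet; per the
        // discussion above, the server may only drop a suffix of the request,
        // so already-proven keys are never re-requested.
        remaining.retain(|k| !resp.proven_keys.contains(k));
        proven.extend(resp.proven_keys);
    }
    Ok(proven)
}

fn main() {
    let keys = vec![b":code".to_vec(), b":heappages".to_vec()];
    assert_eq!(fetch_all(keys).unwrap().len(), 2);
}
```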

For the purpose of this networking protocol, it should be considered as if the main trie contained an entry for each default child trie whose key is `concat(":child_storage:default:", child_trie_hash)` and whose value is equal to the trie root hash of that default child trie. This behavior is consistent with what the host functions observe when querying the storage. This behavior is present in the existing networking protocol, in other words this proposal doesn't change anything to the situation, but it is worth mentioning.
Also note that child tries aren't considered as descendants of the main trie when it comes to the `includeDescendants` flag. In other words, if the request concerns the main trie, no content coming from child tries is ever sent back.

This protocol keeps the same maximum response size limit as currently exists (16 MiB). It is not possible for the querier to know in advance whether its query will lead to a reply that exceeds the maximum size. If the reply is too large, the replier should send back only a limited number (but at least one) of requested items in the proof. The querier should then send additional requests for the rest of the items. A response containing none of the requested items is invalid.

Here, in my implementation the replier is allowed to return a single trie node even if it does not add a reply to any key (e.g. a big branch node), as long as it makes the query reply progress.
(I can implement the constraint at the Substrate level though; at the trie level it makes sense to me to allow a chunk with any non-empty trie content that participates in query progress.)

@shawntabrizi (Member)

This RFC was brought up at the current Parity retreat as an important unblocker for teams like PAPI and Chopsticks, who seem to be pushing the boundaries of light client support today.

@cheme are any of your comments blockers in your head, or just comments?

@tomaka are you okay with the current status, and should we put it to a vote?

Seems like if there is further iteration needed, we could benefit from merging this one now, and doing improvements in a follow up RFC, assuming nothing is too different than what is already here.

cc @bkchr @ggwpez

@tomaka (Contributor, Author) commented Dec 5, 2024

The RFC is good as is and would be an improvement over the current protocol.

The remaining open questions are about how to enforce a maximum response size; however, the existing protocol also suffers from this maximum response size problem.
In other words, this RFC doesn't open any new problem, but it also doesn't solve all the problems of the existing protocol. It would have been nice to solve all the problems, but that remains unresolved.

@shawntabrizi (Member) commented Dec 5, 2024

@tomaka would you then do the honors of /rfc propose? :)

Let's take it to a vote!

@tomaka (Contributor, Author) commented Dec 5, 2024

/rfc propose

@paritytech-rfc-bot (Contributor)

Hey @tomaka, here is a link you can use to create the referendum aiming to approve this RFC number 0009.

Instructions
  1. Open the link.
  2. Switch to the Submission tab.
  3. Adjust the transaction if needed (for example, the proposal Origin).
  4. Submit the transaction.


It is based on commit hash 228d53b9fbbb829852802fdb06af76524bc269dc.

The proposed remark text is: RFC_APPROVE(0009,96353a0a10c3634c6757315630711354e5fdaf1159823a888290c0116952a480).

github-actions bot commented Dec 5, 2024

Voting for this referendum is ongoing.

Vote for it here

@anaelleltd added the Reviewed (Is ready for on-chain voting.) label and removed the Stale (Is no longer pursued.) label on Dec 5, 2024
@cheme commented Dec 9, 2024

> This RFC was brought up at the current Parity retreat as an important unblocker for teams like PAPI and Chopsticks, who seem to be pushing the boundaries of light client support today.
>
> @cheme are any of your comments blockers in your head, or just comments?
>
> @tomaka are you okay with the current status, and should we put it to a vote?
>
> Seems like if there is further iteration needed, we could benefit from merging this one now, and doing improvements in a follow up RFC, assuming nothing is too different than what is already here.
>
> cc @bkchr @ggwpez

Just comments/ideas, no blockers.

@bkchr (Contributor) left a comment

Generally looking good. Left some questions. It would be nice to get some answers before I vote on chain; they may require a new proposal if accepted.


If `onlyKeysAfterIgnoreLastNibble` is missing, it is equivalent to `false`. If `onlyKeysAfterIgnoreLastNibble` is `true` and `onlyKeysAfter` is missing or empty, then the request is invalid.
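A minimal sketch of this validity rule (the struct is a hypothetical decoded form of the request, not the actual protobuf schema):

```rust
// Hypothetical decoded request; field names mirror `onlyKeysAfter` and
// `onlyKeysAfterIgnoreLastNibble` from the RFC text above.
struct KeysRequest {
    only_keys_after: Option<Vec<u8>>,
    only_keys_after_ignore_last_nibble: Option<bool>,
}

fn validate(req: &KeysRequest) -> Result<(), &'static str> {
    // A missing `onlyKeysAfterIgnoreLastNibble` is equivalent to `false`.
    let ignore_last_nibble = req.only_keys_after_ignore_last_nibble.unwrap_or(false);
    // `true` combined with a missing or empty `onlyKeysAfter` makes the request invalid.
    let only_keys_after_empty = req.only_keys_after.as_deref().map_or(true, |k| k.is_empty());
    if ignore_last_nibble && only_keys_after_empty {
        return Err("onlyKeysAfterIgnoreLastNibble is true but onlyKeysAfter is missing or empty");
    }
    Ok(())
}

fn main() {
    let bad = KeysRequest { only_keys_after: None, only_keys_after_ignore_last_nibble: Some(true) };
    assert!(validate(&bad).is_err());
    let ok = KeysRequest { only_keys_after: Some(vec![0x12]), only_keys_after_ignore_last_nibble: Some(true) };
    assert!(validate(&ok).is_ok());
}
```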

For the purpose of this networking protocol, it should be considered as if the main trie contained an entry for each default child trie whose key is `concat(":child_storage:default:", child_trie_hash)` and whose value is equal to the trie root hash of that default child trie. This behavior is consistent with what the host functions observe when querying the storage. This behavior is present in the existing networking protocol, in other words this proposal doesn't change anything to the situation, but it is worth mentioning.

concat(":child_storage:default:", child_trie_hash) this is wrong? The storage key in the main trie should be

concat(":child_storage:default:", child_storage_key)

And the value associated to this key is the hash of the child trie. Or I just don't get what you want to say with this sentence.

@tomaka (Contributor, Author)

I think that's just a typo. The intent is that this mimics exactly what the runtime sees.

If `onlyKeysAfterIgnoreLastNibble` is missing, it is equivalent to `false`. If `onlyKeysAfterIgnoreLastNibble` is `true` and `onlyKeysAfter` is missing or empty, then the request is invalid.

For the purpose of this networking protocol, it should be considered as if the main trie contained an entry for each default child trie whose key is `concat(":child_storage:default:", child_trie_hash)` and whose value is equal to the trie root hash of that default child trie. This behavior is consistent with what the host functions observe when querying the storage. This behavior is present in the existing networking protocol, in other words this proposal doesn't change anything to the situation, but it is worth mentioning.
Also note that child tries aren't considered as descendants of the main trie when it comes to the `includeDescendants` flag. In other words, if the request concerns the main trie, no content coming from child tries is ever sent back.

But the roots of the child tries are included in the response, as they are part of the main trie?

@tomaka (Contributor, Author) commented Dec 9, 2024

Yes, concat(":child_storage:default:", child_storage_key) is part of the main trie.

What this small text clarifies is that the content of the child trie isn't considered as a descendant of concat(":child_storage:default:", child_storage_key).
Some people might otherwise interpret that includeDescendants includes the content of child tries.
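A small sketch of the convention being clarified here, using the corrected form of the key (the child trie's storage key rather than its hash); the `:child_storage:default:` prefix is the real well-known prefix, everything else is illustrative:

```rust
// Illustrative only: how a default child trie shows up as an entry of the
// main trie for the purpose of this protocol.
const DEFAULT_CHILD_STORAGE_PREFIX: &[u8] = b":child_storage:default:";

/// Main-trie key under which a default child trie's root hash is exposed.
fn child_trie_entry_key(child_storage_key: &[u8]) -> Vec<u8> {
    let mut key = DEFAULT_CHILD_STORAGE_PREFIX.to_vec();
    key.extend_from_slice(child_storage_key);
    key
}

fn main() {
    let key = child_trie_entry_key(b"my_child_trie");
    // An `includeDescendants` query over the main trie may cover this entry
    // (whose value is the child trie's root hash), but never the content of
    // the child trie itself.
    println!("{}", String::from_utf8_lossy(&key));
}
```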




This proposal doesn't handle one specific situation: what if a proof containing a single specific item would exceed the response size limit? For example, if the response size limit was 1 MiB, querying the runtime code (which is typically 1.0 to 1.5 MiB) would be impossible as it's impossible to generate a proof less than 1 MiB. The response size limit is currently 16 MiB, meaning that no single storage item must exceed 16 MiB.

Unfortunately, because it's impossible to verify a Merkle proof before having received it entirely, parsing the proof in a streaming way is also not possible.

If you order the proof properly, you can also stream it to the receiver, who can check the proof on the fly. This could also work for bigger items, but yeah, there would be no way of knowing how big an item is before streaming it.
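As a rough sketch of that "check on the fly" idea (not of any real trie codec; the node format and the hash are made up here), a streaming verifier can accept a node only if its hash is already expected, starting from the trusted root:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Made-up node format: a list of referenced child hashes plus a payload.
struct Node {
    child_hashes: Vec<u64>,
    payload: Vec<u8>,
}

// Non-cryptographic stand-in for the real node hash.
fn node_hash(node: &Node) -> u64 {
    let mut h = DefaultHasher::new();
    node.child_hashes.hash(&mut h);
    node.payload.hash(&mut h);
    h.finish()
}

/// Streaming verifier: nodes must arrive in an order where every node's hash
/// is already expected (root first, then top-down). Anything else is rejected
/// immediately, without waiting for the rest of the proof.
struct StreamingVerifier {
    expected: HashSet<u64>,
}

impl StreamingVerifier {
    fn new(trusted_root: u64) -> Self {
        StreamingVerifier { expected: HashSet::from([trusted_root]) }
    }

    fn accept(&mut self, node: &Node) -> Result<(), &'static str> {
        let h = node_hash(node);
        if !self.expected.remove(&h) {
            return Err("node not reachable from the trusted root");
        }
        // The node is authenticated; its children become expected in turn.
        self.expected.extend(node.child_hashes.iter().copied());
        Ok(())
    }
}

fn main() {
    let leaf = Node { child_hashes: vec![], payload: b"value".to_vec() };
    let root = Node { child_hashes: vec![node_hash(&leaf)], payload: vec![] };
    let mut verifier = StreamingVerifier::new(node_hash(&root));
    assert!(verifier.accept(&root).is_ok());
    assert!(verifier.accept(&leaf).is_ok());
}
```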

@tomaka (Contributor, Author)

I don't remember what I had in mind when I wrote this, but yes I think the problem is about making sure that the sender isn't sending random garbage data. If the sender says that a storage item is 2 GiB, the receiver will have to receive the full 2 GiB before being able to detect whether it was legitimate or not.

@tomaka (Contributor, Author) commented Dec 9, 2024

> the receiver will have to receive the full 2 GiB before being able to detect whether it was legitimate or not

That's not 100% true. In theory, the light client could simply pass the storage item through the JSON-RPC API to the code that decodes that storage item, and that code could detect early on whether the data makes sense (and maybe also store the decoded data in a more space-optimal way).
And if it turns out that the hash of the storage item doesn't match what is expected, the JSON-RPC server could tell the JSON-RPC client "wait, never mind, everything I sent is wrong".

But in practice that solution would be extremely complicated.

PR can be merged.

Write the following command to trigger the bot

/rfc process 0xcdad9ffe7f78f870f6188871d3d2bb338d1c1d081e6f0c1fc3f8075f4aa356a2

@bkchr (Contributor) commented Dec 10, 2024

/rfc process 0xcdad9ffe7f78f870f6188871d3d2bb338d1c1d081e6f0c1fc3f8075f4aa356a2

@paritytech-rfc-bot merged commit 73c9853 into polkadot-fellows:main on Dec 10, 2024
@paritytech-rfc-bot (Contributor)

The on-chain referendum has approved the RFC.

@tomaka deleted the rfc-9 branch on December 11, 2024, 09:22