-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NMT Review #94
NMT Review #94
Conversation
// TODO [Me] Shouldn't we specify that the first 8 bytes represent the namespace.ID | ||
type PrefixedData []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the issue with this is that the library wants to support varying namespace sizes (not only 8 bytes necessarily).
We could instead:
- just not use a type alias for byte and simply accept that this is even easier to misuse (would have the nice side effect that the user would not need to cast between bytes and PrefixedData at all)
- use a dedicated type; e.g. one of the two options here: namespace: PrefixedData/PrefixedData8 are easy to misuse #71
@@ -115,6 +119,7 @@ func New(h hash.Hash, setters ...Option) *NamespacedMerkleTree { | |||
leaves: make([][]byte, 0, opts.InitialCapacity), | |||
leafHashes: make([][]byte, 0, opts.InitialCapacity), | |||
namespaceRanges: make(map[string]leafRange), | |||
// TODO [Me] Shouldn't minNID be populated by `0x00`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, this would not work:
Line 400 in b04eea5
func (n *NamespacedMerkleTree) updateMinMaxID(id namespace.ID) { |
That said, I think we can get rid of these two fields entirely!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then, does it mean that the minID
is actually equal to 0xFF
which corresponds to the Max namespace ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I think we can get rid of these two fields entirely!
Noted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I just double checked the code, and figured these two fields represent the range of the namespace IDs of the data stored by a neamspace tree, and they are accessed in here:
Line 174 in ac87f1c
if nID.Less(n.minNID) || n.maxNID.Less(nID) { // TODO [Me] we could move this entire if block inside the `foundInRange` function |
This means, as it is, we cannot remove these two fields.
However, we could retrieve the same information i.e., the namespaceID range by looking at the tree
rawRoot
field (as it is supposed to embody the max and min namespace IDs of the entire tree). However, that requires a bit of refactoring so that the root gets calculated after each Push
operation (Which is not currently the case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removing these fields will need some refactoring. I did not think it through beyond the fact that the info about min/max namespaces also materializes in the root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then, does it mean that the minID is actually equal to 0xFF which corresponds to the Max namespace ID?
Only to initialize the minID such that the if-block yields true for any namespace smaller than the max one (and hence the minID gets updated here properly):
Lines 400 to 407 in 29cca3c
func (n *NamespacedMerkleTree) updateMinMaxID(id namespace.ID) { | |
if id.Less(n.minNID) { | |
n.minNID = id | |
} | |
if n.maxNID.Less(id) { | |
n.maxNID = id | |
} | |
} |
proof.go
Outdated
return false | ||
} | ||
leafData := append(gotLeafNid, gotLeaf[nIDLen:]...) | ||
leafData := append(gotLeafNid, gotLeaf[nIDLen:]...) // TODO why not just passing the leaf? isn't it the same? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right!
nth := NewNmtHasher(h, nID.Size(), proof.isMaxNamespaceIDIgnored) | ||
min := namespace.ID(MinNamespace(root, nID.Size())) | ||
max := namespace.ID(MaxNamespace(root, nID.Size())) | ||
// TODO [Me] this never happens, the min and max are exactly nID.Size() bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The passed-in nID could be any length though (it's just a byte slice):
Line 5 in b04eea5
type ID []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The passed-in nID could be any length though
Right, however, just to clarify, what I meant is that the MinNamespace(root, nID.Size())
and MaxNamespace(root, nID.Size())
functions are designed to always return a namespace ID with a size of nID.Size()
.
Line 425 in b04eea5
func MinNamespace(hash []byte, size namespace.IDSize) []byte { |
Thus, considering this, we do not need to check the size of the retrieved namespace IDs i.e., the part below:
if nID.Size() != min.Size() || nID.Size() != max.Size()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yes, you are right.
// TODO [Me] Shouldn't we instead return the first or the last node in the tree as the exclusion proof? | ||
// TODO [Me] although I think this current logic is based on the premise that the root of the tree is trusted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question. I thought the root (which commits/includes the min/max nids of the tree) is proof enough that the tree does not contain the passed-in nID
. Is there a way to construct a tree/root such that the root is not sufficient proof (that the namespace is not included)? I do not think so but maybe I am missing sth. These are exactly the edge-cases that might require further thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all the querying nodes have access to the correct NMT roots, then there should be no problem hence no exclusion proof is required (I will still think about it to see if any corner cases may happen). Thanks for your answer!
if !found { | ||
// To generate a proof for an absence we calculate the | ||
// position of the leaf that is in the place of where | ||
// the namespace would be in: | ||
proofStart = n.calculateAbsenceIndex(nID) | ||
proofStart = n.calculateAbsenceIndex(nID) // TODO [Me] this could simply return a range, to avoid the line below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼
recurse = func(start, end int, includeNode bool) []byte { | ||
if start >= len(n.leafHashes) { | ||
if start >= len(n.leafHashes) { // TODO [Me] why against leafHashes? and not leaves? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason. The lengths of both should be the same.
nmt.go
Outdated
// check whether the subtree representing the [start, end) range of leaves has overlap with the | ||
// queried proof range i.e., [proofStart, proofEnd) | ||
// if not | ||
if (end <= proofStart || start >= proofEnd) && includeNode { //TODO [Me] the `&& includeNode` seems ineffective and unnecessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is about a small simplification on the if statement.
What I meant is that if we remove the condition on the includeNode
from the if
statement, we could get the same result, i.e.,
newIncludeNode := includeNode
if (end <= proofStart || start >= proofEnd) {
newIncludeNode = false
}
The reason is that:
In the original version, If includeNode
is false
, the if
block is never reached, hence the newIncludeNode
contains the includeNode
value i.e., false
(the same result is yield in the new if statement)
In the original version, if the includeNode
is true
, and if the first condition is true
i.e., (end <= proofStart || start >= proofEnd)
then we enter the if block and toggle the newIncludeNode
to false
(which is the same result we obtain from the new if statement).
if len(gotLeaf) < int(nIDLen) { | ||
// conflicting namespace sizes | ||
return false | ||
} | ||
gotLeafNid := namespace.ID(gotLeaf[:nIDLen]) | ||
gotLeafNid := namespace.ID(gotLeaf[:nIDLen]) // TODO [Me] a helper function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼 everything that improves readability (without impacting performance much) should be done.
var leafIndex uint64 | ||
// TODO [Me] Why called leftSubtrees? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what would be a better name. LeftSubtreeRoots maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was originally about why the the data populated inside leftSubtrees
represent the left subtrees. Later, by examining the code, I figured why. I should have removed this comment.
Nevertheless, I liked your suggested name i.e., LeftSubtreeRoots
, it is more indicative of the content of the variable.
@liamsi I have added a few other questions and suggestions, would appreciate your review and thoughts on them. |
Closing the PR since it has fulfilled its intended purpose, and several of the questions raised in this conversation have been converted into corresponding GH issues or addressed in separate PRs. If there are any remaining matters that need attention, I will address them by creating additional GH issues. |
This is not a formal pull request for review, but rather a place to share my findings, questions, and suggestions regarding the implementation of the NMT as part of the following EPIC (celestiaorg/celestia-app#1296). I am conducting a thorough review of the NMT implementation and design.
The current content of the PR will undertake more updates as I am continuously writing and updating the code documentation, and adding more questions and suggestion. I am using this PR only as a place to communicate my thoughts.
To enhance the code documentation, I have added descriptions for some of the functions that were previously lacking proper documentation. I would appreciate if you could take a look and provide feedback, @liamsi.
My suggestions and questions are annotated as
TODO [Me]
to distinguish them from existing annotations in the code. I will continue to add comments and questions as I proceed with my review. Upon resolution of any questions, I will replace the annotations with proper documentation. If my suggestions are approved, I will open GitHub issues to implement them in subsequent pull requests.The primary objectives of my code review are to:
Proof
, particularly the order of nodes retrieved in the "nodes" field of theProof
structure and how to interpret the other fields in this struct.cc: @evan-forbes