Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NMT Review #94

Closed
wants to merge 33 commits into from
Closed

NMT Review #94

wants to merge 33 commits into from

Conversation

staheri14
Copy link
Contributor

@staheri14 staheri14 commented Feb 1, 2023

This is not a formal pull request for review, but rather a place to share my findings, questions, and suggestions regarding the implementation of the NMT as part of the following EPIC (celestiaorg/celestia-app#1296). I am conducting a thorough review of the NMT implementation and design.
The current content of the PR will undertake more updates as I am continuously writing and updating the code documentation, and adding more questions and suggestion. I am using this PR only as a place to communicate my thoughts.

To enhance the code documentation, I have added descriptions for some of the functions that were previously lacking proper documentation. I would appreciate if you could take a look and provide feedback, @liamsi.

My suggestions and questions are annotated as TODO [Me] to distinguish them from existing annotations in the code. I will continue to add comments and questions as I proceed with my review. Upon resolution of any questions, I will replace the annotations with proper documentation. If my suggestions are approved, I will open GitHub issues to implement them in subsequent pull requests.

The primary objectives of my code review are to:

  1. Understand the proof generation algorithm and the structure of the Proof, particularly the order of nodes retrieved in the "nodes" field of the Proof structure and how to interpret the other fields in this struct.
  2. Evaluate the verification procedure.
  3. Identify any opportunities for security or performance improvement.
  4. Improve the code documentation and readability, which would also be beneficial for future maintainers.

cc: @evan-forbes

@staheri14 staheri14 self-assigned this Feb 2, 2023
Comment on lines +9 to 10
// TODO [Me] Shouldn't we specify that the first 8 bytes represent the namespace.ID
type PrefixedData []byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the issue with this is that the library wants to support varying namespace sizes (not only 8 bytes necessarily).

We could instead:

  • just not use a type alias for byte and simply accept that this is even easier to misuse (would have the nice side effect that the user would not need to cast between bytes and PrefixedData at all)
  • use a dedicated type; e.g. one of the two options here: namespace: PrefixedData/PrefixedData8 are easy to misuse #71

nmt.go Show resolved Hide resolved
@@ -115,6 +119,7 @@ func New(h hash.Hash, setters ...Option) *NamespacedMerkleTree {
leaves: make([][]byte, 0, opts.InitialCapacity),
leafHashes: make([][]byte, 0, opts.InitialCapacity),
namespaceRanges: make(map[string]leafRange),
// TODO [Me] Shouldn't minNID be populated by `0x00`?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, this would not work:

nmt/nmt.go

Line 400 in b04eea5

func (n *NamespacedMerkleTree) updateMinMaxID(id namespace.ID) {

That said, I think we can get rid of these two fields entirely!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then, does it mean that the minID is actually equal to 0xFF which corresponds to the Max namespace ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I think we can get rid of these two fields entirely!

Noted!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I just double checked the code, and figured these two fields represent the range of the namespace IDs of the data stored by a neamspace tree, and they are accessed in here:

nmt/nmt.go

Line 174 in ac87f1c

if nID.Less(n.minNID) || n.maxNID.Less(nID) { // TODO [Me] we could move this entire if block inside the `foundInRange` function

This means, as it is, we cannot remove these two fields.
However, we could retrieve the same information i.e., the namespaceID range by looking at the tree rawRoot field (as it is supposed to embody the max and min namespace IDs of the entire tree). However, that requires a bit of refactoring so that the root gets calculated after each Push operation (Which is not currently the case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removing these fields will need some refactoring. I did not think it through beyond the fact that the info about min/max namespaces also materializes in the root.

Copy link
Member

@liamsi liamsi Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then, does it mean that the minID is actually equal to 0xFF which corresponds to the Max namespace ID?

Only to initialize the minID such that the if-block yields true for any namespace smaller than the max one (and hence the minID gets updated here properly):

nmt/nmt.go

Lines 400 to 407 in 29cca3c

func (n *NamespacedMerkleTree) updateMinMaxID(id namespace.ID) {
if id.Less(n.minNID) {
n.minNID = id
}
if n.maxNID.Less(id) {
n.maxNID = id
}
}

proof.go Outdated
return false
}
leafData := append(gotLeafNid, gotLeaf[nIDLen:]...)
leafData := append(gotLeafNid, gotLeaf[nIDLen:]...) // TODO why not just passing the leaf? isn't it the same?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right!

nmt.go Outdated Show resolved Hide resolved
nmt.go Show resolved Hide resolved
proof.go Show resolved Hide resolved
nth := NewNmtHasher(h, nID.Size(), proof.isMaxNamespaceIDIgnored)
min := namespace.ID(MinNamespace(root, nID.Size()))
max := namespace.ID(MaxNamespace(root, nID.Size()))
// TODO [Me] this never happens, the min and max are exactly nID.Size() bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The passed-in nID could be any length though (it's just a byte slice):

type ID []byte

Copy link
Contributor Author

@staheri14 staheri14 Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The passed-in nID could be any length though

Right, however, just to clarify, what I meant is that the MinNamespace(root, nID.Size()) and MaxNamespace(root, nID.Size()) functions are designed to always return a namespace ID with a size of nID.Size().

nmt/nmt.go

Line 425 in b04eea5

func MinNamespace(hash []byte, size namespace.IDSize) []byte {

Thus, considering this, we do not need to check the size of the retrieved namespace IDs i.e., the part below:
if nID.Size() != min.Size() || nID.Size() != max.Size()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yes, you are right.

proof.go Show resolved Hide resolved
Comment on lines +175 to +176
// TODO [Me] Shouldn't we instead return the first or the last node in the tree as the exclusion proof?
// TODO [Me] although I think this current logic is based on the premise that the root of the tree is trusted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good question. I thought the root (which commits/includes the min/max nids of the tree) is proof enough that the tree does not contain the passed-in nID. Is there a way to construct a tree/root such that the root is not sufficient proof (that the namespace is not included)? I do not think so but maybe I am missing sth. These are exactly the edge-cases that might require further thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the querying nodes have access to the correct NMT roots, then there should be no problem hence no exclusion proof is required (I will still think about it to see if any corner cases may happen). Thanks for your answer!

if !found {
// To generate a proof for an absence we calculate the
// position of the leaf that is in the place of where
// the namespace would be in:
proofStart = n.calculateAbsenceIndex(nID)
proofStart = n.calculateAbsenceIndex(nID) // TODO [Me] this could simply return a range, to avoid the line below
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

nmt.go Show resolved Hide resolved
recurse = func(start, end int, includeNode bool) []byte {
if start >= len(n.leafHashes) {
if start >= len(n.leafHashes) { // TODO [Me] why against leafHashes? and not leaves?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason. The lengths of both should be the same.

nmt.go Show resolved Hide resolved
nmt.go Outdated
// check whether the subtree representing the [start, end) range of leaves has overlap with the
// queried proof range i.e., [proofStart, proofEnd)
// if not
if (end <= proofStart || start >= proofEnd) && includeNode { //TODO [Me] the `&& includeNode` seems ineffective and unnecessary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is about a small simplification on the if statement.
What I meant is that if we remove the condition on the includeNode from the if statement, we could get the same result, i.e.,

newIncludeNode := includeNode
if (end <= proofStart || start >= proofEnd) {
			newIncludeNode = false
		}

The reason is that:
In the original version, If includeNode is false, the if block is never reached, hence the newIncludeNode contains the includeNode value i.e., false (the same result is yield in the new if statement)
In the original version, if the includeNode is true, and if the first condition is true i.e., (end <= proofStart || start >= proofEnd) then we enter the if block and toggle the newIncludeNode to false (which is the same result we obtain from the new if statement).

proof.go Outdated Show resolved Hide resolved
if len(gotLeaf) < int(nIDLen) {
// conflicting namespace sizes
return false
}
gotLeafNid := namespace.ID(gotLeaf[:nIDLen])
gotLeafNid := namespace.ID(gotLeaf[:nIDLen]) // TODO [Me] a helper function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 everything that improves readability (without impacting performance much) should be done.

var leafIndex uint64
// TODO [Me] Why called leftSubtrees?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what would be a better name. LeftSubtreeRoots maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was originally about why the the data populated inside leftSubtrees represent the left subtrees. Later, by examining the code, I figured why. I should have removed this comment.
Nevertheless, I liked your suggested name i.e., LeftSubtreeRoots, it is more indicative of the content of the variable.

proof.go Outdated Show resolved Hide resolved
@staheri14
Copy link
Contributor Author

@liamsi I have added a few other questions and suggestions, would appreciate your review and thoughts on them.

@staheri14
Copy link
Contributor Author

staheri14 commented May 12, 2023

Closing the PR since it has fulfilled its intended purpose, and several of the questions raised in this conversation have been converted into corresponding GH issues or addressed in separate PRs. If there are any remaining matters that need attention, I will address them by creating additional GH issues.

@staheri14 staheri14 closed this May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants