Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(content): update filecoin piece content #1016

Merged
merged 10 commits into from
Sep 22, 2020
7 changes: 5 additions & 2 deletions content/systems/filecoin_files/piece/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,8 @@ The first three steps take place on the client side.

2. In order to make a _Filecoin Piece_, the IPLD DAG is serialised into a ["Content-Addressable aRchive" (.car)](https://github.com/ipld/specs/blob/master/block-layer/content-addressable-archives.md#summary) file, which is in raw bytes format. A CAR file is an opaque blob of data that packs together and transfers IPLD nodes. The _Payload CID_ is common between the CAR'ed and un-CAR'ed constructions. This helps later during data retrieval, when data is transferred between the storage client and the storage provider as we discuss later.

3. The resulting .car file is _padded_ with extra zero bits in order for the file to make a binary Merkle tree. To achieve a clean binary Merkle Tree the .car file size has to be in some power of two (^2) size. A padding process, called `fr32 padding`, which adds two (2) zero bits to every 254 out of every 256 bits is applied to the input file. At the next step, the padding process takes the output of the `fr32 padding` process and finds the size above it that makes for a power of two size. This gap between the result of the `fr32 padding` and the next power of two size is padded with zeros.
3. The resulting .car file is _padded_ with extra zero bits in order for the file to make a binary Merkle tree. To achieve a clean binary Merkle Tree the .car file size has to be in some power of two (^2) size. A padding process, called `Fr32 padding`, which adds two (2) zero bits to every 254 out of every 256 bits is applied to the input file. At the next step, the padding process takes the output of the `Fr32 padding` process and finds the size above it that makes for a power of two size. This gap between the result of the `Fr32 padding` and the next power of two size is padded with zeros.


In order to justify the reasoning behind these steps, it is important to understand the overall negotiation process between the `StorageClient` and a `StorageProvider`. The piece CID or CommP is what is included in the deal that the client negotiates and agrees with the storage provider. When the deal is agreed, the client sends the file to the provider (using GraphSync). The provider has to construct the CAR file out of the file received and derive the Piece CID on their side. In order to avoid the client sending a different file to the one agreed, the Piece CID that the provider generates has to be the same as the one included in the deal negotiated earlier.

Expand All @@ -53,7 +54,9 @@ The following steps take place on the `StorageProvider` side (apart from step 4


**IMPORTANT NOTES:**
- Steps 2 and 3 above are specific to the Lotus implementation. The same outcome can be achieved in different ways. However, any implementation has to make sure that the initial IPLD DAG is serialised and padded so that calculating the Merkle root out of the resulting blob of data gives the same **Piece CID**. As long as this is the case, implementations can deviate from the first three steps above.
- `Fr32` is a 32-bit representation of a field element (which, in our case, is the arithmetic field of BLS12-381). To be well-formed, a value of type `Fr32` must _actually_ fit within that field, but this is not enforced by the type system. It is an invariant which must be perserved by correct usage.
In the case of so-called `Fr32 padding`, two zero bits are inserted 'after' a number requiring at most 254 bits to represent. This guarantees that the result will be `Fr32`, regardless of the value of the initial 254 bits. This is a 'conservative' technique, since for some initial values, only one bit of zero-padding would actually be required.
- Steps 2 and 3 above are specific to the Lotus implementation. The same outcome can be achieved in different ways, e.g., without using `Fr32` bit-padding. However, any implementation has to make sure that the initial IPLD DAG is serialised and padded so that it gives a clean binary tree, and therefore, calculating the Merkle root out of the resulting blob of data gives the same **Piece CID**. As long as this is the case, implementations can deviate from the first three steps above.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. I do think the padding method is prescribed. Otherwise there will not be a deterministic mapping from raw data to CommP. It's just that this padding method is not the only possible way to convert from raw data into a sequence of Fr32 elements.

  2. I don't think anything about Fr32 is Lotus-specific: it's a rust-fil-proofs implementation detail.

I don't make these suggestions to complicate things. Probably incorporating them will lead to a simpler explanation. The current text is still not quite accurate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porcuquine can you please provide the right phrasing for this, so that we finalise?

- Finally, it is important to add a note related to the _Payload CID_ (discussed in the first two steps above) and the data retrieval process. The retrieval deal is negotiated on the basis of the _Payload CID_. When the retrieval deal is agreed, the retrieval miner starts sending the unsealed and "un-CAR'ed" file to the client. The transfer starts from the root node of the IPLD Merkle Tree and in this way the client can validate the _Payload CID_ from the beginning of the transfer and verify that the file they are receiving is the file they negotiated in the deal and not random bits.

## PieceStore
Expand Down