Cleanup spec (#283)
mooselumph authored Feb 26, 2024
1 parent 1dc1721 commit 91838ba
Showing 32 changed files with 239 additions and 501 deletions.
Binary file added docs/assets/architecture.png
Binary file added docs/assets/assignment-module.png
Binary file added docs/assets/attestation-layer-parts.png
Binary file added docs/assets/attestation-layer.png
Binary file added docs/assets/bridging-module.png
Binary file added docs/assets/encoding-module.png
Binary file added docs/assets/network-layer.png
Empty file removed docs/spec/architecture.md
@@ -1,4 +1,4 @@
# KZG FFT Encoder Backend
# Amortized KZG Prover Backend

It is important that the encoding and commitment tasks can be performed in seconds and that the dominating complexity of the computation is nearly linear in the degree of the polynomial. This is done using algorithms based on the Fast Fourier Transform (FFT).

@@ -11,7 +11,7 @@ We will also highlight the additional constraints on the Encoding interface which…

As described in the [Encoding Module Specification](../spec/protocol-modules/storage/encoding.md), given a blob of data, we convert the blob to a polynomial $p(X) = \sum_{i=0}^{m-1} c_iX^i$ by simply slicing the data into a string of symbols, and interpreting this list of symbols as the tuple $(c_i)_{i=0}^{m-1}$.

In the case of the KZG-FFT encoder, the polynomial lives on the field associated with the BN254 elliptic curve, which has order [TODO: fill in order].

Given this polynomial representation, the KZG commitment can be calculated as in [KZG polynomial commitments](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html).

@@ -30,7 +30,7 @@ where $p_k$ gives the evaluation of the polynomial at $v^k \in S$. Letting $c$ d…

To evaluate the DFT programmatically, we want $m = n$. Notice that we can achieve this when $m > n$ by simply padding $c$ with zeros to be of length $m$.
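To make the padding and evaluation step concrete, the following is a minimal sketch (not the EigenDA implementation) of a naive $O(m^2)$ DFT-style evaluation over a toy prime field. The real encoder works over the BN254 scalar field and replaces the inner loops with an FFT; the modulus, generator, and function names below are illustrative assumptions only.

```go
package main

import "fmt"

// Toy modulus: 257 is prime and 257-1 = 2^8, so F_257 contains multiplicative
// subgroups of every power-of-2 order up to 256 (the BN254 scalar field plays
// this role in the real encoder, with subgroups of order up to 2^28).
const q uint64 = 257

// powMod computes b^e mod q by square-and-multiply.
func powMod(b, e uint64) uint64 {
	result := uint64(1)
	b %= q
	for e > 0 {
		if e&1 == 1 {
			result = result * b % q
		}
		b = b * b % q
		e >>= 1
	}
	return result
}

// evaluate zero-pads the coefficient vector c up to the domain size m (a power
// of 2) and returns p_k = sum_i c_i * (v^k)^i for k = 0..m-1, where v generates
// the order-m subgroup of F_q^*.
func evaluate(c []uint64, m uint64) []uint64 {
	padded := make([]uint64, m)
	copy(padded, c) // pad with zeros so the coefficient count equals |S| = m

	v := powMod(3, (q-1)/m) // 3 is a primitive root mod 257, so v has order m
	out := make([]uint64, m)
	for k := uint64(0); k < m; k++ {
		w := powMod(v, k) // evaluation point v^k
		acc, x := uint64(0), uint64(1)
		for i := uint64(0); i < m; i++ {
			acc = (acc + padded[i]*x) % q
			x = x * w % q
		}
		out[k] = acc
	}
	return out
}

func main() {
	// p(X) = 5 + X + 2X^2 (3 symbols) evaluated on a domain of size m = 8.
	evals := evaluate([]uint64{5, 1, 2}, 8)
	fmt.Println(evals[0]) // p(1) = 8
}
```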

The use of the FFT can levy an additional requirement on the size of the group $S$. In our implementation, we require the size of $S$ to be a power of 2. For this, we can make use of the fact that the prime field associated with BN254 contains a subgroup of order $2^{28}$, which in turn contains subgroups of orders spanning every power of 2 less than $2^{28}$.


As the encoding interface calls for the construction of `NumChunks` Chunks of length `ChunkLength`, our application requires that $S$ be of size `NumChunks*ChunkLength`, which in turn must be a power of 2.
@@ -60,4 +60,4 @@ As a simple illustrative example, suppose that `AssignmentCoordinator` provides…

Supplied with these parameters, `Encoder.ParamsFromMins` will upgrade `ChunkLength` to the next highest power of 2, i.e., `ChunkLength` = 4, and leave `NumChunks` unchanged. The following figure illustrates how the indices will be assigned across the chunks in this scenario.

![Worked example of chunk indices for ChunkLength=4, NumChunks=4](../../assets/encoding-groups.png)
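As a hedged illustration of the `Encoder.ParamsFromMins` rounding step described above (the actual implementation may differ), the sketch below rounds both minimum parameters up to powers of 2, which guarantees that `NumChunks*ChunkLength`, and hence the size of $S$, is itself a power of 2. The function and parameter names are assumptions for illustration.

```go
package main

import "fmt"

// nextPowerOf2 returns the smallest power of 2 greater than or equal to x.
func nextPowerOf2(x uint) uint {
	p := uint(1)
	for p < x {
		p <<= 1
	}
	return p
}

// paramsFromMins is a hypothetical stand-in for Encoder.ParamsFromMins: it
// rounds the minimum chunk length and chunk count up to powers of 2 so that
// their product (the size of the evaluation domain S) is FFT-friendly.
func paramsFromMins(minChunkLength, minNumChunks uint) (chunkLength, numChunks uint) {
	return nextPowerOf2(minChunkLength), nextPowerOf2(minNumChunks)
}

func main() {
	chunkLength, numChunks := paramsFromMins(3, 4)
	fmt.Println(chunkLength, numChunks) // 4 4, consistent with the worked example above
}
```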
@@ -1,41 +1,22 @@
## Assignment Module

The assignment module is essentially a rule which takes in the Ethereum chain state and outputs an allocation of chunks to DA operators. This can be generalized to a function that outputs a set of valid allocations.

A chunk assignment has the following parameters:
1) **Indices**: the chunk indices that will be assigned to each DA node. Some DA nodes receive more than one chunk.
2) **ChunkLength**: the length of each chunk (measured in number of symbols, as defined by the encoding module). We currently require all chunks to be of the same length, so this parameter is a scalar.

The assignment module is implemented by the `AssignmentCoordinator` interface.

![image](../../assets/assignment-module.png)

## Interface

The assignment functionality within EigenDA is carried out by the `AssignmentCoordinator`, which is responsible for taking the current OperatorState and the security requirements represented by a given QuorumParam and determining or validating system parameters that will satisfy these security requirements given the OperatorStates. There are two classes of parameters that must be determined or validated:

1) the chunk indices that will be assigned to each DA node;
2) the length of each chunk (measured in number of symbols). In keeping with the constraint imposed by the Encoding module, all chunks must have the same length, so this parameter is a scalar.

The `AssignmentCoordinator` must implement the following interface, which facilitates the above tasks:

```go
type AssignmentCoordinator interface {

	// GetAssignments calculates the full set of node assignments.
	GetAssignments(state *OperatorState, blobLength uint, info *BlobQuorumInfo) (map[OperatorID]Assignment, AssignmentInfo, error)

	// GetOperatorAssignment calculates the assignment for a specific DA node.
	GetOperatorAssignment(state *OperatorState, header *BlobHeader, quorum QuorumID, id OperatorID) (Assignment, AssignmentInfo, error)

	// ValidateChunkLength validates that the chunk length for the given quorum satisfies all protocol requirements.
	ValidateChunkLength(state *OperatorState, header *BlobHeader, quorum QuorumID) (bool, error)

	// CalculateChunkLength calculates the chunk length for the given quorum that satisfies all protocol requirements.
	CalculateChunkLength(state *OperatorState, blobLength uint, param *SecurityParam) (uint, error)
}
```

## Standard Assignment Security Logic

The standard assignment coordinator implements a very simple logic for determining the number of chunks per node and the chunk length, which we describe here. More background concerning this design can be found in the [Design Document](../../../design/assignment.md).

**Chunk Length**

Chunk lengths must be sufficiently small that operators with a small proportion of stake will be able to receive a quantity of data commensurate with their stake share. For each operator $i$, let $S_i$ signify the amount of stake held by that operator.

We require that the chunk size $C$ satisfy

@@ -44,15 +25,13 @@

$$
C \le \text{NextPowerOf2}\left(\frac{B}{\gamma}\max\left(\frac{\min_j S_j}{\sum_j S_j}, \frac{1}{M_\text{max}}\right)\right)
$$


where $\gamma = \beta-\alpha$, with $\alpha$ and $\beta$ the adversary and quorum thresholds as defined in the [Overview](../overview.md).

This means that as long as an operator has a stake share of at least $1/M_\text{max}$, the encoded data that they receive will be within a factor of 2 of their share of stake. Operators with less than $1/M_\text{max}$ of the stake will receive no more than a $1/M_\text{max}$ fraction of the encoded data. $M_\text{max}$ represents the maximum number of chunks that the disperser can be required to encode per blob. This limit is included because proving costs scale somewhat super-linearly with the number of chunks.

In the future, additional constraints on chunk length may be added; for instance, the chunk length may be set in order to maintain a fixed number of chunks per blob across all system states. Currently, the protocol does not mandate a specific value for the chunk length, but will accept any value in the range satisfying the above constraint. The `CalculateChunkLength` function is provided as a convenience that can be used to find a chunk length satisfying the protocol requirements.
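The following is an illustrative sketch (not the protocol implementation) of how the chunk length bound above can be computed; the sample values in `main` are hypothetical, and the real `CalculateChunkLength` works with on-chain stake representations rather than floating-point numbers.

```go
package main

import (
	"fmt"
	"math"
)

// nextPowerOf2 returns the smallest power of 2 greater than or equal to x.
func nextPowerOf2(x uint) uint {
	p := uint(1)
	for p < x {
		p <<= 1
	}
	return p
}

// maxChunkLength evaluates the bound above: blobLength is B (in symbols),
// gamma = beta - alpha, stakes[i] = S_i, and maxChunks = M_max.
func maxChunkLength(blobLength uint, gamma float64, stakes []float64, maxChunks float64) uint {
	total, minStake := 0.0, math.Inf(1)
	for _, s := range stakes {
		total += s
		if s < minStake {
			minStake = s
		}
	}
	bound := float64(blobLength) / gamma * math.Max(minStake/total, 1/maxChunks)
	return nextPowerOf2(uint(math.Ceil(bound)))
}

func main() {
	// Hypothetical example: B = 1024 symbols, gamma = 0.4, M_max = 32,
	// and three operators with stakes 10, 30, and 60.
	fmt.Println(maxChunkLength(1024, 0.4, []float64{10, 30, 60}, 32)) // 256
}
```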



**Index Assignment**

For each operator $i$, let $S_i$ signify the amount of stake held by that operator. We want the number of chunks assigned to operator $i$ to satisfy

@@ -66,8 +45,8 @@

$$
m_i = \text{ceil}\left(\frac{B S_i}{C\gamma \sum_j S_j}\right)\tag{1}
$$
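As a sketch of equation (1) (illustrative only; the actual `GetAssignments` implementation operates on integer stake amounts and may round differently), the per-operator chunk count can be computed as follows. The sample values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// chunkCounts computes m_i = ceil(B * S_i / (C * gamma * sum_j S_j)) for each
// operator: blobLength is B (in symbols), chunkLength is C, gamma = beta - alpha,
// and stakes[i] = S_i.
func chunkCounts(blobLength, chunkLength uint, gamma float64, stakes []float64) []uint {
	total := 0.0
	for _, s := range stakes {
		total += s
	}
	counts := make([]uint, len(stakes))
	for i, s := range stakes {
		counts[i] = uint(math.Ceil(float64(blobLength) * s / (float64(chunkLength) * gamma * total)))
	}
	return counts
}

func main() {
	// Hypothetical example: B = 1024, C = 256, gamma = 0.4, stakes 10, 30, 60.
	fmt.Println(chunkCounts(1024, 256, 0.4, []float64{10, 30, 60})) // [1 3 6]
}
```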

**Correctness**
Let's show that, for any sets $U_q$ and $U_a$ satisfying the constraints in the [Consensus Layer Overview](../overview.md#consensus-layer), the data held by the operators $U_q \setminus U_a$ will constitute an entire blob. The amount of data held by these operators is given by

$$
\sum_{i \in U_q \setminus U_a} m_i C
$$

@@ -79,7 +58,8 @@

$$
\sum_{i \in U_q \setminus U_a} m_i C \ge \frac{B}{\gamma}\sum_{i \in U_q \setminus U_a}\frac{S_i}{\sum_j S_j} = \frac{B}{\gamma}\frac{\sum_{i \in U_q} S_i - \sum_{i \in U_a} S_i}{\sum_j S_j} \ge B \frac{\beta-\alpha}{\gamma} = B \tag{2}
$$

Thus, the reconstruction requirement from the [Encoding](./encoding.md) module is satisfied.
Since the unique data held by these operators exceeds the size of a blob, the encoding module ensures that the original blob can be reconstructed from this data.


## Validation Actions

@@ -91,7 +71,7 @@ When the DA node receives a `StoreChunks` request, it performs the following validation actions…
- It uses `ValidateChunkLength` to validate that the `ChunkLength` for the blob satisfies the above constraints.
- It uses `GetOperatorAssignment` to calculate the chunk indices for which it is responsible, and verifies that each of the chunks that it has received lies on the polynomial at these indices (see [Encoding validation actions](./encoding.md#validation-actions)).

This step ensures that each honest node has received the blobs for which it is accountable under the [Standard Assignment Coordinator](#standard-assignment-security-logic).

Since the DA nodes will allow a range of `ChunkLength` values, as long as they satisfy the constraints of the protocol, it is necessary for there to be consensus on the `ChunkLength` that is in use for a particular blob and quorum. For this reason, the `ChunkLength` is included in the `BlobQuorumParam` which is hashed to create the merkle root contained in the `BatchHeaderHash` signed by the DA nodes.

32 changes: 32 additions & 0 deletions docs/spec/attestation/bridging.md
@@ -0,0 +1,32 @@
## Signature verification and bridging

![image](../../assets/bridging-module.png)

### L1 Bridging

Bridging a DA attestation for a specific blob requires the following stages:
- *Bridging the batch attestation*. This involves checking the aggregate signature of the DA nodes for the batch, and tallying up the total amount of stake held by the signing nodes.
- *Verifying the blob inclusion*. Each batch contains the root of a Merkle tree whose leaves correspond to the blob headers contained in the batch. To verify blob inclusion, the associated Merkle proof must be supplied and evaluated. Furthermore, the specific quorum threshold requirement for the blob must be checked against the total amount of signing stake for the batch.

For the first stage, EigenDA makes use of EigenLayer's default utilities for managing operator state, verifying aggregate BLS signatures, and checking the total stake held by the signing operators.

For the second stage, EigenDA provides a utility contract with a `verifyBlob` method which rollups would typically integrate into their fraud proof pathway in the following manner:
1. The rollup sequencer posts all lookup data needed to verify a blob against a batch to the rollup inbox contract.
2. To initiate a fraud proof, the challenger must call the `verifyBlob` method with the supplied lookup data. If the blob does not verify correctly, the blob is considered invalid. The sketch below illustrates the checks involved.
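Here is a hedged Go sketch of the two checks such a fraud-proof path ultimately relies on: inclusion of the blob header in the batch's Merkle tree, and the quorum threshold against the signed stake. It is illustrative only; the actual `verifyBlob` contract logic, hashing (the contracts use keccak256 rather than the sha256 below), proof encoding, and threshold accounting are defined by the on-chain code, and all names here are assumptions.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// verifyInclusion checks a Merkle proof of the blob header hash against the
// batch's Merkle root. index selects left/right ordering at each level.
func verifyInclusion(leaf [32]byte, proof [][32]byte, index uint, root [32]byte) bool {
	h := leaf
	for _, sibling := range proof {
		if index%2 == 0 {
			h = sha256.Sum256(append(h[:], sibling[:]...))
		} else {
			h = sha256.Sum256(append(sibling[:], h[:]...))
		}
		index /= 2
	}
	return bytes.Equal(h[:], root[:])
}

// meetsQuorumThreshold checks that the signing stake recorded for the batch
// reaches the blob's quorum threshold (expressed in percent).
func meetsQuorumThreshold(signedStake, totalStake, thresholdPercent uint64) bool {
	return signedStake*100 >= totalStake*thresholdPercent
}

func main() {
	leaf := sha256.Sum256([]byte("blob header"))
	sibling := sha256.Sum256([]byte("other header"))
	root := sha256.Sum256(append(leaf[:], sibling[:]...))
	fmt.Println(verifyInclusion(leaf, [][32]byte{sibling}, 0, root)) // true
	fmt.Println(meetsQuorumThreshold(70, 100, 66))                   // true
}
```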

#### Reorg behavior (this section is outdated)

One aspect of the chain behavior of which the attestation protocol must be aware is that of chain reorganization. The following requirements relate to chain reorganizations:
1. Signed attestations should remain valid under reorgs so that a disperser never needs to resend the data and gather new signatures.
2. If an attestation is reorged out, a disperser should always be able to simply resubmit it after a specific waiting period.
3. Payloads constructed by a disperser and sent to DA nodes should never be rejected due to reorgs.

These requirements result in the following design choices:
- Chunk allotments should be based on registration state from a finalized block.
- If an attestation is reorged out and if the transaction containing the header of a batch is not present within `BLOCK_STALE_MEASURE` blocks since `referenceBlockNumber` and the block that is `BLOCK_STALE_MEASURE` blocks since `referenceBlockNumber` is finalized, then the disperser should again start a new dispersal with that blob of data. Otherwise, the disperser must not re-submit another transaction containing the header of a batch associated with the same blob of data.
- Payment payloads sent to DA nodes should only take into account finalized attestations.

The first and second decisions satisfy requirements 1 and 2. The three decisions together satisfy requirement 3.

Whenever the `confirmBatch` method of ServiceManager.sol is called, the following checks are used to ensure that only finalized registration state is utilized:
- Stake staleness check. The `referenceBlockNumber` is verified to be within `BLOCK_STALE_MEASURE` blocks before the confirmation block. This is to make sure that batches using outdated stakes are not confirmed. It is assured that stakes from within `BLOCK_STALE_MEASURE` blocks before confirmation are valid by delaying removal of stakes by `BLOCK_STALE_MEASURE + MAX_DURATION_BLOCKS`. (A sketch of this freshness check follows below.)
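A minimal sketch of that freshness comparison, assuming a hypothetical constant and helper (the real check lives in ServiceManager.sol, and its exact form may differ):

```go
package main

import "fmt"

// blockStaleMeasure is an illustrative value only; the real BLOCK_STALE_MEASURE
// is a protocol parameter of the EigenDA contracts.
const blockStaleMeasure = 300

// referenceBlockIsFresh returns true when the batch's referenceBlockNumber is
// within blockStaleMeasure blocks of the block in which confirmBatch executes.
func referenceBlockIsFresh(referenceBlockNumber, confirmationBlockNumber uint64) bool {
	return referenceBlockNumber+blockStaleMeasure >= confirmationBlockNumber
}

func main() {
	fmt.Println(referenceBlockIsFresh(1000, 1200)) // true: within 300 blocks
	fmt.Println(referenceBlockIsFresh(1000, 1400)) // false: stale reference state
}
```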
58 changes: 58 additions & 0 deletions docs/spec/attestation/encoding.md
@@ -0,0 +1,58 @@
## Encoding Module

The encoding module defines a procedure for blobs to be encoded in such a way that their successful reconstruction can be guaranteed given a large enough collection of unique encoded chunks. The procedure also allows for the chunks to be trustlessly verified against a blob commitment so that the disperser cannot violate the protocol.

![image](../../assets/encoding-module.png)

One way to think of the encoding module is that it must satisfy the following security requirements:
1. *Adversarial tolerance for DA nodes*: We need to have tolerance to arbitrary adversarial behavior by any number of DA nodes up to some threshold. Note that while simple sharding approaches such as duplicating slices of the blob data have good tolerance to random node dropout, they have poor tolerance to worst-case adversarial behavior.
2. *Adversarial tolerance for disperser*: We do not want to put trust assumptions on the encoder or rely on fraud proofs to detect if an encoding is done incorrectly.


## Trustless Encoding via KZG and Reed-Solomon

EigenDA uses a combination of Reed-Solomon (RS) erasure coding and KZG polynomial commitments to perform trustless encoding. In this section, we provide a high level overview of how the EigenDA encoding module works and how it achieves these properties.

### Reed Solomon Encoding

Basic RS encoding is used to achieve the first requirement of *Adversarial tolerance for DA nodes*. This looks like the following:

1. The blob data is represented as a string of symbols, where each symbol is an element of a certain finite field. The number of symbols is called the `BlobLength`.
2. These symbols are interpreted as the coefficients of a polynomial of degree `BlobLength`-1.
3. This polynomial is evaluated at `NumChunks`*`ChunkLength` distinct indices.
4. Chunks are constructed, where each chunk consists of the polynomial evaluations at `ChunkLength` distinct indices.

Notice that given any number of chunks $M$ such that $M \times$`ChunkLength` >= `BlobLength`, via [polynomial interpolation](https://en.wikipedia.org/wiki/Polynomial_interpolation) it is possible to reconstruct the original polynomial, and therefore its coefficients which represent the original blob.
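The sketch below illustrates steps 3–4 with a hypothetical `Chunk` container and a contiguous index layout; the actual EigenDA types and the mapping from chunks to evaluation indices (which comes from the assignment module) differ.

```go
package main

import "fmt"

// Chunk is an illustrative container (not the actual EigenDA type): the
// evaluations of the blob polynomial at ChunkLength assigned indices.
type Chunk struct {
	Indices []uint   // positions within the full evaluation domain
	Evals   []uint64 // polynomial evaluations at those positions (toy field elements)
}

// makeChunks groups NumChunks*ChunkLength evaluations into NumChunks chunks.
// Here chunk j simply takes the contiguous indices [j*chunkLength, (j+1)*chunkLength);
// the real protocol derives each chunk's index set from the assignment module.
func makeChunks(evals []uint64, numChunks, chunkLength uint) []Chunk {
	chunks := make([]Chunk, numChunks)
	for j := uint(0); j < numChunks; j++ {
		c := Chunk{}
		for k := uint(0); k < chunkLength; k++ {
			idx := j*chunkLength + k
			c.Indices = append(c.Indices, idx)
			c.Evals = append(c.Evals, evals[idx])
		}
		chunks[j] = c
	}
	return chunks
}

func main() {
	evals := make([]uint64, 16) // 16 evaluations: NumChunks = 4, ChunkLength = 4
	chunks := makeChunks(evals, 4, 4)
	fmt.Println(len(chunks), chunks[0].Indices) // 4 [0 1 2 3]
}
```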

### Validation via KZG

Addressing the requirement of *Adversarial tolerance for disperser* with RS encoding alone would require fraud proofs: a challenger must download all of the encoded chunks and check that they lie on a polynomial corresponding to the blob commitment.

To avoid the need for fraud proofs, EigenDA follows the trail blazed by the Ethereum DA sharding roadmap in using [KZG polynomial commitments](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html).

**Chunk Validation**

Blobs sent to EigenDA are identified by their KZG commitment (which can be calculated by the disperser and easily validated by the rollup sequencer). When the disperser generates the encoded blob chunks, it also generates a collection of opening proofs which the DA nodes can use to trustlessly verify that their chunks fall on the blob polynomial at the correct indices (note: the indices are jointly derived by the disperser and DA nodes from the chain state using the logic in the Assignment module to ensure that the evaluation indices for each node are unique).

**Blob Size Verification**
KZG commitments also can be used to verify the degree of the original polynomial, which in turn corresponds to the size of the original blob. Having a trustlessly verifiable upper bound on the size of the blob is necessary for DA nodes to verify the correctness of the chunk assignment defined by the assignment module.

The KZG commitment relies on a structured reference string (SRS) containing a generator point $G$ multiplied by all of the powers of some secret field element $\tau$, up to some maximum power $n$. This means that it is not possible to use this SRS to commit to a polynomial of degree greater than $n$. A consequence of this is that if $p(x)$ is a polynomial of degree greater than $m$, it will not be possible to commit to the polynomial $x^{n-m}p(x)$. A "valid" commitment to the polynomial $x^{n-m}p(x)$ thus constitutes a proof that the polynomial $p(x)$ is of degree less than or equal to $m$.

In practice, this looks like the following:
1. If the disperser wishes to claim that the polynomial $p(x)$ is of degree less than or equal to $m$, they must provide, along with the commitment $C_1$ to $p$, a commitment $C_2$ to $q(x) = x^{n-m}p(x)$.
2. The verifier then performs the pairing check $e(C_1,[x^{n-m}]_2) = e(C_2,H)$, where $H$ is the G2 generator and $[x^{n-m}]_2$ is the $(n-m)$'th power of tau in G2. This check will only pass when $C_2$ was constructed as described above and $\deg(p) \le m$.

Note: The blob length verification here allows for the blob length to be upper-bounded; it cannot be used to prove the exact blob length.
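For illustration, the pairing check in step 2 could be expressed as follows, assuming the `bn254` package from gnark-crypto (this is a sketch, not EigenDA's verifier; the SRS point and commitments are taken as given, and the library choice is an assumption):

```go
package main

import (
	"fmt"

	"github.com/consensys/gnark-crypto/ecc/bn254"
)

// verifyDegreeBound checks e(c1, [tau^(n-m)]_2) == e(c2, H) by testing
// e(c1, tauNM) * e(-c2, h) == 1, where c1 commits to p(x), c2 commits to
// x^(n-m)*p(x), tauNM is [tau^(n-m)]_2 from the SRS, and h is the G2 generator.
func verifyDegreeBound(c1, c2 bn254.G1Affine, tauNM, h bn254.G2Affine) (bool, error) {
	var negC2 bn254.G1Affine
	negC2.Neg(&c2)
	return bn254.PairingCheck(
		[]bn254.G1Affine{c1, negC2},
		[]bn254.G2Affine{tauNM, h},
	)
}

func main() {
	// Usage sketch: with real inputs (commitments from the disperser, SRS points
	// from the trusted setup), ok is true exactly when deg(p) <= m.
	var c1, c2 bn254.G1Affine
	var tauNM, h bn254.G2Affine
	ok, err := verifyDegreeBound(c1, c2, tauNM, h)
	fmt.Println(ok, err)
}
```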


### Prover Optimizations

EigenDA makes use of the results of [Fast Amortized Kate Proofs](https://github.com/khovratovich/Kate/blob/master/Kate_amortized.pdf), developed for Ethereum's sharding roadmap, to reduce the computational complexity for proof generation.

See the [full discussion](./amortized-proving.md).


### Verifier Optimizations

Without any optimizations, the KZG verification complexity can lead to a computational bottleneck for the DA nodes. Fortunately, the [Universal Verification Equation](https://ethresear.ch/t/a-universal-verification-equation-for-data-availability-sampling/13240) developed for Danksharding data availability sampling dramatically reduces the complexity. EigenDA has implemented this optimization to eliminate this bottleneck for the DA nodes.