EIP-7594: Decouple network subnets from das-core #3832

Open
wants to merge 1 commit into
base: dev
1 change: 1 addition & 0 deletions configs/mainnet.yaml
@@ -161,6 +161,7 @@ WHISK_PROPOSER_SELECTION_GAP: 2

# EIP7594
NUMBER_OF_COLUMNS: 128
NUMBER_OF_CUSTODY_GROUPS: 128
DATA_COLUMN_SIDECAR_SUBNET_COUNT: 128
MAX_REQUEST_DATA_COLUMN_SIDECARS: 16384
SAMPLES_PER_SLOT: 8
1 change: 1 addition & 0 deletions configs/minimal.yaml
@@ -160,6 +160,7 @@ WHISK_PROPOSER_SELECTION_GAP: 1

# EIP7594
NUMBER_OF_COLUMNS: 128
NUMBER_OF_CUSTODY_GROUPS: 128
DATA_COLUMN_SIDECAR_SUBNET_COUNT: 128
MAX_REQUEST_DATA_COLUMN_SIDECARS: 16384
SAMPLES_PER_SLOT: 8
77 changes: 42 additions & 35 deletions specs/_features/eip7594/das-core.md
@@ -13,27 +13,28 @@
- [Custom types](#custom-types)
- [Configuration](#configuration)
- [Data size](#data-size)
- [Networking](#networking)
- [Custody setting](#custody-setting)
- [Containers](#containers)
- [`DataColumnSidecar`](#datacolumnsidecar)
- [`MatrixEntry`](#matrixentry)
- [Helper functions](#helper-functions)
- [`get_custody_columns`](#get_custody_columns)
- [`get_custody_groups`](#get_custody_groups)
- [`compute_columns_for_custody_group`](#compute_columns_for_custody_group)
- [`compute_matrix`](#compute_matrix)
- [`recover_matrix`](#recover_matrix)
- [`get_data_column_sidecars`](#get_data_column_sidecars)
- [Custody](#custody)
- [Custody requirement](#custody-requirement)
- [Public, deterministic selection](#public-deterministic-selection)
- [Subnet sampling](#subnet-sampling)
- [Custody sampling](#custody-sampling)
- [Extended data](#extended-data)
- [Column gossip](#column-gossip)
- [Parameters](#parameters)
- [Reconstruction and cross-seeding](#reconstruction-and-cross-seeding)
- [FAQs](#faqs)
- [Row (blob) custody](#row-blob-custody)
- [Subnet stability](#subnet-stability)
- [Why don't nodes custody rows?](#why-dont-nodes-custody-rows)
- [Why don't we rotate custody over time?](#why-dont-we-rotate-custody-over-time)
- [Does having a lot of column subnets make the network unstable?](#does-having-a-lot-of-column-subnets-make-the-network-unstable)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
<!-- /TOC -->
@@ -54,6 +55,7 @@ The following values are (non-configurable) constants used throughout the specif
| - | - | - |
| `RowIndex` | `uint64` | Row identifier in the matrix of cells |
| `ColumnIndex` | `uint64` | Column identifier in the matrix of cells |
| `CustodyIndex` | `uint64` | Custody group identifier in the set of custody groups |

## Configuration

@@ -63,18 +65,13 @@ The following values are (non-configurable) constants used throughout the specif
| - | - | - |
| `NUMBER_OF_COLUMNS` | `uint64(CELLS_PER_EXT_BLOB)` (= 128) | Number of columns in the extended data matrix |

### Networking

| Name | Value | Description |
| - | - | - |
| `DATA_COLUMN_SIDECAR_SUBNET_COUNT` | `uint64(128)` | The number of data column sidecar subnets used in the gossipsub protocol |

### Custody setting

| Name | Value | Description |
| - | - | - |
| `SAMPLES_PER_SLOT` | `8` | Number of `DataColumnSidecar` random samples a node queries per slot |
| `CUSTODY_REQUIREMENT` | `4` | Minimum number of subnets an honest node custodies and serves samples from |
| `NUMBER_OF_CUSTODY_GROUPS` | `128` | Number of custody groups available for nodes to custody |
| `CUSTODY_REQUIREMENT` | `4` | Minimum number of custody groups an honest node custodies and serves samples from |

### Containers

@@ -102,33 +99,39 @@ class MatrixEntry(Container):

## Helper functions

### `get_custody_columns`
### `get_custody_groups`

```python
def get_custody_columns(node_id: NodeID, custody_subnet_count: uint64) -> Sequence[ColumnIndex]:
    assert custody_subnet_count <= DATA_COLUMN_SIDECAR_SUBNET_COUNT
def get_custody_groups(node_id: NodeID, custody_group_count: uint64) -> Sequence[CustodyIndex]:
    assert custody_group_count <= NUMBER_OF_CUSTODY_GROUPS

    subnet_ids: List[uint64] = []
    custody_groups: List[uint64] = []
    current_id = uint256(node_id)
    while len(subnet_ids) < custody_subnet_count:
        subnet_id = (
    while len(custody_groups) < custody_group_count:
        custody_group = CustodyIndex(
            bytes_to_uint64(hash(uint_to_bytes(uint256(current_id)))[0:8])
            % DATA_COLUMN_SIDECAR_SUBNET_COUNT
            % NUMBER_OF_CUSTODY_GROUPS
        )
        if subnet_id not in subnet_ids:
            subnet_ids.append(subnet_id)
        if custody_group not in custody_groups:
            custody_groups.append(custody_group)
        if current_id == UINT256_MAX:
            # Overflow prevention
            current_id = NodeID(0)
        current_id += 1

    assert len(subnet_ids) == len(set(subnet_ids))
    assert len(custody_groups) == len(set(custody_groups))
    return sorted(custody_groups)
```

    columns_per_subnet = NUMBER_OF_COLUMNS // DATA_COLUMN_SIDECAR_SUBNET_COUNT
### `compute_columns_for_custody_group`

```python
def compute_columns_for_custody_group(custody_group: CustodyIndex) -> Sequence[ColumnIndex]:
    assert custody_group < NUMBER_OF_CUSTODY_GROUPS
    columns_per_group = NUMBER_OF_COLUMNS // NUMBER_OF_CUSTODY_GROUPS
    return sorted([
        ColumnIndex(DATA_COLUMN_SIDECAR_SUBNET_COUNT * i + subnet_id)
        for i in range(columns_per_subnet)
        for subnet_id in subnet_ids
        ColumnIndex(NUMBER_OF_CUSTODY_GROUPS * i + custody_group)
        for i in range(columns_per_group)
    ])
```
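
To make the group-to-column mapping concrete, here is a minimal sketch that mirrors `compute_columns_for_custody_group` with plain Python integers. The name `columns_for_custody_group` and the explicit `number_of_custody_groups` parameter are illustrative assumptions (the spec helper reads `NUMBER_OF_CUSTODY_GROUPS` from the config), and the 64-group case is hypothetical.

```python
from typing import List

NUMBER_OF_COLUMNS = 128  # value from the configs touched by this PR


def columns_for_custody_group(custody_group: int, number_of_custody_groups: int) -> List[int]:
    """Plain-int mirror of compute_columns_for_custody_group (illustrative only)."""
    assert custody_group < number_of_custody_groups
    columns_per_group = NUMBER_OF_COLUMNS // number_of_custody_groups
    # Columns of one group are spaced `number_of_custody_groups` apart.
    return sorted(
        number_of_custody_groups * i + custody_group
        for i in range(columns_per_group)
    )


# With the PR's value of 128 groups, every group holds exactly one column.
print(columns_for_custody_group(3, 128))  # [3]
# With a hypothetical 64 groups, every group holds two columns, 64 apart.
print(columns_for_custody_group(3, 64))   # [3, 67]
```

When the group count divides `NUMBER_OF_COLUMNS`, the groups partition the columns: every column belongs to exactly one group.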

@@ -220,21 +223,21 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock,

### Custody requirement

Each node downloads and custodies a minimum of `CUSTODY_REQUIREMENT` subnets per slot. The particular subnets that the node is required to custody are selected pseudo-randomly (more on this below).
Columns are grouped into custody groups. Nodes custodying a custody group MUST custody all the columns in that group.

A node *may* choose to custody and serve more than the minimum honesty requirement. Such a node explicitly advertises a number greater than `CUSTODY_REQUIREMENT` through the peer discovery mechanism, specifically by setting a higher value in the `custody_subnet_count` field within its ENR. This value can be increased up to `DATA_COLUMN_SIDECAR_SUBNET_COUNT`, indicating a super-full node.
A node *may* choose to custody and serve more than the minimum honesty requirement. Such a node explicitly advertises a number greater than `CUSTODY_REQUIREMENT` through the peer discovery mechanism, specifically by setting a higher value in the `custody_group_count` field within its ENR. This value can be increased up to `NUMBER_OF_CUSTODY_GROUPS`, indicating a super-full node.

A node stores the custodied columns for the duration of the pruning period and responds to peer requests for samples on those columns.

### Public, deterministic selection

The particular columns that a node custodies are selected pseudo-randomly as a function (`get_custody_columns`) of the node-id and custody size -- importantly this function can be run by any party as the inputs are all public.
The particular columns/groups that a node custodies are selected pseudo-randomly as a function (`get_custody_groups`) of the node-id and custody size -- importantly this function can be run by any party as the inputs are all public.

*Note*: increasing the `custody_size` parameter for a given `node_id` extends the returned list (rather than being an entirely new shuffle) such that if `custody_size` is unknown, the default `CUSTODY_REQUIREMENT` will be correct for a subset of the node's custody.

## Subnet sampling
## Custody sampling

At each slot, a node advertising `custody_subnet_count` downloads a minimum of `subnet_sampling_size = max(SAMPLES_PER_SLOT, custody_subnet_count)` total subnets. The corresponding set of columns is selected by `get_custody_columns(node_id, subnet_sampling_size)`, so that in particular the subset of columns to custody is consistent with the output of `get_custody_columns(node_id, custody_subnet_count)`. Sampling is considered successful if the node manages to retrieve all selected columns.
At each slot, a node advertising `custody_group_count` downloads a minimum of `sampling_size = max(SAMPLES_PER_SLOT, custody_group_count)` total custody groups. The corresponding set of columns is selected by `groups = get_custody_groups(node_id, sampling_size)` and `compute_columns_for_custody_group(group) for group in groups`, so that in particular the subset of columns to custody is consistent with the output of `get_custody_groups(node_id, custody_group_count)`. Sampling is considered successful if the node manages to retrieve all selected columns.
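
The relationship between custody and sampling can be sketched with plain Python integers. This is an illustration, not the spec code: it assumes the spec's `hash` is SHA-256 and that `uint_to_bytes` / `bytes_to_uint64` are little-endian, and `sampling_groups` is a hypothetical helper name.

```python
import hashlib
from typing import List

NUMBER_OF_CUSTODY_GROUPS = 128
SAMPLES_PER_SLOT = 8
CUSTODY_REQUIREMENT = 4
UINT256_MAX = 2**256 - 1


def get_custody_groups(node_id: int, custody_group_count: int) -> List[int]:
    """Plain-int mirror of the helper in this diff (assumes SHA-256, little-endian)."""
    assert custody_group_count <= NUMBER_OF_CUSTODY_GROUPS
    custody_groups: List[int] = []
    current_id = node_id
    while len(custody_groups) < custody_group_count:
        digest = hashlib.sha256(current_id.to_bytes(32, "little")).digest()
        custody_group = int.from_bytes(digest[0:8], "little") % NUMBER_OF_CUSTODY_GROUPS
        if custody_group not in custody_groups:
            custody_groups.append(custody_group)
        if current_id == UINT256_MAX:
            # Overflow prevention, kept exactly as in the diff (reset, then increment).
            current_id = 0
        current_id += 1
    return sorted(custody_groups)


def sampling_groups(node_id: int, custody_group_count: int) -> List[int]:
    """Hypothetical helper: groups a node downloads per slot."""
    sampling_size = max(SAMPLES_PER_SLOT, custody_group_count)
    return get_custody_groups(node_id, sampling_size)


node_id = 1048576
custody = get_custody_groups(node_id, CUSTODY_REQUIREMENT)
sampled = sampling_groups(node_id, CUSTODY_REQUIREMENT)
# The custody set is always contained in the sampling set.
print(set(custody) <= set(sampled))  # True
```

Because the selection loop is deterministic and only ever appends, asking for more groups yields a superset of the smaller result, which is why the custody set stays consistent with the sampling set.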

## Extended data

@@ -246,7 +249,7 @@ In this construction, we extend the blobs using a one-dimensional erasure coding

For each column -- use `data_column_sidecar_{subnet_id}` subnets, where `subnet_id` can be computed with the `compute_subnet_for_data_column_sidecar(column_index: ColumnIndex)` helper. The sidecars can be computed with the `get_data_column_sidecars(signed_block: SignedBeaconBlock, blobs: Sequence[Blob])` helper.

Verifiable samples from their respective column are distributed on the assigned subnet. To custody a particular column, a node joins the respective gossipsub subnet. If a node fails to get a column on the column subnet, a node can also utilize the Req/Resp protocol to query the missing column from other peers.
Verifiable samples from their respective column are distributed on the assigned subnet. To custody columns in a particular custody group, a node joins the respective gossipsub subnets. If a node fails to get columns on the column subnets, a node can also utilize the Req/Resp protocol to query the missing columns from other peers.
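
As a rough illustration of how a custody group translates into gossipsub subnets once the two counts are decoupled, the sketch below maps a group's columns to subnet IDs. It assumes `compute_subnet_for_data_column_sidecar` is `column_index % DATA_COLUMN_SIDECAR_SUBNET_COUNT` (its definition is not part of this diff), and it takes the subnet count as an explicit parameter for illustration; `subnets_for_columns` is a hypothetical name.

```python
from typing import Iterable, Set


def compute_subnet_for_data_column_sidecar(column_index: int, subnet_count: int) -> int:
    # Assumed definition: column index modulo the number of column subnets.
    return column_index % subnet_count


def subnets_for_columns(columns: Iterable[int], subnet_count: int) -> Set[int]:
    """Hypothetical helper: subnets a node joins to receive the given columns."""
    return {compute_subnet_for_data_column_sidecar(c, subnet_count) for c in columns}


# A group holding columns 3 and 67 (e.g. group 3 when there are 64 groups):
print(sorted(subnets_for_columns([3, 67], 128)))  # [3, 67]
print(sorted(subnets_for_columns([3, 67], 64)))   # [3]
```

The same custody group can therefore require one or several subnet subscriptions depending only on the networking parameter.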

## Reconstruction and cross-seeding

@@ -262,7 +265,7 @@ Once the node obtains a column through reconstruction, the node MUST expose the

## FAQs

### Row (blob) custody
### Why don't nodes custody rows?

In the one-dimension construction, a node samples the peers by requesting the whole `DataColumnSidecar`. In reconstruction, a node can reconstruct all the blobs by 50% of the columns. Note that nodes can still download the row via `blob_sidecar_{subnet_id}` subnets.

@@ -273,6 +276,10 @@ The potential benefits of having row custody could include:

However, for simplicity, we don't assign row custody assignments to nodes in the current design.

### Subnet stability
### Why don't we rotate custody over time?

To start with a simple, stable backbone, for now, we don't shuffle the custody assignments via the deterministic custody selection helper `get_custody_groups`. However, staggered rotation likely needs to happen on the order of the pruning period to ensure subnets can be utilized for recovery. For example, introducing an `epoch` argument allows the function to maintain stability over many epochs.

### Does having a lot of column subnets make the network unstable?

To start with a simple, stable backbone, for now, we don't shuffle the subnet assignments via the deterministic custody selection helper `get_custody_columns`. However, staggered rotation likely needs to happen on the order of the pruning period to ensure subnets can be utilized for recovery. For example, introducing an `epoch` argument allows the function to maintain stability over many epochs.
No, the number of subnets doesn't really matter. What matters to the network stability is the number of nodes and the churn rate in the network. If the number of the nodes is too low, it's likely to have a network partition when some nodes are down. For the churn rate, if the churn rate is high, we even need to have a higher number of nodes, since nodes are likely to be turned off more often.
Review comment (Contributor):

Having a higher subnet count may require us to bump up our target peer count, as there would be a higher probability that nodes have 0 peers in a given column subnet, and will have to do more frequent discovery.

@dknopik has done some interesting simulation on PeerDAS with 128 column subnets, and the LH-only network (without any supernodes) performs poorly with our current peer count (100), but much better when target peers is increased to 300. With a small peer count, there's a higher chance that the node doesn't have peers across all column subnets, and fails to publish columns to the network. However, increasing peer count to 300 does increase the resource consumption in LH significantly, and our preference is to increase our peer count gradually to ~120. I think there's some work we need to do to improve our peer selection logic for PeerDAS but thought it might be worth mentioning.

I'm wondering if we need to choose a column subnet count where configuring CUSTODY_REQUIREMENT to a lower number won't cause the network to fall apart, with the existing target peer count?

With this, I'm convinced that decoupling network and das-core does seem like a useful thing - because without decoupling, changing any of the parameters would have an impact on BOTH security and networking. If we decouple networking and das-core, we could determine optimal networking parameters and test different custody parameters, and vice versa.
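
The peer-count concern above can be sanity-checked with a back-of-the-envelope model. This is not the simulation mentioned in the comment: it assumes every peer independently covers `groups_per_peer` distinct groups chosen uniformly at random out of `number_of_groups`, with no supernodes, and the function name is hypothetical.

```python
def prob_no_peer_in_group(peer_count: int, groups_per_peer: int, number_of_groups: int) -> float:
    """Probability that none of `peer_count` peers covers one given group (toy model)."""
    # A single peer misses a given group with probability 1 - c/G.
    miss_one_peer = 1 - groups_per_peer / number_of_groups
    return miss_one_peer ** peer_count


for peers in (100, 120, 300):
    p = prob_no_peer_in_group(peers, groups_per_peer=4, number_of_groups=128)
    expected_uncovered = 128 * p  # expected number of groups with zero peers
    print(f"peers={peers} p_uncovered={p:.5f} expected_uncovered={expected_uncovered:.2f}")
```

Under these assumptions a given group is uncovered roughly 4% of the time with 100 peers and well under 0.1% of the time with 300 peers, which is in line with the direction of the observation above, though a real network with supernodes and peer selection logic will behave differently.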

Reply (Member, Author):

> Having a higher subnet count may require us to bump up our target peer count, as there would be a higher probability that nodes have 0 peers in a given column subnet, and will have to do more frequent discovery.

According to this PR, the target peer count depends on the number of custody groups, right? Yes, before this PR it depended on the number of subnets, but this PR changes the structure and the terminology. WDYT?

9 changes: 5 additions & 4 deletions specs/_features/eip7594/p2p-interface.md
@@ -30,7 +30,7 @@
- [GetMetaData v3](#getmetadata-v3)
- [The discovery domain: discv5](#the-discovery-domain-discv5)
- [ENR structure](#enr-structure)
- [Custody subnet count](#custody-subnet-count)
- [Custody group count](#custody-group-count)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
<!-- /TOC -->
@@ -49,6 +49,7 @@

| Name | Value | Description |
|------------------------------------------------|------------------------------------------------|---------------------------------------------------------------------------|
| `DATA_COLUMN_SIDECAR_SUBNET_COUNT` | `128` | The number of data column sidecar subnets used in the gossipsub protocol |
| `MAX_REQUEST_DATA_COLUMN_SIDECARS` | `MAX_REQUEST_BLOCKS_DENEB * NUMBER_OF_COLUMNS` | Maximum number of data column sidecars in a single request |
| `MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS` | `2**12` (= 4096 epochs, ~18 days) | The minimum epoch range over which a node must serve data column sidecars |

@@ -318,10 +319,10 @@ Requests the MetaData of a peer, using the new `MetaData` definition given above

#### ENR structure

##### Custody subnet count
##### Custody group count

A new field is added to the ENR under the key `csc` to facilitate custody data column discovery.
A new field is added to the ENR under the key `cgc` to facilitate custody data column discovery.

| Key | Value |
|--------|------------------------------------------|
| `csc` | Custody subnet count, `uint64` big endian integer with no leading zero bytes (`0` is encoded as empty byte string) |
| `cgc` | Custody group count, `uint64` big endian integer with no leading zero bytes (`0` is encoded as empty byte string) |
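
The value encoding described in the table can be sketched as follows. The function names `encode_cgc` and `decode_cgc` are hypothetical, and the strictness of the decoder (rejecting leading zero bytes and values wider than 8 bytes) is an assumption about how a client might validate the field.

```python
def encode_cgc(custody_group_count: int) -> bytes:
    """Big-endian bytes with no leading zero bytes; 0 becomes the empty byte string."""
    assert 0 <= custody_group_count < 2**64
    length = (custody_group_count.bit_length() + 7) // 8
    return custody_group_count.to_bytes(length, "big")


def decode_cgc(data: bytes) -> int:
    """Inverse of encode_cgc; rejects non-minimal or oversized encodings."""
    if len(data) > 8:
        raise ValueError("value does not fit in uint64")
    if data[:1] == b"\x00":
        raise ValueError("leading zero byte is not allowed")
    return int.from_bytes(data, "big")


print(encode_cgc(0))    # b''
print(encode_cgc(4))    # b'\x04'
print(encode_cgc(128))  # b'\x80'
print(decode_cgc(encode_cgc(128)))  # 128
```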
4 changes: 2 additions & 2 deletions specs/_features/eip7594/peer-sampling.md
@@ -97,7 +97,7 @@ For reference, the table below shows the number of samples and the number of all

A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries.

A node SHOULD query for samples from selected peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) it could request from, identifying a list of candidate peers for each selected column.
A node SHOULD query for samples from selected peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_groups` helper to determine which peer(s) it could request from, identifying a list of candidate peers for each selected column.

If more than one candidate peer is found for a given column, a node SHOULD randomize its peer selection to distribute sample query load in the network. Nodes MAY use peer scoring to tune this selection (for example, by using weighted selection or by using a cut-off threshold). If possible, it is also recommended to avoid requesting many columns from the same peer in order to avoid relying on and exposing the sample selection to a single peer.
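
A minimal sketch of the candidate lookup and randomized selection described above. It assumes the node has already derived each peer's custody columns (for example via `get_custody_groups` and `compute_columns_for_custody_group`); the function names and the peer identifiers are hypothetical, and peer scoring is left out.

```python
import random
from typing import Dict, Iterable, List, Set


def candidate_peers_by_column(
    peer_custody_columns: Dict[str, Set[int]],
    selected_columns: Iterable[int],
) -> Dict[int, List[str]]:
    """For each selected column, list the peers whose custody includes it."""
    return {
        column: sorted(
            peer for peer, columns in peer_custody_columns.items() if column in columns
        )
        for column in selected_columns
    }


def pick_peers(candidates: Dict[int, List[str]], rng: random.Random) -> Dict[int, str]:
    """Pick one candidate per column uniformly at random; skip columns with none."""
    return {column: rng.choice(peers) for column, peers in candidates.items() if peers}


peers = {"peer_a": {3, 67}, "peer_b": {3}, "peer_c": {10}}
candidates = candidate_peers_by_column(peers, [3, 10, 99])
print(candidates)  # {3: ['peer_a', 'peer_b'], 10: ['peer_c'], 99: []}
choice = pick_peers(candidates, random.Random(0))
```

Columns with no candidate (column 99 here) are simply absent from the result, which is the point where a node would fall back to discovery or to a DAS provider.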

@@ -115,4 +115,4 @@ A DAS provider is a consistently-available-for-DAS-queries, super-full (or high

DAS providers can also be found out-of-band and configured into a node to connect to directly and prioritize. Nodes can add some set of these to their local configuration for persistent connection to bolster their DAS quality of service.

Such direct peering utilizes a feature supported out of the box today on all nodes and can complement (and reduce attackability and increase quality-of-service) alternative peer discovery mechanisms.
Such direct peering utilizes a feature supported out of the box today on all nodes and can complement (and reduce attackability and increase quality-of-service) alternative peer discovery mechanisms.
Original file line number Diff line number Diff line change
@@ -7,21 +7,26 @@
)


def _run_get_custody_columns(spec, rng, node_id=None, custody_subnet_count=None):
def _run_get_custody_columns(spec, rng, node_id=None, custody_group_count=None):
    if node_id is None:
        node_id = rng.randint(0, 2**256 - 1)

    if custody_subnet_count is None:
        custody_subnet_count = rng.randint(0, spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT)
    if custody_group_count is None:
        custody_group_count = rng.randint(0, spec.config.NUMBER_OF_CUSTODY_GROUPS)

    result = spec.get_custody_columns(node_id, custody_subnet_count)
    columns_per_group = spec.config.NUMBER_OF_COLUMNS // spec.config.NUMBER_OF_CUSTODY_GROUPS
    groups = spec.get_custody_groups(node_id, custody_group_count)
    yield 'node_id', 'meta', node_id
    yield 'custody_subnet_count', 'meta', int(custody_subnet_count)
    yield 'custody_group_count', 'meta', int(custody_group_count)

    result = []
    for group in groups:
        group_columns = spec.compute_columns_for_custody_group(group)
        assert len(group_columns) == columns_per_group
        result.extend(group_columns)

    assert len(result) == len(set(result))
    assert len(result) == (
        custody_subnet_count * spec.config.NUMBER_OF_COLUMNS // spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT
    )
    assert len(result) == custody_group_count * columns_per_group
    assert all(i < spec.config.NUMBER_OF_COLUMNS for i in result)
    python_list_result = [int(i) for i in result]

@@ -31,48 +36,48 @@ def _run_get_custody_columns(spec, rng, node_id=None, custody_subnet_count=None)
@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__min_node_id_min_custody_subnet_count(spec):
def test_get_custody_columns__min_node_id_min_custody_group_count(spec):
    rng = random.Random(1111)
    yield from _run_get_custody_columns(spec, rng, node_id=0, custody_subnet_count=0)
    yield from _run_get_custody_columns(spec, rng, node_id=0, custody_group_count=0)


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__min_node_id_max_custody_subnet_count(spec):
def test_get_custody_columns__min_node_id_max_custody_group_count(spec):
    rng = random.Random(1111)
    yield from _run_get_custody_columns(
        spec, rng, node_id=0,
        custody_subnet_count=spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT)
        custody_group_count=spec.config.NUMBER_OF_CUSTODY_GROUPS)


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__max_node_id_min_custody_subnet_count(spec):
def test_get_custody_columns__max_node_id_min_custody_group_count(spec):
    rng = random.Random(1111)
    yield from _run_get_custody_columns(spec, rng, node_id=2**256 - 1, custody_subnet_count=0)
    yield from _run_get_custody_columns(spec, rng, node_id=2**256 - 1, custody_group_count=0)


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__max_node_id_max_custody_subnet_count(spec):
def test_get_custody_columns__max_node_id_max_custody_group_count(spec):
    rng = random.Random(1111)
    yield from _run_get_custody_columns(
        spec, rng, node_id=2**256 - 1,
        custody_subnet_count=spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT,
        custody_group_count=spec.config.NUMBER_OF_CUSTODY_GROUPS,
    )


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__max_node_id_max_custody_subnet_count_minus_1(spec):
def test_get_custody_columns__max_node_id_max_custody_group_count_minus_1(spec):
    rng = random.Random(1111)
    yield from _run_get_custody_columns(
        spec, rng, node_id=2**256 - 2,
        custody_subnet_count=spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT,
        custody_group_count=spec.config.NUMBER_OF_CUSTODY_GROUPS,
    )


@@ -81,7 +86,7 @@ def test_get_custody_columns__max_node_id_max_custody_subnet_count_minus_1(spec)
@single_phase
def test_get_custody_columns__short_node_id(spec):
    rng = random.Random(1111)
    yield from _run_get_custody_columns(spec, rng, node_id=1048576, custody_subnet_count=1)
    yield from _run_get_custody_columns(spec, rng, node_id=1048576, custody_group_count=1)


@with_eip7594_and_later