EIP-7594: Decouple network subnets from das-core #3832
base: dev
Conversation
I haven't done a detailed review of how you do the layering here, but I generally support this direction to ensure that the core protocol and the network are sufficiently decoupled so they can be tuned/optimized independently.
I think this is a good proposal. I believe it has something in common with the former 'Network Shards' idea. However, it's worth keeping in mind that in this case the gossip 'backbone' is defined rather implicitly (by the upper das-core layer) than explicitly. Changing, for example, …
I like the idea of decoupling the concepts related to custody allocation from the delivery method. My concern is that the only reason I know of for grouping columns in custody is the (supposed) inefficiency of gossipsub (implementations) in handling a larger number of topics and having smaller topics. Anyway, I think the conceptual decoupling is good to have. Another thing I see in the PR is that you also have FAQ extensions and the like. I haven't seen the details yet; I will try to give it a review later on.
One more thing: parameter changes are also proposed in #3779.
From another point of view, that can be seen as a non-networking issue. Let me rephrase your issue. If the number of custody groups is too large, the number of nodes in each group will be so small that you cannot find the peers to cover all the custody groups. I think whether it's an issue on the network layer is quite subjective. However, the whole point of doing DAS is to reduce the bandwidth consumption, so every part of the protocol (including the core one) can be seen as related to the network in some way.
Another way to decouple the GossipSub topics from the core protocol is not to map subnets to GossipSub topics, but to think of subnets as groups of nodes. However, we have used the term "subnets" to mean GossipSub topics a lot in the past, so it's hard to think of subnets as anything else.
I have a big concern about that change. Let's discuss it in that PR.
I'm missing the method that translates a custody group into a subnet; I think we need it, or perhaps I'm misunderstanding the idea. So instead of this:

```python
def compute_subnet_for_data_column_sidecar(column_index: ColumnIndex) -> SubnetID:
    return SubnetID(column_index % DATA_COLUMN_SIDECAR_SUBNET_COUNT)
```

we should have something like:

```python
def compute_subnet_for_custody_group(custody_group: UInt64) -> SubnetID:
    assert custody_group < NUMBER_OF_CUSTODY_GROUPS
    return SubnetID(custody_group % DATA_COLUMN_SIDECAR_SUBNET_COUNT)
```

And it would also be good to have a compile-time assertion that NUMBER_OF_CUSTODY_GROUPS is set to a number which can always be translated to DATA_COLUMN_SIDECAR_SUBNET_COUNT (say, we don't have 32 groups and 64 subnets, or 33 groups and 32 subnets).
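The divisibility check asked for above could be sketched like this; the constant values here are illustrative placeholders, not the spec's actual configuration:

```python
# Illustrative values only; the real constants are defined in the PeerDAS spec.
NUMBER_OF_CUSTODY_GROUPS = 32
DATA_COLUMN_SIDECAR_SUBNET_COUNT = 64

# Sanity check at import time: custody groups must map cleanly onto subnets.
# Either several groups share one subnet (groups a multiple of subnets),
# or each group spans several subnets (subnets a multiple of groups).
assert (
    NUMBER_OF_CUSTODY_GROUPS % DATA_COLUMN_SIDECAR_SUBNET_COUNT == 0
    or DATA_COLUMN_SIDECAR_SUBNET_COUNT % NUMBER_OF_CUSTODY_GROUPS == 0
), "custody groups and subnets must divide evenly"
```

A module-level `assert` like this is the closest Python analogue to a compile-time assertion: it fails as soon as the spec module is loaded with incompatible constants.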
The idea is that you can get the subnets by calling … We don't want to do …
What do you mean?
Absolutely, we can have 32 groups and 64 subnets.
@ppopth now I got this, thank you for the explanation |
lgtm from lodestar
I think this is ready to be merged? |
I know this proposal has been around for a while, and I actually reviewed it earlier, but I have some additional thoughts. I'm trying to wrap my head around the benefits of this proposal. It looks like the idea is to allow us to tweak the subnet count without touching the core protocol (…).

Personally, I'm not entirely convinced that this change is necessary. The more we decouple things, the more complexity we introduce, which might not be ideal. Since we're already looking at up to 128 subnets, it feels like we won't need to go beyond that in the near future.

I've heard that research shows it can be more efficient to spread larger chunks of data across multiple subnets rather than sticking to just one. So, if we end up increasing the blob count significantly, I can see how increasing the subnet count might make sense. But for now, I'm not sure there's a big advantage in going above 128 subnets unless we're dealing with a much higher blob count. Even with 256 blobs per block, we're still only sending up to 512 KB per block to each subnet.

Maybe there's something I'm missing here about the future plans: is there an issue with the current spec, or are we just trying to future-proof for scenarios we're not certain about yet?
@jimmygchen Sorry for the late reply. I would like to compare this PR to attestation subnets and attestation committees. In the phase0 spec, we decoupled subnets from committees as well. In real life, as I have always heard, no one really mentions committees; people always talk about attestation subnets. In fact, according to the spec, the phase0 core protocol has nothing to do with subnets; it only deals with committees. The subnets live in the network layer, another layer. So in PeerDAS, people may still discuss things in terms of subnets (just like the attestation subnets), but I think it doesn't add more complexity, since phase0 is already complex (by your standard; some people may say it's not complex).
Thanks for the context @ppopth. It's a fair comparison, with some differences. Attestations have a target committee size, while with DAS we've got a fixed number of columns/groups. And repeating my earlier comment, I'm not sure we need to design this with increasing to over 128 subnets in mind at this stage, or whether there's any reason to reduce the number of subnets after shipping PeerDAS. I feel there might be some tension, but I just wanted to pass along our team's feedback and keep things moving since we talked about scheduling this for …
@jimmygchen Thank you for the feedback.
I think it's reasonable to make the spec agile to changes even if we don't yet know clearly whether we want to change the number of subnets or not. Again, just like attestation subnets: the number of subnets and the number of committees have both been 64 for a long time, and we don't know whether this will ever change. That reasoning also applies here. The number of column subnets will probably be equal to the number of custody groups forever, but the decoupling has value in terms of agility. What do you think?
Force-pushed from 85bdcf6 to 1dd29cc
Currently we use subnets as a unit of custody in the PeerDAS core protocol, because it doesn't make sense to partially custody only some columns in a subnet and waste the bandwidth downloading columns the node doesn't custody.

Since subnets correspond to GossipSub topics, which live in a layer below the core protocol, using subnets as a unit of custody couples the core layer and the network layer too tightly and leaves no room for network-layer flexibility.

This commit introduces "custody groups", which are used as a unit of custody instead of subnets. The immediate benefit of the decoupling is that we can immediately increase the number of subnets without affecting the expected number of peers needed to cover all columns, without affecting the network stability, and without touching the core protocol.

The reason we want to increase the number of subnets to match the number of columns is that the columns will propagate through the network faster when they have their own subnets. Just like EIP-4844, where each blob has its own subnet: if all the blobs are in a single subnet, the blobs propagate more slowly.

Since we keep the number of custody groups the same as the previous number of subnets (32), the expected number of peers you need to cover all the columns is unchanged. In fact, you need only NUMBER_OF_COLUMNS and NUMBER_OF_CUSTODY_GROUPS to analyze the expected number, which makes the core protocol completely decoupled from the network layer.
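One natural group-to-column mapping consistent with the description above can be sketched as follows; the constant values and the helper name `columns_for_custody_group` are illustrative assumptions, not necessarily the spec's:

```python
# Assumed values for illustration; the real constants live in the PeerDAS spec.
NUMBER_OF_COLUMNS = 128
NUMBER_OF_CUSTODY_GROUPS = 32

def columns_for_custody_group(custody_group: int) -> list[int]:
    """Columns belonging to a custody group (hypothetical helper).

    Each group covers NUMBER_OF_COLUMNS // NUMBER_OF_CUSTODY_GROUPS columns,
    striped across the column space, so the mapping depends only on core
    constants and not on how many gossip subnets the network layer uses.
    """
    assert custody_group < NUMBER_OF_CUSTODY_GROUPS
    columns_per_group = NUMBER_OF_COLUMNS // NUMBER_OF_CUSTODY_GROUPS
    return [
        NUMBER_OF_CUSTODY_GROUPS * i + custody_group
        for i in range(columns_per_group)
    ]

# Example: group 0 custodies columns 0, 32, 64, 96.
```

Because only NUMBER_OF_COLUMNS and NUMBER_OF_CUSTODY_GROUPS appear here, the subnet count can change without touching this mapping, which is the decoupling the commit message argues for.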
To start with a simple, stable backbone, for now we don't shuffle the subnet assignments via the deterministic custody selection helper `get_custody_columns`. However, staggered rotation likely needs to happen on the order of the pruning period to ensure subnets can be utilized for recovery. For example, introducing an `epoch` argument allows the function to maintain stability over many epochs.
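The epoch-argument idea could look roughly like this; `EPOCHS_PER_ROTATION`, `rotation_seed`, and the constant value are hypothetical, purely to illustrate an assignment seed that stays stable within a rotation period:

```python
import hashlib

# Assumed rotation period; the comment above suggests tying it to the
# pruning period, whose actual value is defined elsewhere in the spec.
EPOCHS_PER_ROTATION = 4096

def rotation_seed(node_id: int, epoch: int) -> int:
    """Per-node seed that changes only once per rotation period.

    A custody-selection helper taking an `epoch` argument could derive its
    shuffling from this seed: within a period the seed (and hence the
    derived assignment) is stable, and it reshuffles at period boundaries.
    """
    period = epoch // EPOCHS_PER_ROTATION
    data = node_id.to_bytes(32, "big") + period.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
```

Staggering could then be layered on top, e.g. by offsetting each node's period boundary by a value derived from its node ID, so the whole backbone never rotates at once.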
No, the number of subnets doesn't really matter. What matters to the network stability is the number of nodes and the churn rate in the network. If the number of nodes is too low, a network partition is likely when some nodes are down. As for the churn rate: if the churn rate is high, we need an even higher number of nodes, since nodes are likely to be turned off more often.
Having a higher subnet count may require us to bump up our target peer count, as there would be a higher probability that nodes have 0 peers in a given column subnet, and will have to do more frequent discovery.
@dknopik has done some interesting simulation on PeerDAS with 128 column subnets, and the LH-only network (without any supernodes) performs poorly with our current peer count (100), but much better when the target peer count is increased to 300. With a small peer count, there's a higher chance that the node doesn't have peers across all column subnets and fails to publish columns to the network. However, increasing the peer count to 300 does increase the resource consumption in LH significantly, and our preference is to increase our peer count gradually to ~120. I think there's some work we need to do to improve our peer selection logic for PeerDAS, but I thought it might be worth mentioning.
I'm wondering if we need to choose a column subnet count where configuring CUSTODY_REQUIREMENT to a lower number won't cause the network to fall apart with the existing target peer count.
With this, I'm convinced that decoupling network and das-core does seem like a useful thing, because without decoupling, changing any of the parameters would impact BOTH security and networking. If we decouple networking and das-core, we could determine optimal networking parameters while testing different custody parameters, and vice versa.
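The coverage intuition in this thread can be checked with a back-of-the-envelope model. The function and numbers below are illustrative assumptions (each peer subscribes to a fixed number of column subnets chosen uniformly at random, independently of the others), not a description of the simulation mentioned above:

```python
def expected_uncovered_subnets(peer_count: int,
                               custody_subnets: int,
                               subnet_count: int) -> float:
    """Expected number of column subnets where we have zero peers.

    Simplified model: each peer independently subscribes to
    `custody_subnets` of the `subnet_count` subnets, uniformly at random.
    """
    p_peer_on_subnet = custody_subnets / subnet_count
    p_subnet_uncovered = (1.0 - p_peer_on_subnet) ** peer_count
    return subnet_count * p_subnet_uncovered
```

Under these assumptions, 100 peers each covering 4 of 128 subnets leaves a noticeable expected number of uncovered subnets, while raising the peer count toward 300 drives it near zero, which is at least directionally consistent with the simulation result reported above.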
> Having a higher subnet count may require us to bump up our target peer count, as there would be a higher probability that nodes have 0 peers in a given column subnet, and will have to do more frequent discovery.
According to this PR, the target peer count depends on the number of custody groups, right? Yes, before this PR it depended on the number of subnets, but this PR changes the structure and the terminology. WDYT?
It seems that everyone in this PR is already convinced that this PR is useful. So does that mean it's ready to be merged?