Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate ShardLayout::v0 #12176

Open
Longarithm opened this issue Oct 1, 2024 · 7 comments
Open

Deprecate ShardLayout::v0 #12176

Longarithm opened this issue Oct 1, 2024 · 7 comments
Labels
A-resharding Area: State resharding C-good-first-issue Category: issues that are self-contained and easy for newcomers to work on. C-housekeeping Category: Refactoring, cleanups, code quality

Comments

@Longarithm
Copy link
Member

Longarithm commented Oct 1, 2024

Description

ShardLayout::v0 is responsible for random assignment of accounts to shards. It is incompatible with current and future shard layouts because it makes it impossible to derive range of accounts from boundary accounts.
The first mainnet config was v0. However, there was only one shard, so random assignment wasn't used.
It could be nice to remove all occurrences of v0 from code, including all kinds of tests up to pytests.
But we need to be very careful to preserve compatibility with genesis.

There was assumption in some test that localnet accounts are located on different shards; but this can be handled with v1/v2 as well.

@Longarithm Longarithm added C-housekeeping Category: Refactoring, cleanups, code quality C-good-first-issue Category: issues that are self-contained and easy for newcomers to work on. A-resharding Area: State resharding labels Oct 1, 2024
@eagr
Copy link
Contributor

eagr commented Oct 18, 2024

I could only deduce the work to be done from your description here as I’m brand new to NEAR.


remove all occurrences of v0 from code

By "deprecate", I suppose you mean to wipe it out instead of to show a deprecation warning about using V0, right?

And I guess some of the occurrences of V0 in the tests should be replaced by V2 (assuming V2 is preferred over V1 now, that right?).

preserve compatibility with genesis

Do you mean something like "new code should work on old data"?
If so, to be backward-compatible, I suppose we need to do some transformation if it’s v0, providing the related code is removed?
Could you expand on that? Not so sure about this.

@wacban
Copy link
Contributor

wacban commented Oct 24, 2024

@eagr

Let's first disambiguate:

  • V0 is an old variant of that ShardLayout structure - this is a protocol structure so it likely needs to stay as is in order to preserve serialization and replayability.
  • v0 is a method used to create shard layouts using the V0 structure. This method should be removed and replaced with calls to v2.
  • I think mainnet genesis contains a V0 layout but it's constructed using deseralization when genesis is read so the v0 method shouldn't be needed.

I can also see that there is v0_single_shard -> perhaps a good first step for you would be to

  • rename it to single_shard
  • make it use v2 instead of v0
  • run tests - I would hope they all pass but if not -> let's discuss

Then later you can find any reference to v0 and try tro replace those with v2.

  • This has potential to break things like the implicit assumptions made by some tests that Longarithm mentioned.
  • Feel free to debug but if it doesn't work it's ok to leave some as is or ping me and I'll help debugging.

Finally once all references to v0 are gone we can delete it altogether.

Feel free to pick and choose any of the subtasks here.

preserve compatibility with genesis

I think it should be sufficient to keep V0 as is.

@eagr
Copy link
Contributor

eagr commented Oct 27, 2024

/// Returns the shard index for a given shard id. The shard index should be
/// used when indexing into an array of chunk data.
pub fn get_shard_index(&self, shard_id: ShardId) -> ShardIndex {
match self {
// In V0 the shard id and shard index are the same.
Self::V0(_) => shard_id.into(),
// In V1 the shard id and shard index are the same.
Self::V1(_) => shard_id.into(),
// In V2 the shard id and shard index are **not** the same.
Self::V2(v2) => v2.id_to_index_map[&shard_id],
}
}
/// Get the shard id for a given shard index. The shard id should be used to
/// identify the shard and starting from the ShardLayoutV2 it is unique.
pub fn get_shard_id(&self, shard_index: ShardIndex) -> ShardId {
match self {
Self::V0(_) => ShardId::new(shard_index as u64),
Self::V1(_) => ShardId::new(shard_index as u64),
Self::V2(v2) => v2.index_to_id_map[&shard_index],
}
}

These would panic when given an invalid id/index in the case of V2. Does it mean a deliberately crafted block could crash a validator? Panicking here doesn't feel like desirable behavior anyways.


Off the topic, curious what are the reasons for running checks against nightly in CI? The only reason I could think of is to be a good citizen by discovering early and reporting issues back to the Rust community. But that doesn't justify doing it on a per PR basis as it might block development if there're issues on nightly.

@wacban
Copy link
Contributor

wacban commented Oct 27, 2024

These would panic when given an invalid id/index in the case of V2. Does it mean a deliberately crafted block could crash a validator? Panicking here doesn't feel like expected behavior anyways.

Thanks, that is a great catch. I will look into it. Generally speaking the shard id is validated very early on in the block processing pipeline and it is handled correctly there as far as I can tell (we do have a test for this). Still it's a terrible pattern that should be improved.

Off the topic, curious what are the reasons for running checks against nightly in CI? The only reason I could think of is to be a good citizen by discovering early and reporting issues back to the Rust community. But that doesn't justify doing it on a per PR basis as it might block development if there're issues on nightly.

Just to be on the same page, there is rust nightly and nearcore nightly. I think you're talking about rust nightly and the CI is about nearcore nightly. As far as I can tell we don't run rust nightly anywhere. The nearcore nightly contains features to be released in the future but that aren't quite ready. We do want to run CI on those.

@eagr
Copy link
Contributor

eagr commented Oct 28, 2024

I think you're talking about rust nightly and the CI is about nearcore nightly. As far as I can tell we don't run rust nightly anywhere.

Oh then never mind. :) I just glanced over the logs and presumed it's for Rust nightly. Should have double checked before leaving the comment, my bad.

we do have a test for this

image

Do you mean these tests? The first 1 is good, invalid shard id seems properly handled. The latter 2 kinda break with V2.

@wacban
Copy link
Contributor

wacban commented Oct 28, 2024

Do you mean these tests? The first 1 is good, invalid shard id seems properly handled. The latter 2 kinda break with V2.

Those and there should be one in block_corruption.rs as well.

Yeah V2 and non-contiguous ShardIds are still work in progress. I will be migrating more tests to use V2 in the coming days as part of the resharding V3 project. If you are curious you can have a look at #11881 and NEP-568.

btw I always thought it would be cool to implement a method to create a new shard layout from the previous one by adding a new boundary account. If you are interested I can create an issue for you.

@eagr
Copy link
Contributor

eagr commented Oct 28, 2024

btw I always thought it would be cool to implement a method to create a new shard layout from the previous one by adding a new boundary account. If you are interested I can create an issue for you.

sure please do :)

github-merge-queue bot pushed a commit that referenced this issue Nov 2, 2024
helps #12176

---------

Co-authored-by: wacban <[email protected]>
Co-authored-by: Waclaw Banasik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-resharding Area: State resharding C-good-first-issue Category: issues that are self-contained and easy for newcomers to work on. C-housekeeping Category: Refactoring, cleanups, code quality
Projects
None yet
Development

No branches or pull requests

3 participants