feat(bandwidth_scheduler) - generate bandwidth requests based on receipts in outgoing buffers #12375
Conversation
Codecov Report

```
@@            Coverage Diff            @@
##           master   #12375      +/-   ##
==========================================
  Coverage   71.62%   71.63%
==========================================
  Files         842      843       +1
  Lines      170738   171070     +332
  Branches   170738   171070     +332
==========================================
+ Hits       122299   122551     +252
- Misses      43058    43119      +61
- Partials     5381     5400      +19
```
cc @shreyan-gupta - Can you double check that we can easily do resharding on the metadata please?
Just a high level first iteration while I'm warming up to the idea
/// waiting for the receipt sizes to become available.
/// The resulting behaviour is similar to the previous approach with allowed shard.
pub fn make_basic(
    to_shard: u8,
u8 seems a bit slim for shard id. It seems quite feasible that we get to shard id 256 within a few years. By comparison in congestion control we use u16.
I was trying to save space, since the requests are in the chunk header and we could have num_shards**2 of them in the worst case. Switching to u16 when the time comes shouldn't be too hard, BandwidthRequests is a versioned struct.
It would be nice to have something like a "compact int representation" in borsh, e.g. https://reference.cash/protocol/formats/variable-length-integer
Alright but I'm still not a fan, especially in light of dynamic resharding in the future where the shard ids will be out of our control. Can you add a test like this one - test_shard_id_u16_optimization - but for u8?
OutgoingBufferReceiptSizes {
    to_shard: ShardId,
},
You keep the entire structure in a single (per shard) trie value. This means that every access will need to retrieve and parse the entire thing, adding to the state witness size. How large is it? If it is non-trivial I would suggest implementing a more fine grained storage like the indices + values structure for storing receipts in state.
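For context, the "indices + values" pattern mentioned here stores a small indices record plus one trie entry per element. A rough sketch under that assumption, with illustrative names and key prefixes rather than the real nearcore types or trie columns:

```rust
use borsh::{BorshDeserialize, BorshSerialize};

/// Illustrative only: one small trie value holding the queue bounds...
#[derive(BorshSerialize, BorshDeserialize, Default)]
struct QueueIndices {
    /// Index of the first (oldest) element still stored.
    first_index: u64,
    /// Index at which the next element will be stored.
    next_available_index: u64,
}

/// ...plus one trie value per element, keyed by (receiving shard, index).
/// The key prefix and layout here are made up for illustration.
fn element_key(receiving_shard: u64, index: u64) -> Vec<u8> {
    let mut key = b"receipt_group:".to_vec();
    key.extend_from_slice(&receiving_shard.to_le_bytes());
    key.extend_from_slice(&index.to_le_bytes());
    key
}
```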
nits:
- The BufferedReceipt does not have the Outgoing prefix - I would apply the same convention here to make them more similar
- Maybe replace Sizes with Groups to make it a bit more future proof? How about BufferedReceiptGroups? Then we'll have BufferedReceipt, BufferedReceiptIndices and BufferedReceiptGroups.
> You keep the entire structure in a single (per shard) trie value. This means that every access will need to retrieve and parse the entire thing adding to the state witness size. How large is it? If it is non-trivial I would suggest implementing a more fine grained storage like the indices + values structure for storing receipts in state.
Yeah I'm not entirely sure about this design. It seemed like a simple and good enough option. Under normal circumstances the size of the outgoing buffer won't be very large, congestion control will keep it in check. With 100 MB of receipts in the outgoing buffer, the worst case size for this struct would be about 4 kB: total_size / group_size_threshold * sizeof(u32) = 100 MB / 90 kB * 4 B ≈ 4.4 kB.
But it could theoretically get large: with 90 GB of receipts in the outgoing buffer its size could reach 4 MB, which would be problematic. Do you think this is a real concern when congestion control tries to keep the buffer size low?
The main issue with trie indices is that they would be more complex, but it's doable. Another issue is that we would be doing a lot of small reads - we need to read through the metadata at every height, up to 4.5 MB of receipts, which would be about 50 trie reads with 90 kB groups. Trie reads are a bit expensive, but maybe it's not that bad 🤔
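A quick, self-contained check of the back-of-the-envelope numbers above. The one-u32-per-group assumption mirrors the formula in the comment; the function itself is purely illustrative:

```rust
/// Worst-case metadata size if each receipt group contributes one u32 entry.
fn worst_case_metadata_bytes(total_buffered_bytes: u64, group_size_threshold: u64) -> u64 {
    let groups = total_buffered_bytes.div_ceil(group_size_threshold);
    groups * std::mem::size_of::<u32>() as u64
}

fn main() {
    // ~100 MB buffered with 90 kB groups -> roughly 4.4 kB of metadata.
    println!("{}", worst_case_metadata_bytes(100_000_000, 90_000));
    // ~90 GB buffered -> roughly 4 MB, the problematic theoretical case.
    println!("{}", worst_case_metadata_bytes(90_000_000_000, 90_000));
}
```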
I guess we could have an optimized trie queue which keeps multiple groups in a single trie value, but that sounds even more complex.
The framework for writing NEAR contracts has some neat wrappers for common data structures that need to operate on top of the trie; maybe we could add something similar in the internal runtime instead of implementing things manually each time.
I guess if we read all of it every time it doesn't matter that much if it's a single or multiple nodes, if anything multiple makes it worse. Have you considered any other data structures for maintaining aggregate size info about receipts?
> Have you considered any other data structures for maintaining aggregate size info about receipts?
I couldn't find anything better than this. The problem is that we want to have sizes for some prefixes from 0 to n, and we want these sizes to increase by smallish amounts (~100kB). But if you have any cool ideas I'd love to explore them.
Well I guess you maybe could maintain something like an AVL tree, and walk down to subtrees until you hit some size, but that sounds super complex.
I couldn't come up with anything better. The groups seem like a good middle ground.
I have a couple of ideas I'd like to discuss in the meeting
/// A new receipt group is started when the size of the last group exceeds this threshold.
/// All groups (except for the first and last one) have at least this size.
pub group_size_threshold: u32,
Does this need to be part of this structure? I suppose you keep it here to support future migrations. Can you explain what the alternatives are? In particular I'm curious about the pros and cons of storing all groups in a single trie node vs storing each in a dedicated trie node.
It doesn't, but it was convenient to have it there for the methods that add and remove receipts. The migrations aren't really a concern, we can change the threshold any time we want, it is only used for the last group. The first group is removed when its size reaches zero, the threshold doesn't matter.
This is actually held in two places, here and in the OutgoingMetadatas, it seems like a stretch.
It needs to be in OutgoingMetadatas because I have to pass it to the constructor of some shard's metadata when a new metadata is created. So there are at least two places - OutgoingMetadatas and OutgoingBufferMetadata - where it needs to be.
Ok, I honestly don't get the design completely. Could you please point me to the design doc once again?
- What is the base requirement for sending a bandwidth request? What pieces of information do we need?
- Is it just the size of the buffered receipts queue? In which case...
- Why are we not working with just storing one value (per shard ofc) in the CongestionInfo about the buffered receipt sizes?
- Why do we need this complex deque structure to be stored and updated as metadata?
- And why does this structure need to be part of the trie and not something like CongestionInfo (in case it's small)?
@@ -183,6 +183,14 @@ impl Into<usize> for ShardId {
    }
}

impl TryInto<u8> for ShardId {
Where are we converting ShardId to u8? This might be dangerous to implement as the shard_id could potentially grow beyond 256 in the far future.
ShardId is stored as u8 to save space in chunk headers, see #12375 (comment)
    }
}

BandwidthRequest { to_shard, requested_values_bitmap: bitmap }
In the BandwidthRequest struct, could you please add a more detailed definition of how the requested_values_bitmap bitmapping is interpreted. On taking a direct look, it's not obvious.
There's a comment on BandwidthRequestValues, right below the bandwidth request struct.

/// There are this many predefined values of bandwidth that can be requested in a BandwidthRequest.
pub const BANDWIDTH_REQUEST_VALUES_NUM: usize = 40;

/// Values of bandwidth that can be requested in a bandwidth request.
/// When the nth bit is set in a request bitmap, it means that a shard is requesting the nth value from this list.
/// The list is sorted, from smallest to largest values.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BandwidthRequestValues {
    pub values: [Bandwidth; BANDWIDTH_REQUEST_VALUES_NUM],
}

I could add a short comment which points to BandwidthRequestValues and BandwidthRequestBitmap.
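As a reading aid, here is a small sketch of how the bitmap is interpreted according to that comment. The RequestBitmap stand-in and its get_bit helper are simplified placeholders, not the real BandwidthRequestBitmap API:

```rust
type Bandwidth = u64;
const BANDWIDTH_REQUEST_VALUES_NUM: usize = 40;

/// Simplified stand-in for the bitmap stored in a BandwidthRequest.
struct RequestBitmap {
    bits: [u8; (BANDWIDTH_REQUEST_VALUES_NUM + 7) / 8],
}

impl RequestBitmap {
    fn get_bit(&self, i: usize) -> bool {
        self.bits[i / 8] & (1 << (i % 8)) != 0
    }
}

/// Decode a bitmap into the list of requested bandwidth values:
/// the nth set bit means "the nth value from the sorted values list".
fn requested_values(
    bitmap: &RequestBitmap,
    values: &[Bandwidth; BANDWIDTH_REQUEST_VALUES_NUM],
) -> Vec<Bandwidth> {
    (0..BANDWIDTH_REQUEST_VALUES_NUM)
        .filter(|&i| bitmap.get_bit(i))
        .map(|i| values[i])
        .collect()
}
```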
@@ -178,6 +180,9 @@ pub enum TrieKey {
        index: u64,
    },
    BandwidthSchedulerState,
    OutgoingBufferReceiptSizes {
Please add a comment above this describing what the key and value are, and what is the type of the value
@@ -337,6 +345,10 @@ impl TrieKey { | |||
buf.extend(&index.to_le_bytes()); | |||
} | |||
TrieKey::BandwidthSchedulerState => buf.push(col::BANDWIDTH_SCHEDULER_STATE), | |||
TrieKey::OutgoingBufferReceiptSizes { to_shard } => { |
Based on this comment above, we can't directly use the number 16 here. I'll have to think more about how to structure the trie key. I think we made a grave mistake on trie key 15, BandwidthSchedulerState
// NOTE: NEW_COLUMN = 15 will be the last unique nibble in the trie!
// Consider demultiplexing on 15 and using 2-nibble prefixes.
Sorry, my bad, that comment is wrong, the trie keys are typed on a full byte, not a nibble, we're good for another ~240 trie keys.
Yeah I think the comment is wrong, we serialize all TrieKeys as a single byte (two nibbles). It was added by Jakob, I can remove it.
pub receipt: Cow<'a, Receipt>,
pub metadata: StateStoredReceiptMetadata,
/// Version of buffer metadata that was updated when the receipt was stored.
pub queue_metadata_version: QueueMetadataVersion,
I don't quite get the need for this field. If we are already versioning StateStoredReceipt, each new metadata version can increment the StateStoredReceipt version, or we can make StateStoredReceiptMetadata a versioned enum itself, like Block and BlockHeaders.
I wanted to separate the state stored receipt version from the metadata version. We might add more versions of state stored receipt without changing the metadata format, then we would have to have a function like should_update_metadata_v0_for_this_version_of_stored_receipt. Adding a field makes this more explicit, we don't have to match versions of state stored receipt with versions of the metadata.
If we wanted to update the metadata, e.g. include congestion gas in the metadata, we could add a new metadata version without duplicating the whole StateStoredReceipt struct.
I agree with Shreyan here, it would be better to stick to the established versioning patterns. This field is only read by the queue_metadata_version() method so you can just return the right one for a given version of StateStoredReceipt.
Alright. Initially I used the version of the stored receipt, but then I thought that it could be clearer to mark the version of the metadata explicitly. But if you don't like it then I'll remove it.
/// The V1 of StateStoredReceipt.
#[derive(BorshDeserialize, BorshSerialize, PartialEq, Eq, Debug, ProtocolSchema)]
pub struct StateStoredReceiptV1<'a> {
    pub receipt: Cow<'a, Receipt>,
@wacban, maybe a good time to cleanup the Cow part of the receipt here which you've been meaning to do.
A bandwidth request means - "I have these receipts that I would like to send to the other shard, but sending them all might overload the receiver, can I be granted bandwidth to send (some of) them?".
The problem with having one value is that the bandwidth scheduler can't really grant partial bandwidth. When a shard wants to send one 4MB receipt, granting 2MB of bandwidth is useless, it won't be able to send the 4MB receipt with this grant. This forces the bandwidth scheduler to either give the shard all of the requested bandwidth (the total size of outgoing receipts), or nothing. Otherwise there could be a liveness issue - shards that want to send large receipts, but can't because they're receiving partial bandwidth grants, which aren't enough to send the larger receipt.
Generating the structure from scratch every time would require walking over all of the receipts in the outgoing buffer, and that would be slow. Because of that I want to have a structure which sits in the trie and is updated incrementally as receipts are buffered/unbuffered.
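To illustrate the idea of requesting multiple values, here is a simplified sketch of building a request bitmap from receipt (or group) sizes via prefix sums: each prefix sum gets the bit of the smallest predefined value that covers it. This is only an illustration of the idea under that assumption, not the exact algorithm in the PR:

```rust
type Bandwidth = u64;
const BANDWIDTH_REQUEST_VALUES_NUM: usize = 40;

/// Sketch: set one bit per prefix sum of sizes, choosing the smallest
/// predefined value that is >= the prefix sum. `values` is assumed sorted.
fn request_bits_from_sizes(
    sizes: impl Iterator<Item = u64>,
    values: &[Bandwidth; BANDWIDTH_REQUEST_VALUES_NUM],
) -> [bool; BANDWIDTH_REQUEST_VALUES_NUM] {
    let mut bits = [false; BANDWIDTH_REQUEST_VALUES_NUM];
    let mut prefix_sum: u64 = 0;
    for size in sizes {
        prefix_sum += size;
        // Find the first value that covers the prefix sum.
        match values.iter().position(|&v| v >= prefix_sum) {
            Some(i) => bits[i] = true,
            // Prefix sum exceeds the largest requestable value - stop,
            // the remaining receipts can't be covered by this request.
            None => break,
        }
    }
    bits
}
```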
/// Create a basic bandwidth request when receipt sizes are not available.
/// It'll request two values - one corresponding to the first receipt in
/// the outgoing buffer and one corresponding to max_receipt_size.
Can you remind me what is the max receipt size and what is the max request size? Is that 1.5MB and 4MB?
max_receipt_size = 4MiB
max_shard_bandwidth = 4.5MB
In the current bandwidth limits max_shard_bandwidth is equal to 4.5MB + 100kB * num_shards
I can't say I understand all of it but I did read it and it makes sense. Added some comments. Let me think a bit about how this will work with resharding.
if values[i] == params.max_receipt_size {
    bitmap.set_bit(i, true);
    break;
}
Is it a strict guarantee that one of the values will be max_receipt_size? Perhaps it would be a bit more robust to set the bit for the first value greater or equal to it?
Yes it's a guarantee. BandwidthRequestValues::new generates values and then replaces the value closest to max_receipt_size with max_receipt_size.
It's needed because there's only a guarantee that:
base_bandwidth * num_shards + max_receipt_size <= max_shard_bandwidth
But there's no such guarantee for the first value that is larger than max_receipt_size.
I'll add an assert here, as it's an important invariant.
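A minimal sketch of such an assert, kept deliberately generic; the function and the way the values array is obtained are placeholders, not the real BandwidthRequestValues::new:

```rust
type Bandwidth = u64;

/// Illustrative invariant check: the predefined request values must contain
/// max_receipt_size exactly, so a single max-size receipt can always be requested.
fn assert_values_contain_max_receipt_size(values: &[Bandwidth], max_receipt_size: Bandwidth) {
    assert!(
        values.contains(&max_receipt_size),
        "BandwidthRequestValues must contain max_receipt_size exactly"
    );
}
```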
/// This ensures that the blockchain will be able to make progress while
/// waiting for the receipt sizes to become available.
/// The resulting behaviour is similar to the previous approach with allowed shard.
pub fn make_basic(
Is this method called only when there is at least one receipt? (I think it should - just sanity checking).
Yes, when there are no receipts generate_bandwidth_request returns None:

let Some(first_receipt) =
    self.outgoing_buffers.get_first_receipt(to_shard, trie, side_effects)?
else {
    // No receipts in the outgoing buffer
    return Ok(None);
};
let first_receipt_size: u64 =
    receipt_size(&first_receipt).expect("First receipt size doesn't fit in u64");

let shard_u8: u8 = to_shard.try_into().expect("ShardId doesn't fit into u8");
nit: it's more idiomatic to use the same variable name
let to_shard: u8 = to_shard.try_into()...
I don't like shadowing variables :/
When I see a variable I look up to see where it's defined and what its type is. Redefining a variable causes confusion because the type might change somewhere in between.
@@ -178,6 +180,9 @@ pub enum TrieKey {
        index: u64,
    },
    BandwidthSchedulerState,
    OutgoingBufferReceiptSizes {
        to_shard: ShardId,
nit: receiving_shard - to keep it consistent with the BufferedReceipt
);

first_group.group_size =
    first_group.group_size.checked_sub(receipt_size).unwrap_or_else(|| {
nit: why not expect?
Initially I did .expect(format!(...)), but clippy was complaining that I'm calling the format! function every time, even when the expect doesn't get triggered.
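For reference, a small self-contained example of the pattern clippy's expect_fun_call lint pushes towards (the values here are made up):

```rust
fn main() {
    let group_size: u64 = 100;
    let receipt_size: u64 = 40;

    // Triggers clippy::expect_fun_call: format! allocates even on the happy path.
    // let new_size = group_size
    //     .checked_sub(receipt_size)
    //     .expect(&format!("group size {} smaller than receipt size {}", group_size, receipt_size));

    // The lazy alternative: the message is only built if the subtraction underflows.
    let new_size = group_size.checked_sub(receipt_size).unwrap_or_else(|| {
        panic!("group size {} smaller than receipt size {}", group_size, receipt_size)
    });
    println!("{new_size}");
}
```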
let trie_receipt =
    if side_effects { get(trie, &receipt_key) } else { get_pure(trie, &receipt_key) };
oh my
this is really easy to forget, we should make side_effects an argument to get instead
/cc @nagisa
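A sketch of the suggestion above, i.e. making side_effects an explicit argument. get_with_side_effects is a hypothetical helper and the get/get_pure signatures are approximated from the quoted snippet, so treat this as an assumption rather than existing API:

```rust
// Hypothetical helper (not an existing nearcore function): callers pass
// side_effects explicitly instead of choosing between get and get_pure
// at every call site.
fn get_with_side_effects<T: borsh::BorshDeserialize>(
    trie: &dyn TrieAccess,
    key: &TrieKey,
    side_effects: bool,
) -> Result<Option<T>, StorageError> {
    if side_effects {
        get(trie, key)
    } else {
        get_pure(trie, key)
    }
}
```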
pub fn get_first_receipt(
    &self,
    to_shard: ShardId,
    trie: &dyn TrieAccess,
    side_effects: bool,
) -> Result<Option<ReceiptOrStateStoredReceipt>, StorageError> {
Does this method actually remove the receipt from the buffer? Does it clone it or will it be read again later? Do we care?
nit: maybe call it peek_receipt?
It doesn't remove anything. In Rust first() and get() are usually methods that access a collection without removing the item, so I thought that this would be similar.
I can change it to peek if it's confusing.
if receipt.queue_metadata_version() == Some(QueueMetadataVersion::OutgoingBufferMetadataV0)
{
    self.outgoing_buffer_metadatas.on_receipt_buffered(shard, size);
}
nit: Maybe make it into a match so that when new versions are added the compiler will remind us to handle it.
Good point, will do 👍
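For illustration, the quoted condition rewritten as a match; the None arm behaviour is an assumption for the sketch, not taken from the PR:

```rust
// Exhaustive match: adding a new QueueMetadataVersion variant will force
// this call site to be revisited by the compiler.
match receipt.queue_metadata_version() {
    Some(QueueMetadataVersion::OutgoingBufferMetadataV0) => {
        self.outgoing_buffer_metadatas.on_receipt_buffered(shard, size);
    }
    // Receipt was stored before the metadata existed - nothing to update.
    None => {}
}
```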
// it's always possible to grant enough bandwidth for one receipt.
let mut receipt_group_sizes = metadata.iter_receipt_group_sizes();
let first_group_size = receipt_group_sizes.next().unwrap();
let receipt_sizes = [first_receipt_size, first_group_size - first_receipt_size]
nit: It's a bit strange that the bandwidth request is made based on receipt sizes but you pass in the group sizes. Maybe make it be based on groups and just pass the first receipt size as an additional argument to the ctor? I'm not sure if this suggestion is any good, feel free to ignore.
BandwidthRequest would ideally operate on receipts, but groups are an optimization that has to be done. It's a bit inexact, I'll add a TODO and revisit it later, there's enough going on in this PR.
I tried to find a better name, but it's pretty hard. A bandwidth request can be created from receipts, receipt groups, or prefix sums of receipt sizes. receipt_sizes is generic enough to cover all these cases.
If you have an idea for a better name, I can change it.
I tried out how converting the receipt groups to a TrieQueue would look, you can check it out in this PR: #12410. I have mixed feelings. It's nice that it's rollback-friendly and there's no risk of having a super large value, but the code got way more complex and ugly :/ I need to think about it.
Closing in favor of #12464
…ipts in outgoing buffers v2.0 (#12464)

This is a rewrite of #12375, following the redesign that we discussed offline.

### Main changes

* Use a `TrieQueue` instead of a single value with all of the receipt groups
* `ReceiptGroup` now contains both size and gas
* Keep information about total gas, size and number of receipts for every outgoing buffer. Useful for resharding.
* Don't read the first receipt when generating bandwidth requests. This doesn't play well with witness size limits, reading the first receipt could potentially add 4MB of storage proof, and we need to do it for every outgoing buffer.
* Groups now have an upper size bound to ensure that size of the first group is at most `max_receipt_size` (as long as the size of a single receipt is at most `max_receipt_size`...)
* Use the version of `StateStoredReceipt` to determine if the metadata should be updated. Don't add fields which keep the version of metadata.

---

The code is ready, but untested. I need to write a bunch of tests. You can take a look at the code in the meantime, it'll be easier to apply any changes before all the tests are written.

The PR is meant to be reviewed commit-by-commit.

---------

Co-authored-by: Shreyan Gupta <[email protected]>
To generate a bandwidth request we have to take a look at the outgoing buffer to some shard and request values that correspond to the sizes of receipts stored in the buffer. Ideally we would walk over all the individual receipts, but that would be pretty slow. Instead of that let's keep coarse metadata about receipts in each outgoing buffer and walk over the metadata instead of individual receipts. The metadata is much smaller than the whole buffer, so walking over it will be much faster.
The metadata looks like this:
Consecutive receipts are grouped into groups. A new group is started when the size of the last group reaches 90kB (group_size_threshold). The first/last group in the list is updated when the first/last receipt is popped/added.

To generate a bandwidth request we walk over the sizes of the groups, not individual receipts. It's not as exact as individual receipts would be, but it should be good enough. Distance between BandwidthRequestValues is about 111kB, so having 90kB granularity should be fine.

The list of receipt groups for every buffer is stored as a single trie value. A trie value can be at most 4MB, so we have space for 1 million receipt groups. We'll never reach this size under normal circumstances, it would require us to have 90 GB of receipts stored in the outgoing buffer.
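To make the grouping concrete, here is a simplified, self-contained model of the metadata and its update rules as described above; the names and exact fields are illustrative, not the types from this PR:

```rust
use std::collections::VecDeque;

/// Simplified model of the per-(outgoing buffer) metadata described above:
/// consecutive receipts are coalesced into size groups.
struct ReceiptGroupsSketch {
    /// Sizes of consecutive receipt groups, oldest first.
    group_sizes: VecDeque<u64>,
    /// A new group is started once the last group reaches this size (~90 kB).
    group_size_threshold: u64,
}

impl ReceiptGroupsSketch {
    /// Called when a receipt is pushed to the back of the outgoing buffer.
    fn on_receipt_pushed(&mut self, receipt_size: u64) {
        match self.group_sizes.back_mut() {
            Some(last) if *last < self.group_size_threshold => *last += receipt_size,
            _ => self.group_sizes.push_back(receipt_size),
        }
    }

    /// Called when the first receipt is popped from the front of the buffer.
    fn on_receipt_popped(&mut self, receipt_size: u64) {
        let first = self.group_sizes.front_mut().expect("no groups to pop from");
        *first -= receipt_size;
        if *first == 0 {
            self.group_sizes.pop_front();
        }
    }

    /// Bandwidth requests walk over these group sizes instead of receipts.
    fn iter_group_sizes(&self) -> impl Iterator<Item = u64> + '_ {
        self.group_sizes.iter().copied()
    }
}
```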
Implementing the metadata was a bit troublesome. I think ideally we would have an OutgoingBuffer struct which keeps all information about the receipts stored in the buffer, including the metadata. It'd have a safe interface (push/pop/iter) that ensures that metadata is always updated when a receipt is popped or pushed. I tried to do something like this, but the existing code strongly resists. Currently OutgoingReceiptBuffer is a &mut reference to ShardsOutgoingReceiptBuffer, the OutgoingReceiptBuffer doesn't keep any state by itself. This makes it hard to store some information there. The &mut reference also makes it difficult to do const operations that don't require mut, like reading the list of receipts and generating a bandwidth request. There's also the pop_n function which directly manipulates queue indices, which doesn't allow updating the metadata.

In the end I gave up on the idea and decided to store the buffer metadatas directly in ReceiptSink. They're updated every time a receipt is buffered or removed, just like congestion info.

I had to refactor ReceiptSink a bit - I wanted to pass it to the function that finalizes chunk application, but it borrowed outgoing_receipts and own_congestion_info, which made it impossible. I changed it so that ReceiptSink owns these values, and now it can be passed to the function.

The metadatas work a bit differently from the TrieQueue - TrieQueue saves the updated indices to the trie after every update, but the metadata could be a bit larger, so I didn't want to do that. Instead the metadata is saved at the end of chunk application. AFAIU this should be fine, that's how we do it for promise/yield as well.

Protocol upgrade is handled using the StateStoredReceipt enum. I added a new version which keeps information about the version of metadata that was updated when this receipt was stored. Thanks to this we know if we should update the metadata when a receipt is removed from the buffer. We can also check if there are any receipts not included in the metadata by looking at the first receipt in the buffer.

@shreyan-gupta the metadata could be used to get the total size of receipts in the outgoing buffer without iterating through all of the receipts, which I heard is needed for resharding. I could make the metadata into a separate feature and release it before bandwidth scheduler.
The change is untested for now, I'd like to get some feedback before writing all the tests.
The PR is split into commits for easier review.