SIMD-0207: Raise Block Limits to 50M #207

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

apfitzge
Contributor

@apfitzge apfitzge commented Dec 5, 2024

No description provided.

@apfitzge apfitzge force-pushed the raising_block_limits branch 2 times, most recently from abd0b2d to 6ed73ab Compare December 5, 2024 17:39
- Double limit to 96M CUs
- Viewed as too aggressive at this time, and may cause unforeseen issues
particularly in infrastructure supporting the network users.
- We instead plan to increase the limits incrementally as the network

Can we get a more concrete idea of what this looks like? I think doing a small increase first is prudent, but I think the increases following that should be non-linear and increasing in %


I imagine a subsequent SIMD will introduce a ladder of feature gates. If we used linear and a 5M step (pulling this number out of the air for now), then I think we'd have individual gates for each of these values:

[55M, 60M, 65M, 70M, ..., 100M]

> but I think the increases following that should be non-linear and increasing in %

I haven't given too much thought to this aspect. If we land on % (and say 10%), then the ladder would look like this if we round to 1M-multiples:

[55M, 61M, 67M, 73M, 81M, 89M, 97M]

Assuming this answers your general question, I think we can kick the can on specifics of the ladder to the follow-on SIMD
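
The two hypothetical ladders above can be reproduced with a short sketch. It assumes the 50M starting point from this SIMD and a 100M cap (taken from the linear example); the 5M step and the 10% rate are the placeholder numbers from the comment, and the function names are illustrative only. For the percentage ladder, the sketch compounds the unrounded value and rounds half up to the nearest 1M, which is the reading that reproduces the list quoted above.

```python
BASE_CU = 50_000_000   # limit proposed by this SIMD
CAP_CU = 100_000_000   # assumed upper end, taken from the linear example
MILLION = 1_000_000


def linear_ladder(step_cu=5_000_000):
    """Limits reached by adding a fixed step per feature gate."""
    return list(range(BASE_CU + step_cu, CAP_CU + 1, step_cu))


def percent_ladder(percent=10):
    """Limits reached by compounding a fixed percentage per feature gate.

    The running value is kept as an exact fraction (num / den) so that
    rounding to the nearest 1M (half up) is done on the true value.
    """
    ladder = []
    num, den = BASE_CU, 1
    while True:
        num *= 100 + percent
        den *= 100
        if num > CAP_CU * den:  # next rung would exceed the assumed cap
            break
        # Round num / den half up to a multiple of 1M using integers only.
        rounded = (2 * num + den * MILLION) // (2 * den * MILLION) * MILLION
        ladder.append(rounded)
    return ladder


if __name__ == "__main__":
    print([v // MILLION for v in linear_ladder()])
    print([v // MILLION for v in percent_ladder()])
```

Integer arithmetic is used deliberately: Python's built-in `round` rounds halves to the nearest even number, so 60.5 would become 60 rather than the 61 shown in the list.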

Contributor Author

It's gonna depend how things go, and how quickly we can improve performance.
To the best of my knowledge, we have no plans for an automatically increasing limit at this time; any further changes will go through the same SIMD process.


Is the feature gate actually required? It would be nice to have automatic progression, and so long as you use a static calculation based on numbers already under consensus (e.g. Epoch or Slot), the validators should be able to easily calculate the current block limit.
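
A static schedule of the kind suggested here might look like the following sketch. Every number in it is hypothetical: the start epoch, the number of epochs per step, the step size, and the cap are placeholders, not values proposed anywhere in this thread. The only point is that the result depends solely on the epoch, which all validators already agree on.

```python
def block_limit_for_epoch(
    epoch,
    start_epoch=700,         # hypothetical epoch at which the schedule begins
    epochs_per_step=10,      # hypothetical: raise the limit every 10 epochs
    base_cu=50_000_000,      # limit proposed by this SIMD
    step_cu=1_000_000,       # hypothetical step size
    cap_cu=100_000_000,      # hypothetical ceiling
):
    """Return the block CU limit for an epoch under a fixed, static schedule."""
    # Before the schedule begins, no steps have been taken.
    steps = max(0, epoch - start_epoch) // epochs_per_step
    return min(base_cu + steps * step_cu, cap_cu)


if __name__ == "__main__":
    for e in (699, 700, 710, 725, 5000):
        print(e, block_limit_for_epoch(e))
```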


The feature gate gives us the ability to raise the limit at a time when we feel the cluster is healthy and can handle the additional load that the higher limit could introduce. Adding in "raise limit at epoch N" adds unknowns: what if the cluster is struggling when epoch N comes up, or what if the entire validator set hasn't adopted a new enough version by the time epoch N rolls around?

> It would be nice to have automatic progression

This is somewhat similar to what 7layermagik mentioned here. But to quote Andrew from here, we are viewing this as a "foot in the door" feature. Implementing this SIMD will establish the logic to raise the limit, which will make it easier for future SIMDs to raise the limit further.
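
The feature-gate approach described here can be sketched as below. This is not the validator's actual code: the feature identifier and the set-based lookup are hypothetical stand-ins for however a client tracks activated features. The 48M and 50M values are the current and proposed limits discussed in this PR.

```python
CURRENT_LIMIT_CU = 48_000_000

# Each entry pairs a (hypothetical) feature identifier with the limit it
# unlocks. A future SIMD would only need to append another entry here.
LIMIT_GATES = [
    ("raise_block_limits_to_50m", 50_000_000),
]


def block_cost_limit(active_features):
    """Return the block CU limit given the set of activated feature names."""
    limit = CURRENT_LIMIT_CU
    for feature, gated_limit in LIMIT_GATES:
        if feature in active_features:
            limit = max(limit, gated_limit)
    return limit


if __name__ == "__main__":
    print(block_cost_limit(set()))
    print(block_cost_limit({"raise_block_limits_to_50m"}))
```

Because activation is a deliberate act rather than a date fixed in advance, the limit only changes once the feature has actually been turned on.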

increases.
- Double limit to 96M CUs
- Viewed as too aggressive at this time, and may cause unforeseen issues
particularly in infrastructure supporting the network users.

Which infrastructure specifically? Are we referring to RPCs?


Yeah, RPC and related technologies (e.g. BigTable) are definitely included here

Contributor Author

> unforeseen

RPCs, indexers, historical transaction data providers come to mind, but idk what's out there. Hoping people will either alert us or fix their system if increased limits are coming close to breaking their stuff.

@apfitzge apfitzge force-pushed the raising_block_limits branch from 6ed73ab to e6d4205 Compare December 5, 2024 18:09

7layermagik commented Dec 5, 2024

How about making it so the block limit cu increases are contingent on cluster vote latency being within a good enough range?

You can propose a doubling in block limit to 96mil cu and then increase the cu limit linearly each epoch (1 or 2mil cu's for example). Each epoch you can increment it, and if the 80th percentile (could be something else) by stake vote latency is too slow, that indicates it shouldn't be increased the next epoch until the cluster performance is good enough. For example, say right now 1.6 slot vote latency is our 80th percentile, we keep increasing cu's each epoch but once it goes over say 1.8 slot vote latency it stops increasing until latency drops again. There should also be a criterion that blocks are at least 80 or 90% full or so on avg during the epoch because we don't want to be comparing vote latencies between two epochs where the avg traffic is a lot different and blocks aren't even close to max (because vote latency will go up once they are fuller so that's risky to base assumptions on not very full blocks). Anyway, just some quick ideas.
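
The rule proposed here could be sketched roughly as follows. The 1.8-slot latency ceiling, the 80% fullness floor, and the 2M step are the example figures from the comment, not agreed values, and the function and parameter names are made up for illustration. How the stake-weighted percentile and the average fullness would be measured under consensus is left open.

```python
def next_epoch_limit(
    current_limit_cu,
    p80_vote_latency_slots,   # 80th percentile (by stake) vote latency last epoch
    avg_block_fullness,       # average fraction of the limit used, 0.0 to 1.0
    max_latency_slots=1.8,    # example ceiling from the comment
    min_fullness=0.8,         # example floor from the comment
    step_cu=2_000_000,        # example per-epoch increment from the comment
):
    """Return the block CU limit to use for the next epoch."""
    # Latency measured on mostly empty blocks says little about how the
    # cluster behaves near the limit, so hold the limit steady.
    if avg_block_fullness < min_fullness:
        return current_limit_cu
    # Cluster is voting too slowly: pause increases until latency recovers.
    if p80_vote_latency_slots > max_latency_slots:
        return current_limit_cu
    return current_limit_cu + step_cu


if __name__ == "__main__":
    print(next_epoch_limit(96_000_000, 1.6, 0.9))
    print(next_epoch_limit(96_000_000, 1.9, 0.9))
    print(next_epoch_limit(96_000_000, 1.6, 0.5))
```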


mertimus commented Dec 5, 2024

> How about making it so the block limit cu increases are contingent on cluster vote latency being within a good enough range?
>
> You can propose a doubling in block limit to 96mil cu and then increase the cu limit linearly each epoch (1 or 2mil cu's for example). Each epoch you can increment it, and if the 80th percentile (could be something else) by stake vote latency is too slow, that indicates it shouldn't be increased the next epoch until the cluster performance is good enough. For example, say right now 1.6 slot vote latency is our 80th percentile, we keep increasing cu's each epoch but once it goes over say 1.8 slot vote latency it stops increasing until latency drops again. There should also be a criterion that blocks are at least 80 or 90% full or so on avg during the epoch because we don't want to be comparing vote latencies between two epochs where the avg traffic is a lot different and blocks aren't even close to max (because vote latency will go up once they are fuller so that's risky to base assumptions on not very full blocks). Anyway, just some quick ideas.

that seems like something for a next SIMD after seeing how this one goes

@apfitzge apfitzge marked this pull request as ready for review December 5, 2024 18:57

7layermagik commented Dec 5, 2024

> How about making it so the block limit cu increases are contingent on cluster vote latency being within a good enough range?

That'll add many months though. The SIMD process seems to take forever -- might as well make the proposal a more thorough one. 2mil more cu's isn't going to stress things much at all either I imagine -- idk what value there is increasing it only that much

Contributor Author

apfitzge commented Dec 5, 2024

> That'll add many months though. The SIMD process seems to take forever -- might as well make the proposal a more thorough one. 2mil more cu's isn't going to stress things much at all either I imagine -- idk what value there is increasing it only that much

Goal with this is kind of a "foot in the door" from my perspective.

With this SIMD, we set up the system so we can easily add features for block limits. I'm hoping to backport this.
Future SIMDs adjusting limits can just be backported and activated more quickly. Regular procedure of taking eons to activate be damned.

@Benhawkins18 Benhawkins18 changed the title Raise Block Limits to 50M SIMD-0207: Raise Block Limits to 50M Dec 9, 2024
Contributor

@ripatel-fd ripatel-fd left a comment


Approving on behalf of Firedancer even if the increase is merely symbolic. 48M CUs is far too low and holding back performance of potentially faster clients.


@bw-solana bw-solana left a comment


Approved by Anza

@Benhawkins18 Benhawkins18 self-requested a review December 10, 2024 15:34
Collaborator

@Benhawkins18 Benhawkins18 left a comment


Looks good to me, I see approvals from Anza and FD. Moving this into main.
