feat: collect and cache builtin instructions cost and count per transaction #2692
Conversation
Rename compute_budget_instruction_details to instruction_details, since the struct contains more than just compute-budget instruction info.
The three commits are organized as incremental bench steps: original, simply add the needed function, then perf-optimize.
Commit 1: bench before the change.
Commit 2: add a function to collect builtin cost; it hashes every program_id. Results: the 0-ix bench regresses a bit due to the added math to count non-compute-budget instructions, while the hashing makes the 4-ix benches worse and the many-ix benches much worse.
Commit 3: update the filter to use cached, resolved builtin instruction costs. It adds the cost of allocating a larger array per transaction but removes all repeated hashing. Results: the 0-ix bench regresses, the 4-ix bench changes little, and the many-ix benches improve significantly (a sketch of this caching approach follows below).
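A minimal sketch of the commit-3 idea, under assumed names (a plain HashMap stands in for the real builtin-cost table, and raw 32-byte keys stand in for Pubkey): resolve each static account key against the cost table once, and let the per-instruction filter index into the cached Vec instead of re-hashing program ids.

use std::collections::HashMap;

// Resolve every static account key once; instructions then look up their program
// by index into this Vec, avoiding repeated hashing of the same program id.
fn resolve_builtin_costs(
    static_account_keys: &[[u8; 32]],
    builtin_costs: &HashMap<[u8; 32], u32>,
) -> Vec<Option<u32>> {
    static_account_keys
        .iter()
        .map(|key| builtin_costs.get(key).copied()) // at most one hash lookup per account key
        .collect()
}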
runtime-transaction/benches/process_compute_budget_instructions.rs
Mostly looks good to me, small preference on the nested-enum choice
// None - un-checked
// Some(None) - checked, not builtin
// Some(Some((bool, u32))) - checked, is builtin and (is-compute-budget, default-cost)
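The nested-Option encoding described by that comment, spelled out as a type for comparison with the enum suggested below; the struct and field names here are placeholders, not the code under review.

struct ProgramCheck {
    // None                      -> program not checked yet
    // Some(None)                -> checked, not a builtin
    // Some(Some((is_cb, cost))) -> checked, is a builtin (is-compute-budget, default-cost)
    builtin_check: Option<Option<(bool, u32)>>,
}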
It seems to me this would be better represented by a new enum:
#[derive(Default)]
enum BuiltinCheckStatus {
    #[default]
    Unchecked,
    NotBuiltin,
    Builtin {
        is_compute_budget: bool,
        default_cost: u32,
    },
}
I think this saves a byte as well, since we don't need two Option discriminants.
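A quick way to check the footprint claim, assuming the nested encoding is Option<Option<(bool, u32)>>; note that Rust's niche optimization may already shrink either representation, so the printed sizes are worth verifying rather than assuming.

#[allow(dead_code)]
#[derive(Default)]
enum BuiltinCheckStatus {
    #[default]
    Unchecked,
    NotBuiltin,
    Builtin {
        is_compute_budget: bool,
        default_cost: u32,
    },
}

fn main() {
    // Compare the two candidate representations' sizes in bytes.
    println!("enum:          {} bytes", std::mem::size_of::<BuiltinCheckStatus>());
    println!("nested Option: {} bytes", std::mem::size_of::<Option<Option<(bool, u32)>>>());
}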
enum is the way to go
Savings across all benches, due to the smaller memory footprint:
Running benches/process_compute_budget_instructions.rs (target/release/deps/process_compute_budget_instructions-b53a7f97abe58117)
bench_process_compute_budget_instructions_empty/0 instructions
time: [11.957 µs 11.975 µs 11.995 µs]
thrpt: [85.368 Melem/s 85.513 Melem/s 85.641 Melem/s]
change:
time: [-29.911% -29.703% -29.488%] (p = 0.00 < 0.05)
thrpt: [+41.820% +42.253% +42.676%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
bench_process_compute_budget_instructions_no_builtins/4 dummy Instructions
time: [18.924 µs 18.941 µs 18.959 µs]
thrpt: [54.011 Melem/s 54.062 Melem/s 54.112 Melem/s]
change:
time: [-15.416% -15.210% -15.005%] (p = 0.00 < 0.05)
thrpt: [+17.654% +17.938% +18.225%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
bench_process_compute_budget_instructions_compute_budgets/4 compute-budget instructions
time: [37.252 µs 37.332 µs 37.424 µs]
thrpt: [27.362 Melem/s 27.430 Melem/s 27.488 Melem/s]
change:
time: [-21.462% -21.177% -20.898%] (p = 0.00 < 0.05)
thrpt: [+26.419% +26.867% +27.328%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
bench_process_compute_budget_instructions_builtins/4 dummy builtins
time: [41.314 µs 41.521 µs 41.730 µs]
thrpt: [24.539 Melem/s 24.662 Melem/s 24.786 Melem/s]
change:
time: [-8.4225% -8.0835% -7.7755%] (p = 0.00 < 0.05)
thrpt: [+8.4311% +8.7944% +9.1972%]
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
18 (18.00%) high mild
bench_process_compute_budget_instructions_mixed/355 mixed instructions
time: [539.98 µs 540.43 µs 540.90 µs]
thrpt: [1.8931 Melem/s 1.8948 Melem/s 1.8964 Melem/s]
change:
time: [-2.9610% -2.8023% -2.6445%] (p = 0.00 < 0.05)
thrpt: [+2.7164% +2.8831% +3.0513%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
bench_process_compute_budget_and_transfer_only/355 transfer instructions and compute budget ixs
time: [607.07 µs 607.59 µs 608.19 µs]
thrpt: [1.6837 Melem/s 1.6853 Melem/s 1.6868 Melem/s]
change:
time: [-4.9817% -4.8051% -4.6671%] (p = 0.00 < 0.05)
thrpt: [+4.8956% +5.0477% +5.2429%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
Implementation with the aux cache looks much better than what you had before! But shouldn't this code be behind a feature gate, and shouldn't we at least have a SIMD written up describing the intended feature-gated change in behavior?
This doesn't change the behavior yet though, right? It's caching this data because we plan to use it, but the compute-budget-details we get from the …
Yes. No change in this PR; it just adds the "collect and cache" function. The follow-up PR will use the cached builtin cost, which will change the behavior. Its feature gate is #2562. I didn't create a SIMD because this feature gate isn't meant to change the protocol but to fix a bug: compute budget allocates 200K CUs per builtin instruction, yet a builtin only consumes its default cost; the exception is compute-budget instructions, which don't get an allocation but still consume their default cost.
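A minimal sketch of the allocation rule being described; the constants are the well-known compute-budget defaults, and this illustrates the pre-fix behavior rather than the agave implementation.

// Well-known compute-budget defaults, used here only to illustrate the rule.
const DEFAULT_INSTRUCTION_COMPUTE_UNIT_LIMIT: u32 = 200_000;
const MAX_COMPUTE_UNIT_LIMIT: u32 = 1_400_000;

// Pre-fix rule: every non-compute-budget instruction reserves the full 200K default,
// even a builtin that will only ever consume its much smaller default cost.
fn default_compute_unit_limit(non_compute_budget_ix_count: u32) -> u32 {
    non_compute_budget_ix_count
        .saturating_mul(DEFAULT_INSTRUCTION_COMPUTE_UNIT_LIMIT)
        .min(MAX_COMPUTE_UNIT_LIMIT)
}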
There's a non-zero perf hit, which I see as a behavior change. There's no reason to cache this data when the feature isn't enabled, right? But if it's too difficult to put this new caching behind a feature gate, maybe it's fine to keep it as is.
This is a protocol change to fix a bug. If Firedancer isn't aware of this protocol change, they could process transactions differently. Imagine a transaction that doesn't set a compute limit but relies on the fact that adding a few builtin instructions will increase its compute limit, which is then used fully by an invocation of a custom program. That transaction would succeed before the feature gate and fail after it, leading to a divergence if all clients aren't in sync on the implementation. Given that this feature needs coordination between client teams and that it could break downstream users, I think we should have a SIMD to discuss.
Synced with FD previously; FD planned to rebase their cost model implementation after this fix lands (they currently use the agave runtime and its cost model implementation, but don't handle adjust-up, so they over-pack in some cases). Just chatted with Philip; given the current schedule, it seems to make more sense to do all of that after Breakpoint. In that case, a SIMD would be very helpful to document the change and perhaps discuss other possible solutions. I'll open one and link it to feature gate issue #2562. As for this PR, wdyt about merging it if it has no other open issues?
I really don't think it makes sense to merge yet; what's the rush?
I have a few PRs after this, but I can reorganize my pipeline. Let's keep this open while SIMD solana-foundation/solana-improvement-documents#170 is being discussed.
I think rather than waiting on the SIMD process, which could take a while, how about we split this up? Keep the old version so we can get the compute budget without the builtin-check overhead, but also add this new version that has the builtins checked. Ideally we'd have a cost-model fn that we could pass this new struct into (along with the feature_set) so we can get the cost without doing separate scans for compute-budget AND builtins; having that would also help the transition to the new tx type and runtime-transaction. @jstarry @tao-stones does that seem reasonable to you?
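A hypothetical sketch of the single-scan helper suggested above; the struct, field, and function names are illustrative placeholders, and a bool stands in for the feature_set check, so this is not the merged agave API.

// Hypothetical placeholder for the per-transaction details this PR caches.
struct InstructionDetails {
    requested_compute_unit_limit: Option<u32>, // set by a compute-budget ix, if any
    sum_builtin_default_costs: u64,            // summed default costs of builtin instructions
    builtin_ix_count: u32,
    non_builtin_ix_count: u32,
}

// Derive the compute limit from the cached details instead of re-walking the instructions;
// `builtin_cost_fix_enabled` stands in for the feature-set check.
fn compute_unit_limit(details: &InstructionDetails, builtin_cost_fix_enabled: bool) -> u64 {
    const DEFAULT_PER_IX: u64 = 200_000;
    if let Some(limit) = details.requested_compute_unit_limit {
        return u64::from(limit);
    }
    if builtin_cost_fix_enabled {
        // Builtins get only their default costs; other instructions keep the default allocation.
        details.sum_builtin_default_costs
            + u64::from(details.non_builtin_ix_count) * DEFAULT_PER_IX
    } else {
        // Pre-feature behavior: every non-compute-budget instruction reserves the full default.
        u64::from(details.builtin_ix_count + details.non_builtin_ix_count) * DEFAULT_PER_IX
    }
}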
#3799 did it.
Problem

#2561

Summary of Changes

- Remove compute_budget_ from the instruction_details name to reflect that the struct caches more than just compute-budget details
- Collect and cache builtin instruction cost (from BUILTIN_INSTRUCTION_COSTS) and count per transaction

Fixes #2561