Test add stake program benches #3396

tao-stones · 2024-10-30T23:37:05Z

Problem

Part of #3364: benchmark each stake instruction to determine its static CU consumption.

Summary of Changes

Add benches for each instruction, each aiming to run through its happy path.

tao-stones

@KirillLykov @ksolana This is a first stab at benching all stake instructions. (Well, one instruction is deprecated, and that one is not covered.) Just putting it out so you can take a peek before our next sync-up. It proves feasibility, but I'd love to hear your thoughts on whether it scales to other programs, and whether the structure is suitable for the long term. @ksolana mentioned picking the right bench harness, which is a good point.

The result from my laptop:

     Running benches/stake.rs (target/release/deps/stake-4b9e51079ccdcc7c)

running 21 tests
test bench_authorize_staker                       ... bench:       8,069.97 ns/iter (+/- 136.14)
test bench_authorize_staker_checked               ... bench:       8,131.99 ns/iter (+/- 200.29)
test bench_authorize_staker_checked_with_seed     ... bench:       8,565.11 ns/iter (+/- 179.96)
test bench_authorize_staker_with_seed             ... bench:       8,678.42 ns/iter (+/- 128.38)
test bench_authorize_withdrawer                   ... bench:       8,391.94 ns/iter (+/- 148.35)
test bench_authorize_withdrawer_checked           ... bench:       8,350.21 ns/iter (+/- 115.06)
test bench_authorize_withdrawer_checked_with_seed ... bench:       8,807.82 ns/iter (+/- 197.97)
test bench_authorize_withdrawer_with_seed         ... bench:       8,774.39 ns/iter (+/- 632.68)
test bench_deactivate                             ... bench:       8,461.04 ns/iter (+/- 545.88)
test bench_deactivate_delinquent                  ... bench:      12,779.42 ns/iter (+/- 702.53)
test bench_delegate_stake                         ... bench:      11,087.73 ns/iter (+/- 132.47)
test bench_get_minimum_delegation                 ... bench:       7,873.55 ns/iter (+/- 84.88)
test bench_initialize                             ... bench:       8,354.89 ns/iter (+/- 140.99)
test bench_initialize_checked                     ... bench:       8,241.75 ns/iter (+/- 74.91)
test bench_merge                                  ... bench:       9,105.14 ns/iter (+/- 174.68)
test bench_move_lamports                          ... bench:       9,267.85 ns/iter (+/- 187.89)
test bench_move_stake                             ... bench:       9,861.28 ns/iter (+/- 282.50)
test bench_set_lockup                             ... bench:       8,481.72 ns/iter (+/- 523.87)
test bench_set_lockup_checked                     ... bench:       8,490.35 ns/iter (+/- 288.38)
test bench_split                                  ... bench:       8,864.95 ns/iter (+/- 379.75)
test bench_withdraw                               ... bench:       8,764.83 ns/iter (+/- 308.53)

test result: ok. 0 passed; 0 failed; 0 ignored; 21 measured; 0 filtered out; finished in 38.32s

programs/stake/benches/stake_instructions.rs.bak

programs/stake/benches/stake.rs

tao-stones · 2024-11-01T14:11:27Z

Commit 918456b tries out criterion-perf-events to measure hardware instruction count. Note: its dependency perfcnt supports neither Windows nor macOS.

Output from Linux dev box:

     Running benches/stake.rs (target/release/deps/stake-148e121e51543019)
Gnuplot not found, using plotters backend
initialize              time:   [125555.0505 cycles 125555.6675 cycles 125556.2403 cycles]
                        change: [-0.0019% -0.0003% +0.0013%] (p = 0.72 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

initialize_checked      time:   [124877.3175 cycles 124877.8532 cycles 124878.3885 cycles]
                        change: [-0.3875% -0.3862% -0.3847%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

authorize_staker        time:   [128913.4326 cycles 128914.0395 cycles 128914.6433 cycles]
                        change: [-0.9123% -0.9107% -0.9093%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe

authorize_withdrawer    time:   [129953.4734 cycles 129954.0277 cycles 129954.6122 cycles]
                        change: [+1.6046% +1.6059% +1.6072%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

authorize_staker_with_seed
                        time:   [132053.6950 cycles 132054.3052 cycles 132054.9480 cycles]
                        change: [+0.3472% +0.3486% +0.3500%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

authorize_withdrawer_with_seed
                        time:   [131453.1738 cycles 131453.8222 cycles 131454.4679 cycles]
                        change: [-0.1491% -0.1473% -0.1458%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

authorize_staker_checked
                        time:   [128292.3749 cycles 128292.9443 cycles 128293.5249 cycles]
                        change: [-0.0583% -0.0569% -0.0554%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

authorize_withdrawer_checked
                        time:   [127848.0434 cycles 127848.6122 cycles 127849.1551 cycles]
                        change: [+0.0103% +0.0115% +0.0126%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

authorize_staker_checked_with_seed
                        time:   [131262.7181 cycles 131263.3147 cycles 131263.9001 cycles]
                        change: [+0.1068% +0.1083% +0.1099%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

authorize_withdrawer_checked_with_seed
                        time:   [130896.0412 cycles 130896.6433 cycles 130897.2515 cycles]
                        change: [-1.0689% -1.0661% -1.0637%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

set_lockup              time:   [125666.0625 cycles 125666.4992 cycles 125666.9603 cycles]
                        change: [-0.5479% -0.5459% -0.5438%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

set_lockup_checked      time:   [125713.6246 cycles 125714.1815 cycles 125714.7638 cycles]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

withdraw                time:   [129090.5332 cycles 129091.1277 cycles 129091.7330 cycles]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

delegate_stake          time:   [168056.3890 cycles 168057.0830 cycles 168057.7922 cycles]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

deactivate              time:   [127932.5770 cycles 127933.1267 cycles 127933.6789 cycles]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  1 (1.00%) high severe

split                   time:   [132268.8011 cycles 132269.3644 cycles 132269.9224 cycles]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

merge                   time:   [132422.7780 cycles 132423.3270 cycles 132423.8949 cycles]
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

get_minimum_delegation  time:   [117062.9767 cycles 117063.5115 cycles 117064.0286 cycles]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

deactivate_delinquent   time:   [196814.9870 cycles 196815.6644 cycles 196816.3155 cycles]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

move_stake              time:   [148335.8410 cycles 148336.4096 cycles 148337.0181 cycles]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

move_lamports           time:   [133850.4211 cycles 133850.9863 cycles 133851.5359 cycles]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

tao-stones · 2024-11-01T18:41:25Z

Commit bafcbf9 tries out iai for precise, valgrind-based benching. The issue with iai is that it can't exclude setup code from the measurement. @ksolana suggested benching the setup code separately, so we can subtract the results to deduce the actual cost of executing the target instruction. This is possible if we can isolate the pre-execution and post-execution steps in mock_process_instruction.

Output from Linux dev box:

     Running benches/stake_iai.rs (target/release/deps/stake_iai-680ef0d7fd289c61)
bench_setup_initialize
  Instructions:              159772 (+0.231489%)
  L1 Accesses:               202857 (+0.349740%)
  L2 Accesses:                  262 (+2.745098%)
  RAM Accesses:                 965 (+3.875135%)
  Estimated Cycles:          237942 (+0.848521%)

bench_initialize
  Instructions:              290071 (-0.003103%)
  L1 Accesses:               369263 (+0.005958%)
  L2 Accesses:                  602 (-4.444444%)
  RAM Accesses:                1985 (-0.301356%)
  Estimated Cycles:          441748 (-0.074195%)
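The subtraction idea can be sketched with the numbers reported above. This is only an illustration, not code from the PR: the `Counts` struct and its method names are hypothetical, and it assumes `bench_setup_initialize` covers exactly the setup work that `bench_initialize` also performs. The cycle estimate uses weights of 1, 5 and 35 for L1, L2 and RAM accesses, which reproduces both `Estimated Cycles` figures in the output.

```rust
/// Counters as printed by iai for one benchmark (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Counts {
    instructions: u64,
    l1: u64,
    l2: u64,
    ram: u64,
}

impl Counts {
    /// Cycle estimate with weights 1 / 5 / 35 for L1 / L2 / RAM accesses.
    fn estimated_cycles(&self) -> u64 {
        self.l1 + 5 * self.l2 + 35 * self.ram
    }

    /// Counter-wise difference, saturating at zero so noise cannot underflow.
    fn minus(&self, setup: &Counts) -> Counts {
        Counts {
            instructions: self.instructions.saturating_sub(setup.instructions),
            l1: self.l1.saturating_sub(setup.l1),
            l2: self.l2.saturating_sub(setup.l2),
            ram: self.ram.saturating_sub(setup.ram),
        }
    }
}

// Values copied from the iai output above.
const SETUP_INITIALIZE: Counts = Counts {
    instructions: 159_772,
    l1: 202_857,
    l2: 262,
    ram: 965,
};
const INITIALIZE: Counts = Counts {
    instructions: 290_071,
    l1: 369_263,
    l2: 602,
    ram: 1_985,
};

fn main() {
    // Net cost attributed to the instruction itself, under the stated assumption.
    let net = INITIALIZE.minus(&SETUP_INITIALIZE);
    println!("net instructions:     {}", net.instructions);
    println!("net estimated cycles: {}", net.estimated_cycles());
}
```

With these inputs the net figures are 130,299 instructions and 203,806 estimated cycles. They are only as good as the assumption that the setup bench mirrors the real setup path.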

programs/stake/benches/stake_iai.rs

ksolana · 2024-11-02T12:51:15Z

The iai repo has some bugs; should we fork it and use that one? For example, I get:

Running benches/stake_iai.rs (target/release/deps/stake_iai-5b6b578e0f124e46)
thread 'main' panicked at /home/sol/.cargo/registry/src/index.crates.io-6f17d22bba15001f/iai-0.1.1/src/lib.rs:162:46:
no entry found for key

This happens because https://github.com/bheisler/iai/blob/main/src/lib.rs#L195 doesn't check whether the events: HashMap has that entry.
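The panic text matches what `HashMap`'s `Index` implementation reports for a missing key. Below is a minimal sketch of that failure mode and one defensive alternative. The function names and the `"Ir"` / `"I1mr"` keys are illustrative assumptions; this is neither the actual iai source nor the change made in the fork.

```rust
use std::collections::HashMap;

/// Indexing panics with "no entry found for key" when the event is absent.
fn count_unchecked(events: &HashMap<String, u64>, name: &str) -> u64 {
    events[name]
}

/// Defensive lookup: treat a missing event as zero (one possible policy).
fn count_or_zero(events: &HashMap<String, u64>, name: &str) -> u64 {
    events.get(name).copied().unwrap_or(0)
}

fn main() {
    let mut events: HashMap<String, u64> = HashMap::new();
    events.insert("Ir".to_string(), 290_071);

    // Present key: both lookups agree.
    println!("Ir (unchecked) = {}", count_unchecked(&events, "Ir"));
    println!("Ir (checked)   = {}", count_or_zero(&events, "Ir"));

    // Missing key: the checked lookup falls back instead of panicking.
    println!("I1mr (checked) = {}", count_or_zero(&events, "I1mr"));
}
```

Whether a missing event should count as zero or be reported as an error is a design choice. Returning an `Option` and letting the caller decide would be the more conservative variant.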

ksolana · 2024-11-02T14:10:52Z

Added fixes here: https://github.com/ksolana/iai

tao-stones · 2024-11-04T17:03:52Z

> The iai repo has some bugs; should we fork it and use that one? For example, I get

Oh boy, that makes me less confident about including iai in our CI process. I was thinking perhaps we do it in two steps:

1. Build `criterion`-based benches for all builtin instructions as planned, and include those benches in CI (e.g. the agave repo).
2. In a private repo, or a public repo outside agave, use `iai` for one-off instruction counting. Still verifiable and repeatable.

wdyt?

ksolana · 2024-11-04T18:24:12Z

> > The iai repo has some bugs; should we fork it and use that one? For example, I get
>
> Oh boy, that makes me less confident about including iai in our CI process. I was thinking perhaps we do it in two steps:
> 1. Build `criterion`-based benches for all builtin instructions as planned, and include those benches in CI (e.g. the agave repo).

sounds good.

> 2. In a private repo, or a public repo outside agave, use `iai` for one-off instruction counting. Still verifiable and repeatable.
>
> wdyt?

Btw, I have added fixes to a fork in case we want to use them. It is pretty reliable in terms of the instruction count. As the repo is a bit old, if we use iai we could fork it (or put it in an agave/third-party directory) so that we can make changes.

tao-stones · 2024-11-05T20:59:40Z

Cleaned up this PR to benchmark with criterion only.

Output details from macOS:

test result: ok. 0 passed; 0 failed; 119 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/stake.rs (target/release/deps/stake-30de241ba65a74a1)
initialize              time:   [8.4174 µs 8.4238 µs 8.4298 µs]
                        change: [+5.0203% +5.3975% +5.7580%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

initialize_checked      time:   [8.3896 µs 8.3947 µs 8.3997 µs]
                        change: [+4.3862% +4.7103% +5.0163%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  9 (9.00%) high severe

authorize_staker        time:   [8.6889 µs 8.6932 µs 8.6979 µs]
                        change: [+4.4041% +4.6046% +4.8391%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

authorize_withdrawer    time:   [8.6638 µs 8.6755 µs 8.6947 µs]
                        change: [+3.6824% +3.8821% +4.0872%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

authorize_staker_with_seed
                        time:   [9.0976 µs 9.1013 µs 9.1056 µs]
                        change: [+1.1523% +1.3789% +1.6021%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) low severe
  11 (11.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

authorize_withdrawer_with_seed
                        time:   [9.1204 µs 9.1247 µs 9.1298 µs]
                        change: [+1.7527% +1.9746% +2.2131%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

authorize_staker_checked
                        time:   [8.7032 µs 8.7096 µs 8.7157 µs]
                        change: [+1.6797% +1.8199% +1.9674%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe

authorize_withdrawer_checked
                        time:   [8.6823 µs 8.6863 µs 8.6913 µs]
                        change: [+0.4697% +0.6846% +0.8861%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

authorize_staker_checked_with_seed
                        time:   [9.0349 µs 9.0403 µs 9.0470 µs]
                        change: [-0.1881% +0.0249% +0.2676%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

authorize_withdrawer_checked_with_seed
                        time:   [9.0584 µs 9.0800 µs 9.1021 µs]
                        change: [+0.2235% +0.4505% +0.7015%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

set_lockup              time:   [8.4184 µs 8.4271 µs 8.4363 µs]
                        change: [-1.5169% -1.3581% -1.1858%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

set_lockup_checked      time:   [8.5259 µs 8.5302 µs 8.5358 µs]
                        change: [-0.5201% -0.3464% -0.1776%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

withdraw                time:   [8.8022 µs 8.8263 µs 8.8467 µs]
                        change: [-1.3121% -1.0881% -0.8395%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

delegate_stake          time:   [11.174 µs 11.179 µs 11.186 µs]
                        change: [+0.1629% +0.3382% +0.5072%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

deactivate              time:   [8.6195 µs 8.6269 µs 8.6340 µs]
                        change: [-1.5562% -1.3538% -1.1725%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  4 (4.00%) low severe
  12 (12.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

split                   time:   [8.9301 µs 8.9409 µs 8.9580 µs]
                        change: [-2.1750% -1.9220% -1.6565%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  5 (5.00%) high severe

merge                   time:   [9.0081 µs 9.0192 µs 9.0319 µs]
                        change: [-2.1769% -1.8864% -1.6085%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

get_minimum_delegation  time:   [7.9740 µs 7.9792 µs 7.9852 µs]
                        change: [-2.2446% -2.0360% -1.8307%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

deactivate_delinquent   time:   [12.791 µs 12.796 µs 12.802 µs]
                        change: [-1.0418% -0.8838% -0.7217%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

move_stake              time:   [9.8807 µs 9.8889 µs 9.8977 µs]
                        change: [-1.1016% -0.7901% -0.4896%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

move_lamports           time:   [9.2087 µs 9.2277 µs 9.2492 µs]
                        change: [-0.7529% -0.5169% -0.2817%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 28 outliers among 100 measurements (28.00%)
  19 (19.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  7 (7.00%) high severe
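For reading the `change:` lines, here is a minimal sketch of how the verdict text appears to be chosen. It assumes criterion's default significance level of 0.05 and noise threshold of 1%. The enum, constants and function are hypothetical names, and the rule is a reading of the reports in this thread rather than criterion's actual source.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

const SIGNIFICANCE_LEVEL: f64 = 0.05; // assumed default
const NOISE_THRESHOLD: f64 = 0.01; // assumed default, i.e. 1%

/// `lower` and `upper` bound the relative change as fractions (1% == 0.01).
fn classify(p_value: f64, lower: f64, upper: f64) -> Verdict {
    if p_value >= SIGNIFICANCE_LEVEL {
        // Not statistically significant, whatever the interval says.
        return Verdict::NoChange;
    }
    if upper < -NOISE_THRESHOLD {
        Verdict::Improved // whole interval is below -1%
    } else if lower > NOISE_THRESHOLD {
        Verdict::Regressed // whole interval is above +1%
    } else {
        Verdict::WithinNoise // significant, but overlaps the +/-1% band
    }
}

fn main() {
    // Intervals copied from the macOS report above; p shown as 0.00 is passed as 0.0.
    println!("initialize: {:?}", classify(0.0, 0.050203, 0.057580));
    println!("withdraw:   {:?}", classify(0.0, -0.013121, -0.008395));
    println!("set_lockup: {:?}", classify(0.0, -0.015169, -0.011858));
    println!(
        "authorize_staker_checked_with_seed: {:?}",
        classify(0.83, -0.001881, 0.002676)
    );
}
```

Criterion compares each run with the previously saved run on the same machine, so these verdicts most likely describe run-to-run drift on the laptop rather than a change in the stake program itself.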

ksolana

LGTM

tao-stones requested review from KirillLykov and ksolana October 30, 2024 23:37

tao-stones force-pushed the test-add-stake-program-benches branch 2 times, most recently from 3654db0 to 918456b Compare October 31, 2024 23:56

tao-stones force-pushed the test-add-stake-program-benches branch from bafcbf9 to 7f9d316 Compare November 5, 2024 20:26

Benchmarking happy-path of all Stake instructions.

15352b0

tao-stones force-pushed the test-add-stake-program-benches branch from 7f9d316 to 15352b0 Compare November 5, 2024 20:55

tao-stones mentioned this pull request Nov 20, 2024

Measure builtin instruction performance #3364

Open

ksolana approved these changes Dec 13, 2024

tao-stones merged commit fcff549 into anza-xyz:master Dec 13, 2024
51 checks passed

tao-stones deleted the test-add-stake-program-benches branch December 13, 2024 20:10
