Measure builtin instruction performance #3364

Open
tao-stones opened this issue Oct 29, 2024 · 15 comments


tao-stones commented Oct 29, 2024

SIMD-198 calls for assigning a static execution cost (in CUs) to each builtin instruction, instead of the current per-builtin-program costs. This would improve CU allocation, and therefore block-packing quality.

Instruction issue/PR status:
  • System — #3567 merged
  • Stake — #3396 merged
  • Loader — #3659 open
  • Vote
  • ALT
  • Config
  • ComputeBudget — #3614 merged
@tao-stones tao-stones self-assigned this Oct 29, 2024

tao-stones commented Oct 29, 2024

Looking for the most efficient and reliable way to gather builtin instruction CU/performance data, I'm aware of a few options:

  • A bench for each instruction is reliable and repeatable, but writing them all up is not a small task.
  • Querying stats/metrics is quicker, but neither reliable nor repeatable. I also haven't been able to find a proper data source in our metrics or in any third-party data I know of.
  • Replaying the ledger is neither reliable, repeatable, nor fast, but it can always be done.

I'll start with benches first. I initially tried constructing a simple transaction that contains only the data needed for the target instruction, for example StakeInstruction::Initialize, then pushing the transaction through a bank for a full-scale load_execute_and_commit. This could work, but it adds too much overhead to the bench results. I have since settled on using invoke_context::mock_process_instruction() in the bencher.

@apfitzge

@tao-stones could we write benches for each instruction, on some typical x86 machines, and count instructions via something like criterion-perf-events?

Then just use that number as the approximate CUs? That seems like a much more consistent way to do it than benchmarking time and converting to CUs via some previous cu/us constant.

wdyt?
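A minimal sketch of what such a harness could look like, assuming the criterion-perf-events crate's published API (a Perf measurement backed by perf_event hardware counters); stake_initialize_workload is a hypothetical placeholder for the code under test, not an existing function:

```rust
// Sketch of a criterion harness that counts retired instructions instead of
// wall-clock time, assuming the criterion-perf-events crate API. Linux-only,
// since it relies on perf hardware counters.
use criterion::{criterion_group, criterion_main, Criterion};
use criterion_perf_events::Perf;
use perf_event::events::Hardware;

// Hypothetical placeholder for the builtin-instruction workload to measure.
fn stake_initialize_workload() {
    // e.g. drive the instruction through the program runtime here
}

fn bench_instruction_counts(c: &mut Criterion<Perf>) {
    c.bench_function("stakes/initialize", |b| b.iter(stake_initialize_workload));
}

criterion_group!(
    name = benches;
    config = Criterion::default().with_measurement(Perf::new(Hardware::INSTRUCTIONS));
    targets = bench_instruction_counts
);
criterion_main!(benches);
```

Registering Perf as the measurement makes criterion report hardware counter readings rather than time, so the same statistical machinery (outlier detection, change detection) applies to instruction counts. This is harness wiring only; it needs Linux perf access to actually run.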


tao-stones commented Oct 31, 2024

Benching is the way to go; already trying it out on Stake. Counting the number of executed instructions is a great way to get consistent measurements (assuming a fixed execution path), and @ksolana brought up similar ideas as well. Think it's worth trying. Also TIL criterion-perf-events.

However, doing so will put builtin costs in a different realm while the rest stay in us-to-CU conversion land. So instead of builtin costs being inaccurate but stable relative to other instructions, they will be more accurate but not on the same scale as the rest. Would that cause concern? (I need to think about it.)

@apfitzge

> However, doing so will put builtins costs in different realm as the rest still in us/CU conversion land.

CUs used to just be BPF instructions.
I think it's somewhat reasonable to translate one typical x86 instruction to 1 CU. It's an overestimate, since there are obviously native-vs-emulated performance implications.
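To make the two scales concrete, here is a hedged sketch of the mapping proposed here (one measured x86 instruction counts as 1 CU) next to the existing time-based route; the function names and the 33 CU/us constant are illustrative assumptions, not values taken from the actual cost model:

```rust
// Sketch of the two candidate mappings discussed above. Constants and names
// are illustrative assumptions, not values from the cost model.

/// Proposed mapping: 1 measured x86 instruction -> 1 CU. An overestimate for
/// native code, since CUs were originally defined over emulated BPF
/// instructions, which cost more per instruction than native execution.
fn cus_from_insn_count(insns: u64) -> u64 {
    insns
}

/// Existing time-based route: measured microseconds times an assumed
/// CU-per-microsecond constant (the value here is hypothetical).
const ASSUMED_CUS_PER_US: u64 = 33;

fn cus_from_elapsed_us(us: u64) -> u64 {
    us * ASSUMED_CUS_PER_US
}

fn main() {
    // The iai instruction count reported later in the thread, as CUs.
    println!("{}", cus_from_insn_count(290_080));
    // A 10 us wall-clock measurement under the assumed conversion constant.
    println!("{}", cus_from_elapsed_us(10));
}
```

The concern raised above is visible here: the two functions return numbers on different scales, so mixing them within one cost model changes builtin costs relative to everything still priced via time.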


ksolana commented Oct 31, 2024

> @tao-stones could we write benches for each instruction, on some typical x86 machines, and count instructions via something like criterion-perf-events?
>
> Then just use that number as approximate CUs? That seems like a very consistent way to do it instead of benchmarking time -> convert to CU via some previous constant of cu/us.
>
> wdyt?

Yeah, this is exactly what I was referring to during yesterday's conversation. We can hook up valgrind to get a "trace" of instruction counts. There might be other ways to get it, but since this is a one-off thing, using valgrind isn't a bad idea IMO. A bit tricky, but once implemented it scales to all the instructions.

@tao-stones

Tried criterion-perf-events on StakeInstruction::Initialize, setting the measurement to Instructions; it prints:

Gnuplot not found, using plotters backend
stakes/initialize       time:   [125501.2873 cycles 125501.9226 cycles 125502.5530 cycles]
                        change: [-0.0002% +0.0011% +0.0024%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

Not sure how to read this; are the "cycles" the count of instructions?


apfitzge commented Nov 1, 2024

Hmm, I think I linked the wrong library, looking at that output.

this is what i was thinking of: https://github.com/bheisler/iai

https://bheisler.github.io/criterion.rs/book/iai/getting_started.html


ksolana commented Nov 1, 2024

Ah nice! It already uses cachegrind (which is part of valgrind). Simplifies our work 🔥

@tao-stones

IIRC iai cannot exclude setup code; if true, then it is a deal breaker for this use case.


ksolana commented Nov 1, 2024

Can we compute the setup cost and subtract it from every measurement?

@tao-stones

That's doable, though then the project becomes more of a one-time effort. Still worth trying. Maybe we can have both criterion benchmarks and iai benches, plus something to automatically subtract the setup cost from the iai results.
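A sketch of what that automatic subtraction could look like, assuming we scrape iai's plain-text report; parse_instruction_count and the embedded sample report are illustrative, not an existing tool:

```rust
// Sketch: extract the "Instructions:" counts from iai's text output for two
// named benches, then subtract the setup-only bench from the full bench.
fn parse_instruction_count(report: &str, bench_name: &str) -> Option<u64> {
    let mut in_bench = false;
    for line in report.lines() {
        if line.trim() == bench_name {
            // Found the bench header; the counter lines follow it.
            in_bench = true;
        } else if in_bench {
            if let Some(rest) = line.trim().strip_prefix("Instructions:") {
                return rest.trim().parse().ok();
            }
        }
    }
    None
}

fn main() {
    // Shape of iai's per-bench output (numbers from this thread).
    let report = "\
bench_setup_initialize
  Instructions:              159403

bench_initialize
  Instructions:              290080
";
    let setup = parse_instruction_count(report, "bench_setup_initialize").unwrap();
    let full = parse_instruction_count(report, "bench_initialize").unwrap();
    println!("{}", full - setup); // prints 130677
}
```

Note the difference (130677 here) is still only an upper bound on the instruction's own cost, since the full bench also contains inner pre/post-execution work that the setup-only bench does not.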

@tao-stones

Also, do you guys know if iai and valgrind are stable on macOS?


tao-stones commented Nov 1, 2024

Got iai working on Linux; a result looks like this:

iai bench StakeInitialize
     Running benches/stake_iai.rs (target/release/deps/stake_iai-680ef0d7fd289c61)
bench_setup_initialize
  Instructions:              159403
  L1 Accesses:               202150
  L2 Accesses:                  255
  RAM Accesses:                 929
  Estimated Cycles:          235940

bench_initialize
  Instructions:              290080
  L1 Accesses:               369241
  L2 Accesses:                  630
  RAM Accesses:                1991
  Estimated Cycles:          442076

Note: simply subtracting these two Instructions counts doesn't give the correct answer, because bench_initialize also includes more "inner" pre-execution steps (e.g. invoke context and account priming) and post-execution steps.


ksolana commented Nov 1, 2024

> Also, do you guys know if iai and valgrind are stable on MacOS?

It's a Linux thing; macOS stopped supporting valgrind in a reasonable way. And for these computations we want to measure x86 instruction counts anyway.

@tao-stones

@KirillLykov @ksolana just FYI, separated re-pricing builtin instructions into SIMD-198 and updated this issue's description.
