Measure builtin instruction performance #3364
Comments
Looking for the most efficient and reliable way of gathering builtin instruction CU / performance data. I'm aware of a few options:
I'll start with bench first. I first tried to construct a simple transaction that contains only the data necessary for the target instruction, for example |
@tao-stones could we write benches for each instruction, on some typical x86 machines, and count instructions via something like criterion-perf-events? Then just use that number as approximate CUs? That seems like a very consistent way to do it instead of benchmarking time -> convert to CU via some previous constant of cu/us. wdyt? |
Benching is the way to go, already trying out on However, doing so will put builtin costs in a different realm from the rest, which are still in us/CU conversion land. So instead of builtin costs being inaccurate but stable relative to other instructions, they will be more accurate but not on the same scale as the rest. Would that cause concern? (I need to think about it) |
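To make the scale concern concrete, here is a minimal sketch of the two estimates being compared: a time-based figure converted through a cu/us constant, and a count-based figure that takes the measured x86 instruction count as approximate CUs. The function names and every numeric value are hypothetical and not taken from the codebase.

```rust
/// Time-based estimate: measured wall time (microseconds) multiplied by a
/// cu/us conversion constant. The constant is passed in by the caller
/// because its real value is not stated in this thread.
fn cu_from_time(elapsed_us: u64, cu_per_us: u64) -> u64 {
    elapsed_us.saturating_mul(cu_per_us)
}

/// Count-based estimate: use the measured instruction count directly as
/// approximate CUs, as proposed above.
fn cu_from_instruction_count(instructions: u64) -> u64 {
    instructions
}

/// Ratio between the two estimates for the same instruction. A value far
/// from 1.0 means the two pricing schemes are not on the same scale.
/// Returns None when the time-based estimate is zero.
fn scale_ratio(count_based_cu: u64, time_based_cu: u64) -> Option<f64> {
    if time_based_cu == 0 {
        None
    } else {
        Some(count_based_cu as f64 / time_based_cu as f64)
    }
}
```

With made-up inputs, `cu_from_time(5, 30)` gives 150 while `cu_from_instruction_count(600)` gives 600, so `scale_ratio(600, 150)` is `Some(4.0)`: the same builtin would look four times more expensive relative to instructions still priced by time.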
CUs used to just be BPF instructions. |
Yeah, this is exactly what I was referring to during yesterday's conversation. We can hook up valgrind to get a 'trace' of instruction counts. There might be other ways to get it, but since this is a one-off thing, using valgrind isn't a bad idea IMO. A bit tricky, but once implemented it scales to all the instructions. |
tried
not sure how to read this, are the |
Hmm, I think I linked the wrong library, looking at that output. This is what I was thinking of: https://github.com/bheisler/iai https://bheisler.github.io/criterion.rs/book/iai/getting_started.html |
Ah nice! It already uses cachegrind (which is part of valgrind). Simplifies our work 🔥 |
iirc |
Can we compute the setup cost and subtract from every calculation? |
That's doable; then the project is more like a one-time effort. Still worth trying. Maybe we can have both criterion benchmarks and iai benches, and then we need something that automatically subtracts the setup cost from the iai results. |
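A minimal sketch of the "subtract setup cost" idea, assuming two instruction counts are available per builtin instruction: one from a bench that runs setup plus the instruction, and one from a bench that runs setup only. The function names and inputs are hypothetical, and whether plain subtraction is accurate enough is not settled in this thread.

```rust
/// Net instruction count for one bench: the full measurement minus the
/// setup-only baseline. Returns None if the baseline exceeds the full
/// measurement, which would indicate the two runs are not comparable.
fn net_instructions(full: u64, setup_only: u64) -> Option<u64> {
    full.checked_sub(setup_only)
}

/// Apply one shared setup baseline to a batch of named bench results.
/// Each entry is (bench name, instruction count including setup).
fn subtract_setup<'a>(
    results: &[(&'a str, u64)],
    setup_only: u64,
) -> Vec<(&'a str, Option<u64>)> {
    results
        .iter()
        .map(|&(name, full)| (name, net_instructions(full, setup_only)))
        .collect()
}
```

For example, `net_instructions(1_200, 900)` yields `Some(300)`. Returning `Option` instead of clamping to zero keeps a bad baseline visible rather than silently pricing an instruction at 0.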
Also, do you guys know if |
Got iai bench StakeInitialize
Note: simply subtracting these two |
It's a Linux thing. macOS stopped supporting valgrind in a reasonable way. And for these computations we want to measure x86 instruction counts anyway. |
@KirillLykov @ksolana just FYI, I separated re-pricing of builtin instructions into SIMD-198 and updated this issue's description. |
SIMD-198 calls for assigning a static execution cost (in CU) to each builtin instruction, versus the current per-builtin-program cost. It would help improve CU allocation and therefore block packing quality.