#38: Add VM benchmarking #42
base: main
Conversation
pub fn call_seal(
    &mut self,
    name: &str,
    this_data: &Struct,
) -> Result<ExitReason, MachineError> {
    #[cfg(feature = "bench")]
    self.stopwatch.start("call_seal");
I think it would be useful to add the command name, especially since we're fixing the ability to publish multiple commands in an action.
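For illustration, that might look something like this (a sketch only; it assumes stopwatch.start accepts an arbitrary &str label, as the existing string literal suggests, and that `name` here is the command name):

// Hypothetical sketch: include the command name in the stopwatch label so
// that multiple commands published by one action show up as separate stats.
#[cfg(feature = "bench")]
self.stopwatch.start(&format!("call_seal: {name}"));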
Done. BTW, I figured out why the call_seal/call_open time measurements don't complete... It's because run collects the benchmarking results when it exits, and this happens before the stopwatch.stop() calls from call_seal/call_open. All other stopwatch calls happen inside a run, but seal/open are the exceptions.
Edit: I'm thinking about just not timing the seal/open functions. They essentially just make FFI calls (via run), which are already timed.
d851bdf to 51fabe2
@@ -0,0 +1,138 @@
#[cfg(feature = "bench")]
#[test]
Shouldn't this be #[bench]? How do you actually run this? If I do cargo bench it doesn't run because it's not #[bench], and if I do cargo test it doesn't run it because it doesn't look in the benches dir.
Looks like it will only run with cargo test if the "bench" feature flag is set. Agree it should just be #[bench].
I didn't realize you can use features like this. The test runs when the bench feature is enabled in aranya-runtime/Cargo.toml. There's probably a better way that I don't know about.
Yeah, you need the bench feature, but that doesn't make it run benches under cargo test. You need --benches for that. I figured it out - to get this to run and print stats you need to do:
cargo test --benches --features=bench -p aranya-runtime -- --nocapture
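For context, the pattern in question looks roughly like this (a sketch with a made-up test name, not the exact test in the PR):

// Compiled only when the crate is built with the `bench` feature, so a plain
// `cargo test` skips it; `--benches` is what makes cargo pick up the target
// under benches/ in the first place.
#[cfg(feature = "bench")]
#[test]
fn vm_benchmark() {
    // ... run the policy VM and print the stopwatch stats ...
}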
Apparently #[feature] is unstable.
Here's some example output from my testing:
----- Benchmark Results: -----
setup_action: init: (1 samples), best: 4µs, worst: 4µs, mean: 4µs, SD: 0ns
query.start: (1 samples), best: 2.958µs, worst: 2.958µs, mean: 2.958µs, SD: 0ns
setup_command: Insert: (9 samples), best: 2.333µs, worst: 2.959µs, mean: 2.463µs, SD: 183ns
setup_command: DoSomething: (1 samples), best: 1.875µs, worst: 1.875µs, mean: 1.875µs, SD: 0ns
publish: (11 samples), best: 875ns, worst: 3.083µs, mean: 1.277µs, SD: 631ns
extcall: (22 samples), best: 292ns, worst: 6.333µs, mean: 1.153µs, SD: 1.248µs
setup_command: Init: (1 samples), best: 875ns, worst: 875ns, mean: 875ns, SD: 0ns
query.next: (10 samples), best: 333ns, worst: 2.541µs, mean: 721ns, SD: 613ns
create: (9 samples), best: 375ns, worst: 2.25µs, mean: 643ns, SD: 572ns
validate_fact_literal: (1 samples), best: 417ns, worst: 417ns, mean: 417ns, SD: 0ns
setup_action: run: (1 samples), best: 334ns, worst: 334ns, mean: 334ns, SD: 0ns
serialize: (11 samples), best: 166ns, worst: 833ns, mean: 318ns, SD: 208ns
deserialize: (11 samples), best: 125ns, worst: 791ns, mean: 299ns, SD: 184ns
setup_action: insert: (9 samples), best: 208ns, worst: 750ns, mean: 287ns, SD: 166ns
fact.kset: (9 samples), best: 125ns, worst: 708ns, mean: 199ns, SD: 180ns
struct.get: (18 samples), best: 125ns, worst: 458ns, mean: 171ns, SD: 75ns
validate_struct_schema: (11 samples), best: 83ns, worst: 459ns, mean: 163ns, SD: 140ns
struct.new: (11 samples), best: 41ns, worst: 1.208µs, mean: 155ns, SD: 333ns
struct.set: (18 samples), best: 83ns, worst: 375ns, mean: 150ns, SD: 60ns
def: (51 samples), best: 41ns, worst: 1.375µs, mean: 145ns, SD: 245ns
return: (33 samples), best: 0ns, worst: 584ns, mean: 138ns, SD: 118ns
get: (58 samples), best: 41ns, worst: 417ns, mean: 137ns, SD: 70ns
fact.vset: (9 samples), best: 83ns, worst: 208ns, mean: 116ns, SD: 38ns
call: (22 samples), best: 41ns, worst: 750ns, mean: 114ns, SD: 141ns
fact.new: (10 samples), best: 41ns, worst: 458ns, mean: 88ns, SD: 124ns
end: (21 samples), best: 41ns, worst: 250ns, mean: 85ns, SD: 53ns
block: (21 samples), best: 0ns, worst: 125ns, mean: 61ns, SD: 37ns
branch: (10 samples), best: 0ns, worst: 167ns, mean: 50ns, SD: 40ns
meta:: (109 samples), best: 0ns, worst: 166ns, mean: 50ns, SD: 23ns
jump: (9 samples), best: 0ns, worst: 84ns, mean: 37ns, SD: 23ns
exit: (33 samples), best: 0ns, worst: 42ns, mean: 29ns, SD: 19ns
This is kind of a hodgepodge. Lots of things are being benchmarked here but it's really hard to separate out individual instruction timing versus other internal operations. Some thoughts on how to improve this:
- Improve BenchMeasurements/BenchStats so that stats of interest can be filtered, i.e. some method in BenchMeasurements that consumes self and returns a new BenchMeasurements with only the stats you want (see the sketch below). Would be really slick if you could select by category (e.g. "instructions" versus "validation" versus "setup").
- Improve output with columns to make it more readable (maybe look at table_formatter or tablestream).
- Output this data in some kind of format that can be consumed by other tools (probably CSV or JSON).
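A rough sketch of what that filter could look like; the internals here are assumptions, not the actual BenchMeasurements type in the PR:

use std::collections::BTreeMap;
use std::time::Duration;

// Hypothetical stand-in for the real type; the PR's BenchMeasurements may
// store its samples differently.
struct BenchMeasurements {
    samples: BTreeMap<String, Vec<Duration>>,
}

impl BenchMeasurements {
    // Consumes self and keeps only the entries whose name matches the
    // predicate, e.g. instruction timings vs. validation vs. setup.
    fn filter(self, pred: impl Fn(&str) -> bool) -> Self {
        Self {
            samples: self
                .samples
                .into_iter()
                .filter(|(name, _)| pred(name.as_str()))
                .collect(),
        }
    }
}

Usage would then be something like measurements.filter(|name| name.starts_with("setup_")) to pull out just the setup stats from output like the above.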
72de591 to f50cdeb
This is much nicer.
There are a lot of other things I'd like to improve, but I think this needs to be done so we can move on.
5448644 to 7dcbfec
…tate exit - and aggregates benchmarking results - before the calling functions can stop the timers.
7dcbfec to 654422d