
#38: Add VM benchmarking #42

Open · wants to merge 5 commits into main
Conversation

apetkov-so (Contributor)

No description provided.

@apetkov-so apetkov-so self-assigned this Jan 7, 2025
@apetkov-so apetkov-so linked an issue Jan 7, 2025 that may be closed by this pull request
@apetkov-so apetkov-so added the vm Aranya policy lang VM label Jan 7, 2025
pub fn call_seal(
&mut self,
name: &str,
this_data: &Struct,
) -> Result<ExitReason, MachineError> {
#[cfg(feature = "bench")]
self.stopwatch.start("call_seal");
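The excerpt above gates all timing behind a `bench` feature so release builds pay nothing for it. A minimal sketch of the pattern (the `Stopwatch` and `Machine` shapes here are hypothetical stand-ins, not the actual aranya-runtime types):

```rust
// Hypothetical stand-in for the real stopwatch type.
#[cfg(feature = "bench")]
#[derive(Default)]
pub struct Stopwatch {
    started: Vec<&'static str>,
}

#[cfg(feature = "bench")]
impl Stopwatch {
    pub fn start(&mut self, name: &'static str) {
        self.started.push(name);
    }
}

#[derive(Default)]
pub struct Machine {
    // The field itself disappears when the feature is off,
    // so `Machine` can even become zero-sized.
    #[cfg(feature = "bench")]
    stopwatch: Stopwatch,
}

impl Machine {
    pub fn call_seal(&mut self, _name: &str) {
        // Compiled out entirely unless built with `--features bench`.
        #[cfg(feature = "bench")]
        self.stopwatch.start("call_seal");
        // ... seal logic ...
    }
}
```

With the feature disabled, both the field and the call are removed at compile time rather than branched over at runtime.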
Contributor
I think it would be useful to add the command name, especially since we're fixing the ability to publish multiple commands in an action.

@apetkov-so (Contributor, Author) commented Jan 10, 2025

Done. BTW, I figured out why the call_seal/call_open time measurements never complete: run collects the benchmarking results when it exits, which happens before the stopwatch.stop() calls in call_seal/call_open. All the other stopwatch calls happen inside run; seal/open are the exceptions.

Edit: I'm thinking about simply not timing the seal/open functions. They essentially just make FFI calls (via run), which are already timed.
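The ordering bug described above can be modeled in a few lines. This is a simplified, hypothetical sketch (names like `Bench::collect` are illustrative, not the PR's actual API): `collect` snapshots the finished samples when `run` exits, so a `stop` that happens afterwards never reaches the report.

```rust
use std::time::{Duration, Instant};

pub struct Bench {
    running: Vec<(String, Instant)>,
    finished: Vec<(String, Duration)>,
}

impl Bench {
    pub fn new() -> Self {
        Self { running: Vec::new(), finished: Vec::new() }
    }

    pub fn start(&mut self, name: &str) {
        self.running.push((name.to_string(), Instant::now()));
    }

    pub fn stop(&mut self, name: &str) {
        // Match the most recent start with this name.
        if let Some(i) = self.running.iter().rposition(|(n, _)| n == name) {
            let (n, t0) = self.running.remove(i);
            self.finished.push((n, t0.elapsed()));
        }
    }

    /// Called when `run` exits: takes every *finished* sample.
    /// Anything still running at this point is lost.
    pub fn collect(&mut self) -> Vec<(String, Duration)> {
        std::mem::take(&mut self.finished)
    }
}
```

Because `call_seal`/`call_open` start their timers outside `run` and stop them after `run` has already collected, their samples vanish exactly as described.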

crates/aranya-runtime/benches/lib.rs (outdated conversation, resolved)
@apetkov-so force-pushed the 38-create-vm-benchmarking-suite branch from d851bdf to 51fabe2 on January 10, 2025
@@ -0,0 +1,138 @@
#[cfg(feature = "bench")]
#[test]
Contributor

Shouldn't this be `#[bench]`? How do you actually run this? If I do `cargo bench` it doesn't run because it's not `#[bench]`, and if I do `cargo test` it doesn't run because it doesn't look in the benches dir.

Contributor

Looks like it will only run with `cargo test` if the "bench" feature flag is set. Agree it should just be `#[bench]`.

@apetkov-so (Contributor, Author) commented Jan 10, 2025

I didn't realize you could use features like this. The test runs when the bench feature is enabled in aranya-runtime/Cargo.toml. There's probably a better way that I don't know about.

Contributor

Yeah, you need the bench feature, but that doesn't make benches run under `cargo test`; you need `--benches` for that. I figured it out: to get this to run and print stats you need to do:

cargo test --benches --features=bench -p aranya-runtime -- --nocapture

Contributor Author

Apparently `#[bench]` is unstable (it requires the nightly `test` feature).

@chip-so (Contributor) left a comment

Here's some example output from my testing:

----- Benchmark Results: -----
setup_action: init: (1 samples), best: 4µs, worst: 4µs, mean: 4µs, SD: 0ns
query.start: (1 samples), best: 2.958µs, worst: 2.958µs, mean: 2.958µs, SD: 0ns
setup_command: Insert: (9 samples), best: 2.333µs, worst: 2.959µs, mean: 2.463µs, SD: 183ns
setup_command: DoSomething: (1 samples), best: 1.875µs, worst: 1.875µs, mean: 1.875µs, SD: 0ns
publish: (11 samples), best: 875ns, worst: 3.083µs, mean: 1.277µs, SD: 631ns
extcall: (22 samples), best: 292ns, worst: 6.333µs, mean: 1.153µs, SD: 1.248µs
setup_command: Init: (1 samples), best: 875ns, worst: 875ns, mean: 875ns, SD: 0ns
query.next: (10 samples), best: 333ns, worst: 2.541µs, mean: 721ns, SD: 613ns
create: (9 samples), best: 375ns, worst: 2.25µs, mean: 643ns, SD: 572ns
validate_fact_literal: (1 samples), best: 417ns, worst: 417ns, mean: 417ns, SD: 0ns
setup_action: run: (1 samples), best: 334ns, worst: 334ns, mean: 334ns, SD: 0ns
serialize: (11 samples), best: 166ns, worst: 833ns, mean: 318ns, SD: 208ns
deserialize: (11 samples), best: 125ns, worst: 791ns, mean: 299ns, SD: 184ns
setup_action: insert: (9 samples), best: 208ns, worst: 750ns, mean: 287ns, SD: 166ns
fact.kset: (9 samples), best: 125ns, worst: 708ns, mean: 199ns, SD: 180ns
struct.get: (18 samples), best: 125ns, worst: 458ns, mean: 171ns, SD: 75ns
validate_struct_schema: (11 samples), best: 83ns, worst: 459ns, mean: 163ns, SD: 140ns
struct.new: (11 samples), best: 41ns, worst: 1.208µs, mean: 155ns, SD: 333ns
struct.set: (18 samples), best: 83ns, worst: 375ns, mean: 150ns, SD: 60ns
def: (51 samples), best: 41ns, worst: 1.375µs, mean: 145ns, SD: 245ns
return: (33 samples), best: 0ns, worst: 584ns, mean: 138ns, SD: 118ns
get: (58 samples), best: 41ns, worst: 417ns, mean: 137ns, SD: 70ns
fact.vset: (9 samples), best: 83ns, worst: 208ns, mean: 116ns, SD: 38ns
call: (22 samples), best: 41ns, worst: 750ns, mean: 114ns, SD: 141ns
fact.new: (10 samples), best: 41ns, worst: 458ns, mean: 88ns, SD: 124ns
end: (21 samples), best: 41ns, worst: 250ns, mean: 85ns, SD: 53ns
block: (21 samples), best: 0ns, worst: 125ns, mean: 61ns, SD: 37ns
branch: (10 samples), best: 0ns, worst: 167ns, mean: 50ns, SD: 40ns
meta:: (109 samples), best: 0ns, worst: 166ns, mean: 50ns, SD: 23ns
jump: (9 samples), best: 0ns, worst: 84ns, mean: 37ns, SD: 23ns
exit: (33 samples), best: 0ns, worst: 42ns, mean: 29ns, SD: 19ns

This is kind of a hodgepodge. Lots of things are being benchmarked here, but it's really hard to separate individual instruction timing from other internal operations. Some thoughts on how to improve this:

  1. Improve BenchMeasurements/BenchStats so that the stats of interest can be filtered, e.g. a method on BenchMeasurements that consumes self and returns a new BenchMeasurements with only the stats you want. It would be really slick if you could select by category (e.g. "instructions" versus "validation" versus "setup").
  2. Improve the output with columns to make it more readable (maybe look at table_formatter or tablestream).
  3. Output this data in a format that can be consumed by other tools (probably CSV or JSON).
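Items 1 and 3 could look something like the following sketch. The `BenchStats`/`BenchMeasurements` shapes here are hypothetical stand-ins for the PR's types, reduced to a couple of fields for illustration:

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the PR's BenchStats.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchStats {
    pub samples: usize,
    pub mean_ns: u64,
}

pub struct BenchMeasurements {
    pub stats: BTreeMap<String, BenchStats>,
}

impl BenchMeasurements {
    /// Consumes self and keeps only the stats whose name matches `pred`,
    /// e.g. instruction timings versus setup/validation.
    pub fn filter(self, pred: impl Fn(&str) -> bool) -> Self {
        Self {
            stats: self.stats.into_iter().filter(|(k, _)| pred(k)).collect(),
        }
    }

    /// One CSV line per stat, for consumption by other tools.
    pub fn to_csv(&self) -> String {
        let mut out = String::from("name,samples,mean_ns\n");
        for (name, s) in &self.stats {
            out.push_str(&format!("{},{},{}\n", name, s.samples, s.mean_ns));
        }
        out
    }
}
```

Selecting by category could then be a thin wrapper over `filter`, e.g. matching on a name prefix like `setup_` until stats carry an explicit category tag.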

@apetkov-so force-pushed the 38-create-vm-benchmarking-suite branch from 72de591 to f50cdeb on January 22, 2025
@chip-so (Contributor) commented Jan 28, 2025

 Name                         # Samples  Best     Worst     Mean     SD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 setup_action: init                   2    1.5µs      75µs  38.25µs  36.75µs
 setup_command: Increment             1  7.416µs   7.416µs  7.416µs      0ns
 setup_command: Create                1  4.875µs   4.875µs  4.875µs      0ns
 query                                1  4.792µs   4.792µs  4.792µs      0ns
 setup_command: Insert                9  4.208µs   4.625µs  4.343µs    131ns
 setup_action: create_action          1  3.917µs   3.917µs  3.917µs      0ns

This is much nicer.
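Aligned output like the table above can be produced with plain `std` format width specifiers, without pulling in a table crate. A sketch (not necessarily how the PR formats its rows):

```rust
/// Renders one benchmark row with fixed-width columns:
/// left-aligned name, right-aligned numeric columns.
pub fn bench_row(
    name: &str,
    samples: usize,
    best: &str,
    worst: &str,
    mean: &str,
    sd: &str,
) -> String {
    format!("{name:<28} {samples:>9}  {best:>7} {worst:>9} {mean:>8} {sd:>8}")
}
```

The `<28` pads the name to 28 characters on the right; `>9` right-aligns within 9, which is what keeps the duration columns lined up under their headers.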

chip-so previously approved these changes on Jan 28, 2025

@chip-so (Contributor) left a comment:

There are a lot of other things I'd like to improve, but I think this needs to land so we can move on.

@apetkov-so force-pushed the 38-create-vm-benchmarking-suite branch from 5448644 to 7dcbfec on February 3, 2025
@apetkov-so force-pushed the 38-create-vm-benchmarking-suite branch from 7dcbfec to 654422d on February 5, 2025
Labels: vm (Aranya policy lang VM)
Projects: none yet
Development: successfully merging this pull request may close issue "Create VM benchmarking suite"
4 participants