
Investigate performance of merge_plans_after #10

Open
jhaye opened this issue Jan 19, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@jhaye
Collaborator

jhaye commented Jan 19, 2022

This function does a lot of cloning and contains attempts to optimize the cloning away, but I suspect the optimizations don't actually do anything.

@jhaye jhaye added the enhancement New feature or request label Jan 19, 2022
@jhaye
Collaborator Author

jhaye commented Sep 8, 2022

[image: benchmark results]

My latest testing shows that the optimisations do help, specifically for benchmarks whose performance does not improve when parallelised. It's nice to now know why Rust performs better than C/C++ in these scenarios.
I've also tested the new-opts branch, which shows promising performance on the cigarette smokers benchmark.

@cmnrd
Contributor

cmnrd commented Sep 8, 2022

What exactly does merge_plans_after do? I don't think we have something equivalent in the other runtimes.

@oowekyala
Collaborator

That's the function that merges "reaction plans", which are conceptually just sets of reactions. It's called when more than one trigger fires at the same time step (each trigger contributes its own plan).
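To make the idea concrete, here is a minimal sketch of a "reaction plan" as a set of reactions, merged by set union when several triggers fire at the same tag. The names (`ReactionPlan`, `merge`) are illustrative, not the runtime's actual API:

```rust
// Hypothetical sketch: a reaction plan as a set of reaction IDs.
use std::collections::BTreeSet;

type ReactionId = u32;

/// A plan is conceptually just a set of reactions to execute.
#[derive(Clone, Debug, PartialEq)]
struct ReactionPlan(BTreeSet<ReactionId>);

impl ReactionPlan {
    fn from_reactions(ids: &[ReactionId]) -> Self {
        ReactionPlan(ids.iter().copied().collect())
    }

    /// Merging two plans is set union: reactions contributed by
    /// several triggers at the same tag are executed only once.
    fn merge(&mut self, other: &ReactionPlan) {
        self.0.extend(other.0.iter().copied());
    }
}
```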

@jhaye
Collaborator Author

jhaye commented Sep 8, 2022

I would guess that the performance gain comes from each reaction plan being wrapped in an optional clone-on-write smart pointer. The function performs various checks to avoid cloning.
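The clone-avoidance idea could look something like the following sketch, assuming the plan sits behind `Option<Cow<...>>` (the function name and checks are hypothetical, chosen only to illustrate when `Cow` defers the clone):

```rust
// Hypothetical sketch of the clone-on-write idea: the accumulated
// plan is an Option<Cow<Plan>>, and merging checks for cheap cases
// before letting Cow clone.
use std::borrow::Cow;
use std::collections::BTreeSet;

type ReactionId = u32;
type Plan = BTreeSet<ReactionId>;

/// Merge `other` into `acc`, cloning only when unavoidable.
fn merge_cow<'a>(acc: &mut Option<Cow<'a, Plan>>, other: &'a Plan) {
    match acc {
        // Nothing accumulated yet: borrow `other` instead of cloning.
        None => *acc = Some(Cow::Borrowed(other)),
        Some(cur) => {
            // If `other` adds nothing new, keep the borrow untouched.
            if other.iter().all(|r| cur.contains(r)) {
                return;
            }
            // Only now does Cow actually clone (if still borrowed).
            cur.to_mut().extend(other.iter().copied());
        }
    }
}
```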

@cmnrd
Contributor

cmnrd commented Sep 9, 2022

Ok, I am not sure, though, whether that explains why the Rust execution is faster than C/C++ for some benchmarks. In the C++ runtime, we essentially use an array of arrays. You can think of it as a map from reaction levels to the list of all triggered reactions with that level. Whenever a reaction is triggered at the current tag, it is inserted into this data structure. Note that we do not filter duplicates at insertion. Instead, when the scheduler advances to the next level, it performs a sort on the list to filter out all duplicates and then executes all reactions in that level. This seems to be pretty efficient, but there are some downsides. In particular, if there are many levels but only a few triggered reactions, we always have to cycle through all levels. Does this work differently in Rust?
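For comparison with the Rust structure discussed below, the dense scheme described here could be sketched (in Rust, since that is this thread's runtime; names are illustrative, not the reactor-cpp API) as:

```rust
// Sketch of the dense scheme: one slot per level, duplicates
// filtered lazily by sort + dedup when the scheduler reaches a level.
type ReactionId = u32;

struct DenseLevels {
    // One Vec per level, even if most slots stay empty.
    levels: Vec<Vec<ReactionId>>,
}

impl DenseLevels {
    fn new(num_levels: usize) -> Self {
        DenseLevels { levels: vec![Vec::new(); num_levels] }
    }

    /// Insertion does not filter duplicates.
    fn trigger(&mut self, level: usize, reaction: ReactionId) {
        self.levels[level].push(reaction);
    }

    /// On advancing to `level`: sort, dedup, then execute the batch.
    /// The scheduler must visit every level, including empty ones.
    fn take_level(&mut self, level: usize) -> Vec<ReactionId> {
        let mut batch = std::mem::take(&mut self.levels[level]);
        batch.sort_unstable();
        batch.dedup();
        batch
    }
}
```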

@oowekyala
Collaborator

oowekyala commented Sep 9, 2022

In Rust the principle is the same, but the data structure to store reactions is sparse. It's something like a Vec<(LevelIndex, Level)>, sorted by level index, where a Level is a Vec<ReactionId>. If a given level has no reactions then there is no entry in the vector. Level lists are kept duplicate free during insertion. My initial idea was to avoid mutating/cloning anything if we find out that we are merging two equivalent sets, but I don't think the runtime currently does this, although it does try to avoid mutation and cloning in other cases. The reason we avoid mutation is to be able to use pre-computed shared instances without cloning them (with the Cow smart pointer).
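A minimal sketch of that sparse structure, with duplicate filtering at insertion (type and method names are illustrative, not the runtime's actual code):

```rust
// Sketch of the sparse structure: entries sorted by level index,
// no entry for empty levels, duplicates rejected at insertion.
type LevelIndex = u32;
type ReactionId = u32;
type Level = Vec<ReactionId>;

struct SparseLevels {
    // Sorted by level index; levels with no reactions have no entry.
    entries: Vec<(LevelIndex, Level)>,
}

impl SparseLevels {
    fn new() -> Self {
        SparseLevels { entries: Vec::new() }
    }

    /// Insert, keeping entries sorted and each level duplicate-free.
    fn trigger(&mut self, level: LevelIndex, reaction: ReactionId) {
        match self.entries.binary_search_by_key(&level, |(l, _)| *l) {
            Ok(i) => {
                let lvl = &mut self.entries[i].1;
                if !lvl.contains(&reaction) {
                    lvl.push(reaction);
                }
            }
            Err(i) => self.entries.insert(i, (level, vec![reaction])),
        }
    }

    /// The scheduler only visits levels that actually hold reactions,
    /// so many empty levels cost nothing.
    fn next_level(&mut self) -> Option<(LevelIndex, Level)> {
        if self.entries.is_empty() {
            None
        } else {
            Some(self.entries.remove(0))
        }
    }
}
```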

A further optimisation done in Rust: e.g. if we're on level 12 and something triggers, we don't even look at levels < 12 and only try to merge from level 12 on. Just writing this, I realise that this is probably useless, as a reaction on level n cannot trigger a reaction on a level < n, by definition of levels. Tearing that out could be a nice code simplification.

@jhaye
Collaborator Author

jhaye commented Sep 9, 2022

[image: benchmark breakdown]

Here is a detailed breakdown of main and new-opts. Overall it seems like an improvement, but for Banking with 1 and 2 workers, as well as Big with 1 worker, it's slower for some reason.
