Group limit benchmarks #793
Conversation
- set `MAX_GROUP_SIZE` to 300
- use all identity samples for benchmarks
- reference the GitHub issue in a TODO comment
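As a rough illustration of how a group-size sweep could be wired up with criterion, a hedged sketch follows; the `MAX_GROUP_SIZE` value comes from this PR, but `create_group_with_members` and the sampled sizes are illustrative stand-ins, not the actual libxmtp benchmark code.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

// TODO: revisit this cap once the group-size limits discussed in this PR are raised.
const MAX_GROUP_SIZE: usize = 300;

/// Hypothetical stand-in for the real client call that creates a group with
/// `size` members; a real benchmark would call into the libxmtp client here.
fn create_group_with_members(size: usize) -> Vec<String> {
    (0..size).map(|i| format!("member-{i}")).collect()
}

fn bench_group_creation(c: &mut Criterion) {
    let mut group = c.benchmark_group("create_group");
    // Sweep a few group sizes up to the configured maximum.
    for size in [10, 50, 100, 200, MAX_GROUP_SIZE] {
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter(|| create_group_with_members(size));
        });
    }
    group.finish();
}

criterion_group!(benches, bench_group_creation);
criterion_main!(benches);
```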
This is great work @insipx. This has come a long way since my dumb benchmarks. The flamegraphs are super helpful. While the data would be noisier, I'd be interested in seeing what these numbers look like when we are communicating with the dev network, where network latency plays a bigger factor. Those probably offer a better view of what real-world performance looks like.
Definitely can make a report using the dev network!
This seems quite manageable. As a group creator, if I already have a list of 5,000 addresses that I want to create a group from, there's no way I'm entering them by hand. It's no sweat waiting a minute while that group is created in the background by whatever bulk upload tool I'm using. I just expect to be notified when it's done. On the other hand, waiting an hour would be unacceptable.
I'm very interested to see how this benchmarks once the bug is fixed, as this is the scenario that concerns me more. Adding 1 address to a group of 400 is a by-hand operation that will be expected to complete instantly.
That makes sense! That bug is an easier fix too, so it makes sense to see those benchmarks and confirm that adding to an already-large group is fast enough. According to Cryspen's early benchmarks (which don't include the network overhead seen here, since they cover only the OpenMLS library), adding to a 1000 member group takes … (benchmark).
Adding one member to a group is much faster than creating a large group from scratch, but could still use improvement: after ~4000 members, it will start to take more than 1 second to add a new member. Per the benchmarks for `encrypt_welcome`, the average size of the welcome message at ~1600 members is 626,120 bytes, which according to those benchmarks takes ~2ms. The network overhead in …
For the openmls benchmarks I generate groups separately and write them out to a file; then you only need to read them in the benchmarks. Even that takes some time, but it allows testing larger groups. Maybe that's something you could do here as well?
Yeah, I already have logic to pre-generate the identities and write them to a file, separately for the local and dev networks. I think most of this time is spent between the local libxmtp client and our backend service. I'll update this with new flamegraphs and dev network benchmarks today.
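A minimal sketch of that pre-generation pattern, assuming serde_json for the on-disk format and a hypothetical `PreGeneratedIdentity` type (the real pre-generated identities would carry key material, not just an address):

```rust
use std::fs;
use std::path::Path;

use serde::{Deserialize, Serialize};

// Hypothetical on-disk record; real identities would include keys and credentials.
#[derive(Serialize, Deserialize)]
struct PreGeneratedIdentity {
    address: String,
}

const IDENTITY_FILE: &str = "bench_identities.json";

/// Pay the generation cost once, outside the benchmark loop.
fn write_identities(count: usize) -> std::io::Result<()> {
    let identities: Vec<PreGeneratedIdentity> = (0..count)
        .map(|i| PreGeneratedIdentity { address: format!("0x{i:040x}") })
        .collect();
    fs::write(IDENTITY_FILE, serde_json::to_vec(&identities)?)
}

/// Benchmarks only read the file, so setup stays cheap even for large groups.
fn load_identities() -> std::io::Result<Vec<PreGeneratedIdentity>> {
    let bytes = fs::read(IDENTITY_FILE)?;
    Ok(serde_json::from_slice(&bytes)?)
}

fn main() -> std::io::Result<()> {
    if !Path::new(IDENTITY_FILE).exists() {
        write_identities(5_000)?;
    }
    println!("loaded {} identities", load_identities()?.len());
    Ok(())
}
```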
Updated the PR/Verdict with benchmarks against the dev network plus flamegraphs for the individual benchmarks. I was only able to get the group size up to 450; after that I ran into broken-pipe errors and the benchmark would not complete. I assume this is because it takes a long time and the gRPC server just closes the connection after a while. So far, all the MLS work is really fast and barely shows up in the benchmarks. Most of the bottlenecks are around gRPC and fetching identities/sending welcomes, etc.
local gRPC benchmarks: benchmark report
Benchmarks against dev network: report
Message Size Limits
Issue: #812
This isn't exact, but we eventually reach the gRPC message size limit. For this PR, I added chunking for the `send_welcomes` fn, which solves this, but we're still sending lots of data over the wire and encrypting a large welcome message, and very large group sizes will still hit the limit (that would require the ratchet tree itself to grow larger than the gRPC limit).
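A hedged sketch of the chunking idea: split the outgoing welcome payloads into batches so that no single request exceeds the server's message size limit. The `Welcome` struct and the 4 MiB limit here are illustrative assumptions, not the actual libxmtp types or server configuration.

```rust
// Illustrative only: `Welcome` and GRPC_LIMIT_BYTES are assumptions, not libxmtp types.
const GRPC_LIMIT_BYTES: usize = 4 * 1024 * 1024;

struct Welcome {
    payload: Vec<u8>,
}

/// Greedily pack welcomes into chunks whose combined payload size stays under
/// `limit`; a single oversized welcome still gets its own chunk and would need
/// a different strategy (e.g. a smaller ratchet tree) to fit.
fn chunk_welcomes(welcomes: Vec<Welcome>, limit: usize) -> Vec<Vec<Welcome>> {
    let mut chunks: Vec<Vec<Welcome>> = Vec::new();
    let mut current: Vec<Welcome> = Vec::new();
    let mut current_size = 0usize;

    for welcome in welcomes {
        let size = welcome.payload.len();
        if !current.is_empty() && current_size + size > limit {
            chunks.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size += size;
        current.push(welcome);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    // Each chunk would then be sent as its own gRPC request by the caller.
    let welcomes = (0..10).map(|_| Welcome { payload: vec![0u8; 1_000_000] }).collect();
    let chunks = chunk_welcomes(welcomes, GRPC_LIMIT_BYTES);
    println!("split into {} requests", chunks.len());
}
```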
Flamegraphs (all flamegraphs are local gRPC)
- CPU flamegraph for member addition: interactive flamegraph link
- Add to empty group: interactive link
- Add to 100 member group: interactive
- Remove all members: interactive
- Add 1 member to group: interactive
Bugs
There's currently a bug when adding to a populated group. Any benches that add to an already-populated group are not included and are annotated with `#[allow(dead_code)]` until fixed.
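For illustration, this is roughly what a parked bench helper looks like; the function name is hypothetical.

```rust
// Kept out of the criterion_group! registration until the bug is fixed;
// the attribute silences the resulting dead-code warning.
#[allow(dead_code)]
fn bench_add_to_populated_group() {
    // TODO: re-enable once adding to an already-populated group works.
    unimplemented!();
}
```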
Verdict
It's difficult to recommend increasing the group size much above 400 if it takes more than 3-4 seconds to add all those members, although this also depends on how long users would actually be willing to wait when adding large numbers of members.
Adding members is the slowest operation, so performance work should focus on making it faster. Removing members (even in large numbers) performs well compared to member addition.
The flamegraph indicates we are spending lots of time in `encrypt_welcome`, so focusing on decreasing the total welcome size could result in a more performant experience for member addition.

After generating flamegraphs for each benchmark individually, it's clear that there are a few main areas that would benefit from optimization work:
- Making `get_identity_updates_v2` faster would speed up every benchmark.

Fixes #810