Add option for non-uniform key distribution #7
It's a great suggestion, though I'd say […]
I'll work on it when I find the time :) The zipf crate requires rand 0.8, which changed the […]
Do I understand correctly that […]
Couldn't ask for anything more :)
Hmm, that's a good question. I'm okay with changing the seed type if that's what […] (lines 428 to 429 in 5109b8d)
It could be that zipf randomization is too slow to run on-line, and we end up just benchmarking the random number generation, and that we actually need a pre-phase just to generate random numbers when using zipf.
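One way to avoid benchmarking the Zipf sampler itself is the pre-phase suggested above: draw all the Zipf-distributed key indices up front, so the measured loop only does an array lookup. A minimal pure-std sketch (all names here are illustrative, not from the benchmark's actual code; `XorShift64` is a stand-in for whatever RNG the benchmark really uses):

```rust
// Hypothetical pre-phase: precompute the Zipf CDF once, then draw all
// indices by inverse-CDF lookup before the timed section starts.

struct XorShift64(u64); // toy RNG; seed must be nonzero

impl XorShift64 {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // take the top 53 bits to get a uniform f64 in [0, 1)
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Normalized cumulative Zipf weights for ranks 1..=n with exponent `s`.
fn zipf_cdf(n: usize, s: f64) -> Vec<f64> {
    let mut cdf = Vec::with_capacity(n);
    let mut acc = 0.0;
    for k in 1..=n {
        acc += 1.0 / (k as f64).powf(s);
        cdf.push(acc);
    }
    for c in &mut cdf {
        *c /= acc;
    }
    cdf
}

/// Pre-generate `count` Zipf-distributed ranks (0-based) by inverse CDF.
fn pregenerate(count: usize, cdf: &[f64], rng: &mut XorShift64) -> Vec<usize> {
    (0..count)
        .map(|_| {
            let u = rng.next_f64();
            // first rank whose cumulative weight reaches u
            cdf.partition_point(|&c| c < u).min(cdf.len() - 1)
        })
        .collect()
}

fn main() {
    let cdf = zipf_cdf(1000, 1.1);
    let mut rng = XorShift64(0x9E3779B97F4A7C15);
    let indices = pregenerate(100_000, &cdf, &mut rng);
    // The measured phase would now just index `keys[indices[i]]`.
    let hits_rank0 = indices.iter().filter(|&&i| i == 0).count();
    println!("rank 0 drawn {hits_rank0} times out of {}", indices.len());
}
```

The trade-off is memory: the pre-generated index array has to be large enough that the benchmark does not simply replay a short, cache-friendly cycle.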
The prefill definitely assumes that (line 352 in 5109b8d): […]
From a skim to page the logic back in, I believe […] (line 447 in 5109b8d)
This illustrates a shortcoming of the current benchmark design: each thread really operates on its own distinct set of keys, so two threads never access the same key. Or if they do, the code would panic 😅 At least unless I'm missing something on my re-reading of the code. It's definitely going to be tricky to add Zipf here, since you don't have an obvious, cheap way to determine whether a key should be present or not. I wonder whether you need a much more sophisticated prefill that doesn't generate the keys and operations separately, but actually generates a full operational log with associated keys for each thread. It could keep track of what the current state should be, and therefore bake in which operations are valid and what their expected outcomes are. That should work for uniform operation as well.
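The "full operational log" idea above could be sketched roughly like this (a hypothetical shape, not the benchmark's actual code): operations are generated together with their keys against a model of the map's state, so each logged step already knows whether it should find the key present.

```rust
// Sketch: generate a per-thread log of (operation, expected-hit) pairs.
// `pick` chooses which key index to use at step i (uniform, Zipf, ...).

use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Op {
    Insert(u64),
    Read(u64),
    Remove(u64),
}

fn generate_log(
    keys: &[u64],
    mut pick: impl FnMut(usize) -> usize,
    len: usize,
) -> Vec<(Op, bool)> {
    let mut present: HashSet<u64> = HashSet::new(); // model of map state
    let mut log = Vec::with_capacity(len);
    for i in 0..len {
        let key = keys[pick(i)];
        // toy operation mix: insert, read, remove, repeating
        let op = match i % 3 {
            0 => Op::Insert(key),
            1 => Op::Read(key),
            _ => Op::Remove(key),
        };
        // "hit" records whether the key should already be present
        let hit = match &op {
            Op::Insert(k) => !present.insert(*k),
            Op::Read(k) => present.contains(k),
            Op::Remove(k) => present.remove(k),
        };
        log.push((op, hit));
    }
    log
}

fn main() {
    let log = generate_log(&[1, 2, 3], |i| i % 3, 9);
    println!("{} logged operations", log.len());
}
```

During the measured run, each thread would replay its log against the real map and could (optionally) check each outcome against the recorded expectation.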
The question is whether we need to check correctness at all when benchmarking. Currently, the tests could fail because there is a very small chance of duplicate keys.
I would instead generate the keys themselves with Zipf. I hacked together a small abstraction over the keys and will create a PR if I find the time.
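One possible shape for such an abstraction over the keys (hypothetical; the actual PR may look quite different) is a trait that hides how the next key index is chosen, so uniform and skewed selection are drop-in replacements:

```rust
// Sketch of a key-selection trait; both implementations are illustrative.

trait KeySelector {
    /// Index into the key array for the next operation.
    fn next_index(&mut self) -> usize;
}

/// Uniform selection via a minimal LCG (stand-in for the real RNG).
struct Uniform {
    n: usize,
    state: u64,
}

impl KeySelector for Uniform {
    fn next_index(&mut self) -> usize {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1);
        (self.state >> 33) as usize % self.n
    }
}

/// Replays a pre-generated (e.g. Zipf-distributed) index sequence.
struct Precomputed {
    indices: Vec<usize>,
    pos: usize,
}

impl KeySelector for Precomputed {
    fn next_index(&mut self) -> usize {
        let i = self.indices[self.pos % self.indices.len()];
        self.pos += 1;
        i
    }
}

fn main() {
    let mut sel = Uniform { n: 16, state: 1 };
    println!("first uniform index: {}", sel.next_index());
}
```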
The challenge with Zipf is that the chance of duplicate keys is actually very high 😅
One could track the number of successful and failed operations and view the requested operation mixture as a rough target. On a sufficiently large key set, it should be possible to tune the key and operation distributions in a way that hits the target. I'd argue that it is more important for a meaningful benchmark to measure performance under real contention than to hit the correct operation ratio, although that probably depends on the application.
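As a hedged sketch of the "rough target" idea (all names illustrative): run the operations against a model set, count which ones actually take effect, and compare the achieved mixture to the requested one.

```rust
// Sketch: measure the achieved insert/remove mixture under duplicates.
// With skewed keys, many inserts hit existing keys and many removes miss,
// so the achieved mix can drift from the requested 50/50 split used here.

use std::collections::HashSet;

#[derive(Debug)]
struct MixStats {
    attempted_inserts: usize,
    successful_inserts: usize,
    attempted_removes: usize,
    successful_removes: usize,
}

fn measure_mix(keys: &[u64]) -> MixStats {
    let mut set = HashSet::new();
    let mut stats = MixStats {
        attempted_inserts: 0,
        successful_inserts: 0,
        attempted_removes: 0,
        successful_removes: 0,
    };
    for (i, &k) in keys.iter().enumerate() {
        if i % 2 == 0 {
            stats.attempted_inserts += 1;
            if set.insert(k) {
                stats.successful_inserts += 1;
            }
        } else {
            stats.attempted_removes += 1;
            if set.remove(&k) {
                stats.successful_removes += 1;
            }
        }
    }
    stats
}

fn main() {
    let stats = measure_mix(&[1, 1, 2, 3, 2, 2]);
    println!("{stats:?}");
}
```

Comparing `successful_*` against `attempted_*` is what would let the tuning loop decide whether the distribution needs adjusting.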
I worry that, especially once you introduce key skew, you'd end up deviating so far from the target that the results no longer measure what the mix is intended to measure. But if you can provide a rationale for why that isn't likely to be the case (ideally with a benchmark to demonstrate it), I'm definitely keen to change my mind!
I suggest adding an option to use a Zipf distribution to sample the keys. It might give a more realistic workload for high-contention applications, as many real-world datasets follow it.
The rand_distr crate has an implementation here, though it is not yet in a released version.
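For reference, a Zipf distribution over n keys assigns rank k the probability p(k) = k^(-s) / H(n, s), where H(n, s) is the generalized harmonic number. A quick pure-std check of how skewed that is (independent of any particular crate):

```rust
// Compute the normalized Zipf pmf and see how concentrated it is.

fn zipf_pmf(n: usize, s: f64) -> Vec<f64> {
    // H(n, s) = sum over k of k^(-s), the normalizing constant
    let h: f64 = (1..=n).map(|k| (k as f64).powf(-s)).sum();
    (1..=n).map(|k| (k as f64).powf(-s) / h).collect()
}

fn main() {
    let p = zipf_pmf(1_000_000, 1.0);
    let top100: f64 = p[..100].iter().sum();
    // With s = 1.0, a tiny fraction of the keys absorbs a large share
    // of all accesses, which is exactly what drives the contention.
    println!("top 100 of 1M keys get {:.1}% of accesses", top100 * 100.0);
}
```

This concentration is also why the duplicate-key concern raised above is unavoidable under Zipf: the most popular ranks are drawn over and over.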