Add option for non-uniform key distribution #7

Open
terrorfisch opened this issue Sep 15, 2021 · 8 comments

@terrorfisch commented Sep 15, 2021

I suggest adding an option to use a Zipf distribution to sample the keys. It might give a more realistic workload for heavy-contention applications, as many real-world datasets follow it.

The rand_distr crate has an implementation, but it is not yet released.
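For illustration, a minimal sketch of what sampling skewed key indices could look like using the `zipf` crate (the table size and exponent here are made-up values, and this is not bustle's actual API):

```rust
use rand::distributions::Distribution;
use zipf::ZipfDistribution;

fn main() {
    let mut rng = rand::thread_rng();
    // An exponent near 1.0 gives the heavy skew typical of real-world data.
    let dist = ZipfDistribution::new(1 << 20, 1.07).unwrap();
    for _ in 0..5 {
        // Samples fall in [1, 1 << 20], with small indices dominating.
        let key_index: usize = dist.sample(&mut rng);
        println!("{key_index}");
    }
}
```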

@jonhoo (Owner) commented Sep 17, 2021

It's a great suggestion, though I'd say zipf is probably a better choice for benchmarking as it's both stable and very fast. Would be happy to take a look at a PR!

@terrorfisch (Author)

I'll work on it when I find the time :)

The zipf crate requires rand 0.8, which changed SmallRng and its seed size on 64-bit systems. Would you rather change the public seed type in bustle to u64 and use SeedableRng::seed_from_u64, or stick explicitly to the previous PRNG, Mcg128Xsl64?
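A sketch of the two options being discussed (function names are illustrative, and nothing here is bustle's actual public API; note that rand 0.8 puts SmallRng behind the `small_rng` feature):

```rust
use rand::rngs::SmallRng;
use rand::SeedableRng;
use rand_pcg::Mcg128Xsl64;

// Option A: make the public seed a u64 and let rand expand it, which
// keeps working even if SmallRng's seed size changes again.
fn option_a(seed: u64) -> SmallRng {
    SmallRng::seed_from_u64(seed)
}

// Option B: pin the PRNG to Mcg128Xsl64 (rand 0.7's SmallRng on 64-bit),
// whose 16-byte seed is the same on every platform.
fn option_b(seed: [u8; 16]) -> Mcg128Xsl64 {
    Mcg128Xsl64::from_seed(seed)
}
```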

@terrorfisch (Author)

Do I understand correctly that mix and the prefill assume the keys are unique?

@jonhoo (Owner) commented Sep 19, 2021

> I'll work on it when I find the time :)

Couldn't ask for anything more :)

> The zipf crate requires rand 0.8, which changed SmallRng and its seed size on 64-bit systems. Would you rather change the public seed type in bustle to u64 and use SeedableRng::seed_from_u64, or stick explicitly to the previous PRNG, Mcg128Xsl64?

Hmm, that's a good question. I'm okay with changing the seed type if that's what rand has done, though it is vital that the per-thread randomizer remain very fast since we're benchmarking super high-performance systems:

bustle/src/lib.rs, lines 428 to 429 in 5109b8d:

```rust
// We're going to use a very simple LCG to pick random keys.
// We want it to be _super_ fast so it doesn't add any overhead.
```

It could be that zipf randomization is too slow to run on-line, and we end up just benchmarking the random number generation, and that we actually need a pre-phase just to generate random numbers when using zipf.
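A sketch of that pre-phase idea: draw all the (potentially slow) Zipf samples before the clock starts, so the timed loop only does a slice lookup. Names here are illustrative, not bustle's actual structure:

```rust
use rand::distributions::Distribution;
use rand::rngs::SmallRng;
use rand::SeedableRng;
use zipf::ZipfDistribution;

fn pregenerate(ops: usize, num_keys: usize, seed: u64) -> Vec<usize> {
    let mut rng = SmallRng::seed_from_u64(seed);
    let dist = ZipfDistribution::new(num_keys, 1.07).unwrap();
    // All randomness happens here, in the untimed pre-phase.
    // Zipf samples are 1-based, so shift to 0-based key slots.
    (0..ops).map(|_| dist.sample(&mut rng) - 1).collect()
}

fn run_timed(indices: &[usize], keys: &[u64]) {
    for &i in indices {
        // Timed phase: a cheap lookup, no RNG overhead in the numbers.
        let _key = keys[i];
        // ... issue the table operation for _key here ...
    }
}
```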

> Do I understand correctly that mix and the prefill assume the keys are unique?

The prefill definitely assumes that:

```rust
assert!(inserted);
```

From a skim to page the logic back in, I believe mix also assumes it, since it relies solely on the indices into the key list to determine whether a given key should be present or not:

```rust
let should_find = find_seq >= erase_seq && find_seq < insert_seq;
```

This illustrates a shortcoming of the current benchmark design: each thread really operates on its own distinct set of keys, so two threads never access the same key. Or if they do, the code would panic 😅 At least unless I'm missing something on my re-reading of the code.

It's definitely going to be tricky to add Zipf here, since you don't have an obvious cheap way to determine whether a key should be present or not. I wonder whether you need a much more sophisticated prefill that doesn't generate the keys and operations separately, but actually generates a full operational log with associated keys for each thread. It could keep track of what the current state should be, and therefore bake in which operations are valid and what their expected outcomes are. That should work for uniform operation as well.
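A rough sketch of that operational-log idea: replay the planned operations against a model of the table so every op's expected outcome is baked in up front. All names here are hypothetical, not bustle's real API:

```rust
use std::collections::HashSet;

enum Op {
    Insert { key: u64, should_succeed: bool },
    Find { key: u64, should_find: bool },
    Remove { key: u64, should_succeed: bool },
}

// `planned` pairs an op kind (0 = insert, 1 = find, 2 = remove) with a
// key, e.g. drawn from a Zipf distribution, possibly with duplicates.
fn generate_log(planned: &[(u8, u64)]) -> Vec<Op> {
    let mut present = HashSet::new();
    planned
        .iter()
        .map(|&(kind, key)| match kind {
            0 => Op::Insert { key, should_succeed: present.insert(key) },
            1 => Op::Find { key, should_find: present.contains(&key) },
            _ => Op::Remove { key, should_succeed: present.remove(&key) },
        })
        .collect()
}
```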

@terrorfisch (Author)

The question is whether we need to check correctness at all when benchmarking. Currently the checks could fail because there is a very small chance of duplicate keys.

> It could be that zipf randomization is too slow to run on-line, and we end up just benchmarking the random number generation, and that we actually need a pre-phase just to generate random numbers when using zipf.

I would instead generate the keys themselves with zipf. I hacked together a small abstraction over the keys and will create a PR if I find the time.
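One guess at what a "small abstraction over the keys" might look like: a trait that hides whether key indices are drawn uniformly or from a Zipf sampler. This is entirely hypothetical; the eventual PR may look different:

```rust
use rand::distributions::Distribution;
use rand::rngs::SmallRng;
use zipf::ZipfDistribution;

trait KeyIndexSource {
    /// Index of the next key to operate on.
    fn next_index(&mut self) -> usize;
}

struct ZipfKeys {
    rng: SmallRng,
    dist: ZipfDistribution,
}

impl KeyIndexSource for ZipfKeys {
    fn next_index(&mut self) -> usize {
        // Zipf samples are 1-based; map to a 0-based key slot.
        self.dist.sample(&mut self.rng) - 1
    }
}
```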

@jonhoo (Owner) commented Sep 24, 2021

The challenge with Zipf is that the chance of duplicate keys is actually very high 😅
In reality, these checks aren't there so much to verify the backing implementation, but rather to make sure that we have correctly tracked what state the backing data structure is in. If we don't, then the mix would end up being wrong, and the benchmark numbers unreliable. For example, let's say we didn't error on these cases, and then had a bug in the generator such that all removes ended up being for keys that were never inserted. In that case, most backends would get far better performance, but only because we'd be measuring the wrong thing!

@terrorfisch (Author)

One could track the number of successful and failed operations and view the operation mixture as a rough target. On a sufficiently large key set, it should be possible to tune the key and operation distribution in a way that hits the target.

I'd argue that, for a meaningful benchmark, it is more important to measure performance under real contention than to hit the correct operation ratio exactly. Although that probably depends on the application. A minimal sketch of the counter idea appears below.
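A minimal sketch of that counter idea: track how each operation actually resolved, then compare the realized mix against the target and retune if it drifts. All names here are illustrative:

```rust
#[derive(Default)]
struct MixStats {
    find_hits: u64,
    find_misses: u64,
}

impl MixStats {
    fn record_find(&mut self, found: bool) {
        if found { self.find_hits += 1 } else { self.find_misses += 1 }
    }

    // Realized hit rate, to be compared against the intended ratio.
    fn find_hit_rate(&self) -> f64 {
        let total = self.find_hits + self.find_misses;
        if total == 0 { return 0.0; }
        self.find_hits as f64 / total as f64
    }
}
```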

@jonhoo (Owner) commented Oct 3, 2021

I worry that, especially once you introduce key skew, you'd end up deviating so far from the target that the results no longer measure what the mix is intended to measure. But if you can provide a rationale for why that isn't likely to be the case (ideally with a benchmark to demonstrate), I'm definitely keen to change my mind!
