
cache line pad mutex #4

Open
jnordwick opened this issue Aug 13, 2024 · 5 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@jnordwick

jnordwick commented Aug 13, 2024

I'm not sure this will particularly matter, but the ThreadPool mutex is going to be sharing a cache line with other data. Since Zig reorders fields in regular structs, it's unclear what it is sharing a cache line with: it could be something that matters, or it could be irrelevant.
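For example, something like this (a made-up stand-in, not the actual ThreadPool) is enough to see where the compiler actually put things:

const std = @import("std");

// Hypothetical pool-like struct; Zig is free to reorder these fields, so the
// mutex's neighbors are whatever the compiler picks.
const PoolLike = struct {
    mutex: std.Thread.Mutex = .{},
    pending: usize = 0,
    shutdown: bool = false,
};

pub fn main() void {
    std.debug.print("mutex at {}, pending at {}, struct size {}, cache line {}\n", .{
        @offsetOf(PoolLike, "mutex"),
        @offsetOf(PoolLike, "pending"),
        @sizeOf(PoolLike),
        std.atomic.cache_line,
    });
}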

@notcancername

notcancername commented Aug 14, 2024

Consider std.atomic.cache_line for this purpose.
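Something along these lines (a rough sketch with made-up names, not the project's actual code): align the mutex to the cache line and pad out the rest of it so nothing else can land there.

const std = @import("std");

// Sketch only: give the mutex its own cache line. The alignment keeps it at a
// line boundary; the trailing pad keeps other fields off that line.
const PaddedMutex = struct {
    inner: std.Thread.Mutex align(std.atomic.cache_line) = .{},
    _pad: [std.atomic.cache_line - @sizeOf(std.Thread.Mutex)]u8 = undefined,
};

pub fn main() void {
    var m = PaddedMutex{};
    m.inner.lock();
    defer m.inner.unlock();
    std.debug.print("size {}, align {}\n", .{ @sizeOf(PaddedMutex), @alignOf(PaddedMutex) });
}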

@jnordwick
Author

jnordwick commented Aug 15, 2024

Consider std.atomic.cache_line for this purpose.

Those values seem really dated. They are taken from this list that Rust and Go also use. The list shows x86_64 destructive interference as 128B, i.e. two cache lines, but that is because of the old L2 spatial prefetcher, and there might be a misunderstanding of how it works.

I never saw good data. C++ gives 64 for destructive interference (false sharing) and 128 for constructive interference (true sharing), and I think those are more correct.

I know those values go back a while, and I've never seen compelling evidence for them, just some random assertions in comments. The L2 spatial prefetcher will try to complete pairs of lines, but it won't evict L1 data that is modified.
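If the distinction ever matters here, the C++ pair could be mirrored with something like this (names and values are illustrative only; Zig's std just has the single cache_line constant):

const std = @import("std");

// Hypothetical constants mirroring the C++ pair mentioned above; not in std.
const hardware_destructive_interference_size = 64; // space independently written hot data at least this far apart
const hardware_constructive_interference_size = 128; // keep data meant to be read together within this span

pub fn main() void {
    std.debug.print("destructive {}, constructive {}\n", .{
        hardware_destructive_interference_size,
        hardware_constructive_interference_size,
    });
}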

Anyways. On a 3-year-old (2021) AMD Ryzen 7 PRO 5850U, these are the timings for various paddings:

pad to   8: 3961705236 ns
pad to  16: 2054116545 ns
pad to  32: 1948227890 ns
pad to  64: 178296429 ns
pad to 128: 177247588 ns
pad to 256: 180107536 ns

64 bytes and above are all the same (on multiple runs, any of those can come out faster than the others). Here's the really ugly code I used to test it. If you want to try it on your machine, just save it; there is a comment at the top with a line to copy and paste that will run it for the various padding sizes:

https://gist.github.com/jnordwick/b30b1584fd7c49d68a6bb842abf7d98b
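If you just want the shape of it without opening the gist, it's roughly this (a stripped-down sketch, not the gist code; names are made up): two threads hammer counters spaced by the padding, and the elapsed time is printed per padding.

const std = @import("std");

// Aligning the counter to `pad` forces consecutive array elements to be `pad`
// bytes apart (for pad >= @sizeOf(usize)).
fn Padded(comptime pad: usize) type {
    return struct { value: usize align(pad) = 0 };
}

fn hammer(counter: *usize, iters: usize) void {
    var i: usize = 0;
    while (i < iters) : (i += 1) {
        _ = @atomicRmw(usize, counter, .Add, 1, .monotonic);
    }
}

pub fn main() !void {
    const iters: usize = 10_000_000;
    inline for (.{ 8, 16, 32, 64, 128, 256 }) |pad| {
        var slots = [2]Padded(pad){ .{}, .{} };
        var timer = try std.time.Timer.start();
        const t1 = try std.Thread.spawn(.{}, hammer, .{ &slots[0].value, iters });
        const t2 = try std.Thread.spawn(.{}, hammer, .{ &slots[1].value, iters });
        t1.join();
        t2.join();
        std.debug.print("pad to {d:>3}: {} ns\n", .{ pad, timer.read() });
    }
}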

EDIT: Here are the results on a 2018 Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz. Same pattern, so whatever led to those old results doesn't seem to apply anymore. Or my test is fucked:

pad to   8: 5155030822 ns
pad to  16: 5961970989 ns
pad to  32: 3738350274 ns
pad to  64: 466117370 ns
pad to 128: 466903391 ns
pad to 256: 470904977 ns

@judofyr added the help wanted and enhancement labels Aug 15, 2024
@judofyr
Owner

judofyr commented Aug 16, 2024

Oh, very interesting. I don't know too much about mutexes + cache lines, so at the moment I don't think I'll look into this. Maybe later, when I want to learn more, I'll use this as an example to run some benchmarks.

Keeping this issue open in case others want to improve it.

@QuarticCat

I know they go back a while, and I've never seen compelling evidence. Just some random assertions on comments. The L2 spatial prefetcher will try to complete pairs of lines, but it won't evict L1 data that is modified.

Try this:

#include <atomic>
#include <thread>

// alignas(128) puts counter[0] at the start of an aligned pair of 64-byte
// lines; counter[16] (ints, so 64 bytes in) is the adjacent line of that pair.
alignas(128) std::atomic<int> counter[1024]{};

void update(int idx) {
    for (int j = 0; j < 100000000; ++j) ++counter[idx];
}

int main() {
    // t1/t2 truly share counter[0]; t3/t4 truly share counter[16], one cache
    // line away from the first pair.
    std::thread t1(update, 0);
    std::thread t2(update, 0);
    std::thread t3(update, 16);
    std::thread t4(update, 16);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
}

which comes from a question I asked on StackOverflow.
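(If you want a comparison point: build it with optimizations and threads enabled, time a run as written, then change both 16s to 32 so the second pair of threads works 128 bytes away from counter[0], in a different aligned pair of lines, and time it again. At least that's the comparison I'd draw from it; the numbers will obviously vary by CPU.)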

@jnordwick
Author

jnordwick commented Sep 25, 2024 via email
