cache line pad mutex #4
Those values seem really dated. They are taken from a list that Rust and Go also use. It shows x86_64 destructive interference as 128B, two cache lines, but that is because of the old L2 spatial prefetcher, and there might be a misunderstanding of how it works. I never saw good data. C++ gives 64 for destructive interference (false sharing) and 128 for constructive interference (true sharing), and I think those are more correct. I know the 128B figures go back a while, but I've never seen compelling evidence, just some random assertions in comments. The L2 spatial prefetcher will try to complete pairs of lines, but it won't evict L1 data that is modified.

Anyway, on a 3-year-old (2021) AMD Ryzen 7 PRO 5850U, these are the timings for various paddings:

64 bytes and above are all the same (on multiple runs, any of those can come out faster than the others). Here's the really ugly code I used to test it. If you want to try it on your machine, just save it; there is a comment at the top with a line you can copy and paste that will run it for various padding sizes: https://gist.github.com/jnordwick/b30b1584fd7c49d68a6bb842abf7d98b

EDIT: Here are the results on a 2018 Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz: the same pattern. So whatever led to those old results doesn't seem to apply anymore. Or my test is fucked.
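For reference, here is a minimal sketch (not part of the original comment) that just prints what a given toolchain reports for those two C++17 constants, assuming a compiler that defines them (they are guarded by the __cpp_lib_hardware_interference_size feature-test macro; GCC 12+ provides them):

#include <cstddef>
#include <cstdio>
#include <new>

int main() {
#ifdef __cpp_lib_hardware_interference_size
    // False-sharing padding size vs. true-sharing grouping size, as reported
    // by the implementation for the target it was built for.
    std::printf("destructive:  %zu\n",
                std::hardware_destructive_interference_size);
    std::printf("constructive: %zu\n",
                std::hardware_constructive_interference_size);
#else
    std::printf("this toolchain does not define the interference sizes\n");
#endif
}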
Oh, very interesting. I don't know too much about mutexes + cache lines, so at the moment I don't think I'll look into this. Maybe later, when I want to learn more, I'll use this as an example to run some benchmarks. Keeping this issue open in case others want to improve it.
Try this:

#include <atomic>
#include <thread>

alignas(128) std::atomic<int> counter[1024]{};

void update(int idx) {
    for (int j = 0; j < 100000000; ++j) ++counter[idx];
}

int main() {
    std::thread t1(update, 0);
    std::thread t2(update, 0);
    std::thread t3(update, 16);
    std::thread t4(update, 16);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
}

which comes from a question I asked on StackOverflow: https://stackoverflow.com/questions/72126606/should-the-cache-padding-size-of-x86-64-be-128-bytes
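Presumably the idea is that, with 4-byte ints and the array aligned to 128 bytes, indices 0 and 16 sit 64 bytes apart: different cache lines, but the same 128-byte-aligned pair. If the 128-byte destructive-interference figure were right, the two thread pairs would contend with each other. Building with something like g++ -O2 -pthread and comparing against a variant where the second pair uses an index 32 or more elements away (128+ bytes apart) should show whether that contention actually exists.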
thank you. give me a week and i'll get back to you with my results.
I'm not sure this will particularly matter, but the ThreadPool mutex is going to be sharing a cache line with other data. Since Zig reorders fields in regular structs, it's unclear what it is sharing a cache line with: could be something that matters, could be irrelevant.
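For illustration, here is a minimal C++ sketch (not the actual ThreadPool layout, and C++ rather than Zig) of the usual fix: give the mutex its own cache-line-aligned, cache-line-sized slot, so whatever field ends up next to it is guaranteed to land on a different line.

#include <cstddef>
#include <mutex>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLine = 64;  // assumed fallback for x86-64
#endif

// Hypothetical pool type, not the real ThreadPool. The alignas wrapper makes
// the lock occupy a whole number of cache lines by itself, so the hot counter
// next to it can never share a line with the mutex.
struct Pool {
    struct alignas(kCacheLine) PaddedMutex {
        std::mutex m;
    } lock;
    std::size_t pending = 0;  // starts on the next cache line after `lock`
};

static_assert(sizeof(Pool::PaddedMutex) % kCacheLine == 0,
              "padded mutex should fill whole cache lines");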