Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention. #674
@@ -88,10 +88,44 @@ void zlibc_free(void *ptr) {
#define dallocx(ptr, flags) je_dallocx(ptr, flags)
#endif

-#define update_zmalloc_stat_alloc(__n) atomic_fetch_add_explicit(&used_memory, (__n), memory_order_relaxed)
-#define update_zmalloc_stat_free(__n) atomic_fetch_sub_explicit(&used_memory, (__n), memory_order_relaxed)
+#if __STDC_NO_THREADS__
+#define thread_local __thread
+#else
+#include <threads.h>
+#endif
+/* A thread-local storage variable which keeps the current thread's index in the used_memory_thread array. */
+static thread_local int thread_index = -1;
+/* Each element in the used_memory_thread array should only be written by a single thread, which is
+ * distinguished by the thread-local storage variable thread_index. However, while an element of
+ * used_memory_thread is being written, it may be read by another thread simultaneously, and on
+ * non-x86 architectures the reader could potentially see inconsistent memory. On ARM and PowerPC
+ * platforms we can solve this issue by keeping the memory aligned; on other architectures we fall
+ * back to atomic operations to stay safe. */
+#if defined(__i386__) || defined(__x86_64__) || defined(__amd64__) || defined(__POWERPC__) || defined(__arm__) || \
+    defined(__arm64__)
+static __attribute__((aligned(sizeof(size_t)))) size_t used_memory_thread[MAX_THREADS_NUM];
+#else
+static _Atomic size_t used_memory_thread[MAX_THREADS_NUM];
+#endif
+static atomic_int total_active_threads;

-static _Atomic size_t used_memory = 0;

+/* Register the thread index in start_routine. */
+static inline void zmalloc_register_thread_index(void) {
+    thread_index = atomic_fetch_add_explicit(&total_active_threads, 1, memory_order_relaxed);
Review discussion on this line:

- Are you assuming threads are launched once and never terminated and relaunched? Otherwise this can easily reach the max over time.
- Good point. It's not very good to assume that. I wonder if we can let this index overflow to zero. Worst case, if two threads use the same index, the implementation is still correct, right? If we can do that, then zmalloc doesn't need to know the server's exact max number of threads. I prefer if we can avoid that coupling.
- +1. I don't think the engine dynamically creates/destroys threads today. That said, agreed we could start doing that in the future, so there is indeed a risk. Moreover, I wonder if this assumption could be broken already by a module allocating/freeing memory from dynamically created and short-lived threads.
- what about
- BTW, my own preference is to wrap the thread create function and plug any initialization we want into it. I can imagine other usages for that. I guess you discussed this and I'm late to the party.
- OK, I understand now why two threads can't
- Another idea, a simple protection: if the thread index > MAX_THREADS_NUM, just use one special shared atomic variable for those threads, something like
- So in case a module or a future dynamic io-threads implementation spawns and kills threads all the time, those threads will converge to using the atomic? We will probably need to address this limitation in our future work then.
- Yes, the module threads will use the atomic in that case, but the main thread and IO threads use their own variables if they're spawned first and stay alive. We can improve this when/if we see a problem with modules. (Lipeng's idea is to keep a bitmap to flag each entry in the array as used or free.)
- I think this is fine. Server threads are created early and are long-lived. I don't think they will ever use the overflow atomic counter.

(A hedged sketch of this shared-atomic fallback idea appears after the end of this hunk.)
+
+    /* TODO: Handle the case when MAX_THREADS_NUM is exceeded (may rarely happen). */
+    assert(total_active_threads < MAX_THREADS_NUM);
+}
+static inline void update_zmalloc_stat_alloc(size_t size) {
+    if (unlikely(thread_index == -1)) zmalloc_register_thread_index();
+    used_memory_thread[thread_index] += size;
+}
+
+static inline void update_zmalloc_stat_free(size_t size) {
+    if (unlikely(thread_index == -1)) zmalloc_register_thread_index();
+    used_memory_thread[thread_index] -= size;
+}

static void zmalloc_default_oom(size_t size) {
    fprintf(stderr, "zmalloc: Out of memory trying to allocate %zu bytes\n", size);
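Following up on the index-overflow discussion above, here is a minimal, hedged sketch of the "one shared atomic variable for overflow threads" idea. It reuses the identifiers from this diff (thread_index, total_active_threads, used_memory_thread, MAX_THREADS_NUM, unlikely); the used_memory_overflow counter and the sketch_* function names are hypothetical and not part of this PR.

#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared counter for threads that register after every
 * per-thread slot in used_memory_thread has been handed out. */
static _Atomic size_t used_memory_overflow = 0;

static inline void sketch_register_thread_index(void) {
    /* No assert here: indexes beyond MAX_THREADS_NUM are allowed and are
     * simply routed to the shared counter below. */
    thread_index = atomic_fetch_add_explicit(&total_active_threads, 1, memory_order_relaxed);
}

static inline void sketch_update_stat_alloc(size_t size) {
    if (unlikely(thread_index == -1)) sketch_register_thread_index();
    if (unlikely(thread_index >= MAX_THREADS_NUM)) {
        /* Late or short-lived threads (e.g. module threads) fall back to the
         * contended atomic path instead of overrunning the array. */
        atomic_fetch_add_explicit(&used_memory_overflow, size, memory_order_relaxed);
    } else {
        used_memory_thread[thread_index] += size;
    }
}

/* The reader would then add the shared counter once:
 *     um += atomic_load_explicit(&used_memory_overflow, memory_order_relaxed); */

Under this sketch, long-lived server threads still take the uncontended per-thread path, and only threads registered after the array is full pay the atomic cost.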
@@ -415,7 +449,11 @@ char *zstrdup(const char *s) {
}

size_t zmalloc_used_memory(void) {
-    size_t um = atomic_load_explicit(&used_memory, memory_order_relaxed);
+    assert(total_active_threads < MAX_THREADS_NUM);
+    size_t um = 0;
+    for (int i = 0; i < total_active_threads; i++) {
+        um += used_memory_thread[i];
+    }
Review discussion on this line:

- I am not sure I get this part. Without atomic operations on both readers and writers, I think we will get stale data?
- @PingXie Yes, the worst case is when the reading thread is reading while a writing thread is updating its thread-local storage variable, but the data will not be lost, just stale.
- That makes sense. So we are essentially saying/assuming the staleness is (tightly) bounded, which I think is reasonable.

(A hedged sketch related to this point follows the function below.)
    return um;
}
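As a hedged illustration of the staleness point discussed above: the PR's reader sums plain per-thread slots, so a slot being updated concurrently may be read slightly stale, but the update is never lost. If the slots were declared _Atomic on every platform (as in the #else branch of this diff), the sum could use relaxed loads, which are tear-free but still only eventually consistent. The function name below is hypothetical; this is a sketch, not the PR's code.

/* Sketch only: assumes used_memory_thread is declared
 *     static _Atomic size_t used_memory_thread[MAX_THREADS_NUM];
 * on all platforms, and that total_active_threads is the atomic_int from this diff. */
size_t sketch_zmalloc_used_memory(void) {
    size_t um = 0;
    int n = atomic_load_explicit(&total_active_threads, memory_order_relaxed);
    if (n > MAX_THREADS_NUM) n = MAX_THREADS_NUM; /* defensive clamp */
    for (int i = 0; i < n; i++) {
        /* Relaxed load: no tearing, but the value may lag concurrent updates. */
        um += atomic_load_explicit(&used_memory_thread[i], memory_order_relaxed);
    }
    return um;
}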
Review comment on the diff:

- (Minor) CACHE_LINE_SIZE is good in this file, but IO_THREADS_MAX_NUM is a little misplaced here, IMHO. Now that zmalloc has a protection against too many threads, can we just define a limit within zmalloc.c that is independent of IO threads? That's my preference, but I will not insist if others disagree.
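A hedged sketch of what the suggestion above could look like inside zmalloc.c; the constant name ZMALLOC_MAX_THREADS and the value 16 are illustrative only and not taken from this PR.

/* zmalloc.c (sketch): a self-contained thread limit, decoupled from IO_THREADS_MAX_NUM.
 * The name and value below are hypothetical. */
#define ZMALLOC_MAX_THREADS 16

#if defined(__i386__) || defined(__x86_64__) || defined(__amd64__) || defined(__POWERPC__) || \
    defined(__arm__) || defined(__arm64__)
static __attribute__((aligned(sizeof(size_t)))) size_t used_memory_thread[ZMALLOC_MAX_THREADS];
#else
static _Atomic size_t used_memory_thread[ZMALLOC_MAX_THREADS];
#endif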