Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention. #674
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##           unstable      #674      +/-  ##
============================================
- Coverage      70.21%    70.11%    -0.11%
============================================
  Files            110       111        +1
  Lines          60100     60233      +133
============================================
+ Hits           42202     42231       +29
- Misses         17898     18002      +104
…hen update used_memory metrics. Signed-off-by: Lipeng Zhu <[email protected]>
This may be the better alternative, I believe. It achieves the same performance(?) and avoids the errors.
src/zmalloc.c
Outdated
    assert(total_active_threads < MAX_THREADS_NUM);
    /* Sum the per-thread counters; each slot has a single writer, so the
     * reader may see a slightly stale total but never a corrupted one. */
    size_t um = 0;
    for (int i = 0; i < total_active_threads; i++) {
        um += used_memory_thread[i];
    }
I am not sure I get this part. Without atomic operations on both readers and writers, I think we will get stale data?
@PingXie Yes, the worst case is when the read thread is reading while a write thread is writing to its thread-local storage variable, but the data will not be lost, just stale.
That makes sense. So we are essentially saying/assuming the staleness is (tightly) bounded, which I think is reasonable.
LGTM overall. Will sign off after you address my comments
Signed-off-by: Lipeng Zhu <[email protected]> Co-authored-by: Wangyang Guo <[email protected]>
Signed-off-by: Lipeng Zhu <[email protected]>
Signed-off-by: Lipeng Zhu <[email protected]>
I think reducing atomic updates is great. I just think that we have a very big PR coming which will address similar issues and make lots of related changes, so I suggest we wait for it instead? (@uriyage, @madolson, @zuiderkwast FYI)
@ranshid For zmalloc? I'm not aware of any such PR. The Async IO threading PRs don't affect zmalloc AFAIK.
I think it might be easier to consider the atomic changes for threads after the Async IO PR (maybe?). All the parts about dynamically adding and removing threads MIGHT have an impact. @zuiderkwast, no concrete concern at this point.
The only overlap between these two PRs is that the CACHE_LINE_SIZE definition is moved to config.h, so it will be a small merge conflict but not a serious one. I don't think we should hold off. It slows down the project. Adjusting the loop to only include the active threads is a minor optimization we can do later.
As I said, I did not observe any specific issue, I just had a general feeling of dependency. I just do not want the extra work of adjusting the Async IO PR after merging this one, but maybe I am just paranoid. I also have some questions/suggestions about this PR:
We have a macro
:) The story is a bit long: the initial version of #308 is the same as this patch, and the perf boost listed in the top comment is based on that initial version. But after some discussions in #467, we thought we could be more aggressive, and #308 evolved into what it is now. However, @soloestoy had concerns about the delta proposal, so as discussed with @zuiderkwast, we fell back to the thread-local array proposal in this PR to let others vote. I will update the top comment once we have a decision.
Ditto.
Generally LGTM :)
I am sorry @lipzhu, I probably missed it while I was scanning the code. This works fine together with the assert.
By "thread-local array" you mean used_memory_threa? I did not see how this is a thread local, which is what I meant before by having multiple threads writing to the same memory area. Anyway I guess if you benchmarked it and observed a performance gain, no reason to hold you back. Thank you for this contribution! |
Signed-off-by: Lipeng Zhu <[email protected]>
Signed-off-by: Lipeng Zhu <[email protected]>
@PingXie I just retested this patch in my local environment and updated the top comment with performance boost info.
The new code LGTM, thanks. The comments and discussions were helpful.
Thanks @lipzhu! Curious: was memtier run on the same machine or over the network? Can you try a random-access, non-pipelined workload with an 8:2 or 7:3 get/set ratio? Please set the value size to 1024 and pre-load the keys so that there are no cache misses. I don't know what client count was used in your earlier tests, but hopefully it was significant enough to keep the server CPUs busy. At some point, we will need to get real with #368 and standardize on our performance benchmarks ...
@PingXie memtier and the server are on the same machine, over the loopback device.
I used the memtier commands below according to your requirements. I didn't observe an obvious performance boost (~1%); considering the variance, the performance should be the same. And from the baseline's profiling result, this result is sensible, because the zmalloc/zfree cycles ratio is low, so there is not much room for improvement in this test scenario.
We always make the server CPU utilization 100% when setting up benchmark clients.
Thanks for the additional context.
This is a very reasonable setup to eliminate the network variance but I think it does mean that the performance boost is amplified compared to a typical setup because network latency plays a significant role too.
I am not sure how to interpret this. I was under the impression that the zmalloc/zfree count would be proportional to the number of requests, which is both 1M in this test and your earlier tests. I now wonder if pipelining (with a queue depth of 10) further squeezes out other overhead (such as the
I don't have a high-speed network at hand right now, but I guess the behavior should be similar if the network is not a bottleneck. Maybe we can standardize the performance testing environment later.
@PingXie Yes, you are right. This kind of optimization strategy can be explained easily in the single-threaded (sequential) model. Pipelining reduces the read/write socket syscalls, so we have more CPU resources to handle the commands, which increases the number of zmalloc/zfree calls, and the total cycles ratio of those instructions increases correspondingly. Then our optimization of using
static _Atomic size_t used_memory = 0;

/* Register the thread index in start_routine. */
static inline void zmalloc_register_thread_index(void) {
    thread_index = atomic_fetch_add_explicit(&total_active_threads, 1, memory_order_relaxed);
}
Are you assuming threads are launched once and never terminated and relaunched? Because otherwise this can easily reach the max over time.
Good point. It's not very good to assume that.
I wonder if we can let this index overflow to zero. Worst case, if two threads use the same index, the implementation is still correct, right?
If we can do that, then zmalloc doesn't need to know the server's exact max number of threads. I prefer if we can avoid that coupling.
Good point. It's not very good to assume that.
+1.
I don't think the engine dynamically creates/destroys threads today. That said, agreed we could start doing that in the future, so there is indeed a risk. Moreover, I wonder if this assumption could be broken already by a module allocating/freeing memory from dynamically created and short-lived threads.
What about HelloBlock_ValkeyCommand [didn't look at its logic TBH]? In general, what about modules?
BTW, my own preference is to wrap the thread creation function and plug any initialization we want into it. I can imagine other usages for that. I guess you discussed this and I'm late to the party.
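A minimal sketch of that idea, assuming a hypothetical valkey_thread_create() wrapper and trampoline (names invented for illustration, not part of this PR): every thread the server spawns goes through one entry point, so per-thread setup such as registering a zmalloc thread index lives in a single place.

    #include <errno.h>
    #include <pthread.h>
    #include <stdlib.h>

    void zmalloc_register_thread_index(void); /* provided by zmalloc.c */

    typedef struct {
        void *(*start_routine)(void *);
        void *arg;
    } thread_args;

    /* Runs in the new thread: do the per-thread initialization, then call
     * the caller's real start routine. */
    static void *thread_trampoline(void *raw) {
        thread_args args = *(thread_args *)raw;
        free(raw);
        zmalloc_register_thread_index(); /* plug any per-thread setup in here */
        return args.start_routine(args.arg);
    }

    int valkey_thread_create(pthread_t *tid, const pthread_attr_t *attr,
                             void *(*start_routine)(void *), void *arg) {
        thread_args *args = malloc(sizeof(*args));
        if (args == NULL) return ENOMEM;
        args->start_routine = start_routine;
        args->arg = arg;
        return pthread_create(tid, attr, thread_trampoline, args);
    }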
OK, I understand now why two threads can't add to the same variable: add is not atomic. It is read-modify-write. A simple read or write of one word is atomic, so that's why we can avoid _Atomic when we have only one writer. (@lipzhu explained it to me in a chat.)
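A minimal sketch of that distinction (illustrative names only, not the PR's code): a counter with multiple writers needs an atomic read-modify-write, while a per-thread slot has a single writer, so a plain increment cannot lose updates and a concurrent reader risks at most a slightly stale value.

    #include <stdatomic.h>
    #include <stddef.h>

    #define SLOTS 4

    /* Multiple writers: the increment must be an atomic read-modify-write
     * (a `lock add` on x86), otherwise two interleaved "read, add, write
     * back" sequences can drop one of the updates. */
    static _Atomic size_t shared_counter;

    static void shared_add(size_t n) {
        atomic_fetch_add_explicit(&shared_counter, n, memory_order_relaxed);
    }

    /* Single writer per slot: each thread only updates its own entry, so a
     * plain add is enough; a reader summing the slots may see a total that
     * lags slightly behind, but no update is ever lost. */
    static __thread int my_slot; /* assigned once when the thread registers */
    static size_t per_thread_counter[SLOTS];

    static void per_thread_add(size_t n) {
        per_thread_counter[my_slot] += n; /* no lock prefix needed */
    }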
Another idea, a simple protection: if thread index > MAX_THREADS_NUM, just use one special shared atomic variable for those threads, something like _Atomic size_t used_memory_for_additional_threads. These last threads will not benefit from the optimization, but it's a protection. It's used only if some modules create a lot of threads.
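A sketch of how that protection might look in the allocation-stat update path (variable and function names are assumptions based on this thread, not necessarily the patch's final code; real code would also pad each slot to a cache line to avoid false sharing):

    #include <stdatomic.h>
    #include <stddef.h>

    #define MAX_THREADS_NUM 132

    static _Atomic int total_active_threads;
    static __thread int thread_index;

    static size_t used_memory_thread[MAX_THREADS_NUM];
    static _Atomic size_t used_memory_for_additional_threads;

    /* Called once per thread, before its first allocation. */
    static inline void zmalloc_register_thread_index(void) {
        thread_index = atomic_fetch_add_explicit(&total_active_threads, 1, memory_order_relaxed);
    }

    static inline void update_zmalloc_stat_alloc(size_t size) {
        if (thread_index < MAX_THREADS_NUM) {
            used_memory_thread[thread_index] += size; /* single writer: plain add */
        } else {
            /* Too many threads registered: fall back to the shared atomic. */
            atomic_fetch_add_explicit(&used_memory_for_additional_threads, size, memory_order_relaxed);
        }
    }

    static inline void update_zmalloc_stat_free(size_t size) {
        if (thread_index < MAX_THREADS_NUM) {
            used_memory_thread[thread_index] -= size;
        } else {
            atomic_fetch_sub_explicit(&used_memory_for_additional_threads, size, memory_order_relaxed);
        }
    }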
So in case a module or a future dynamic io-threads implementation spawns and kills threads all the time, it will converge to using the atomic? We will probably need to address this limitation in our future work then.
I would like to ask: why is it so bad to keep a "free-ids list" and use a mutex to guard it when we allocate a thread id? It is a once-in-a-thread-lifecycle operation, so I am not sure it would cause that much performance impact.
For the deregistration when a thread is destroyed, I would suggest thinking about using pthread_key_t so we can assign a destructor call to return the id.
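A minimal sketch of that suggestion, with invented helper names (not code from this PR): a mutex-guarded free-slot table for registration, and a pthread_key_t destructor that returns the slot when the thread exits.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_THREADS_NUM 132

    static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool id_in_use[MAX_THREADS_NUM];
    static pthread_key_t id_key;
    static pthread_once_t id_key_once = PTHREAD_ONCE_INIT;
    static __thread int thread_index = -1;

    /* Runs automatically when a registered thread exits; returns its slot. */
    static void release_thread_index(void *arg) {
        int idx = (int)(intptr_t)arg - 1; /* stored as idx+1, see below */
        pthread_mutex_lock(&id_lock);
        id_in_use[idx] = false;
        pthread_mutex_unlock(&id_lock);
    }

    static void create_id_key(void) {
        pthread_key_create(&id_key, release_thread_index);
    }

    /* Once-per-thread-lifetime operation, so the mutex is not a hot path. */
    static void zmalloc_register_thread_index(void) {
        pthread_once(&id_key_once, create_id_key);
        pthread_mutex_lock(&id_lock);
        for (int i = 0; i < MAX_THREADS_NUM; i++) {
            if (!id_in_use[i]) {
                id_in_use[i] = true;
                thread_index = i;
                break;
            }
        }
        pthread_mutex_unlock(&id_lock);
        if (thread_index >= 0) {
            /* Store idx+1: the destructor only fires for non-NULL values. */
            pthread_setspecific(id_key, (void *)(intptr_t)(thread_index + 1));
        }
        /* If no slot was free, thread_index stays -1 and the caller could
         * fall back to the shared atomic counter. */
    }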
Yes, the module threads will use the atomic in that case, but the main thread and IO threads use their own variables if they're spawned first and stay alive. We can improve this when/if we see a problem with modules. (Lipeng's idea is to keep a bitmap to flag each entry in the array as used or free.)
I think this is fine. Server threads are created early and are long-lived. I don't think they will ever use the overflow atomic counter.
Signed-off-by: Lipeng Zhu <[email protected]>
Signed-off-by: Lipeng Zhu <[email protected]>
I'm happy now, with this protection against too many threads.
I just found some typos.
src/config.h
Outdated
#define IO_THREADS_MAX_NUM 128
#define MAX_THREADS_NUM (IO_THREADS_MAX_NUM + 3 + 1)
(Minor) CACHE_LINE_SIZE is good in this file, but IO_THREADS_MAX_NUM is a little misplaced here, IMHO.
Now that zmalloc has a protection against too many threads, can we just define a limit within zmalloc.c that is independent of IO threads? That's my preference, but I will not insist if others disagree.
Signed-off-by: Lipeng Zhu <[email protected]>
@lipzhu, it sounds like testing with threads is critical to validate the change. Can you share results when tested with io-threads?
@zvi-code I don't think it's critical for the threads feature; even with a single thread, the atomic contention still exists and instructions like
@zuiderkwast Thanks, updated the title and description, PTAL.
This looks great. Thanks @lipzhu!
Description
This patch tries to introduce a thread-local storage variable for each thread to update its own used_memory, and then sums them together when reading in zmalloc_used_memory. Then we can reduce unnecessary lock add contention on the atomic variable. We also add a protection: if too many threads are created and the total thread count is greater than 132, we fall back to atomic operations for threads with index >= 132.
Problem Statement
The zmalloc and zfree related functions update used_memory atomically for each operation, and they are called very frequently. In the memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml benchmark, the cycles ratio of zmalloc and zfree is high; they are thin wrappers around the underlying allocator library and should not take that many cycles. Most of the cycles are contributed by lock add and lock sub, which are expensive instructions. The profiling shows the metrics update comes mainly from the main thread, so using a TLS variable will remove a lot of the contention.
Performance Boost
Note: This optimization should benefit common benchmarks widely. I chose the 2 scenarios below to validate the performance boost in my local environment.
Test Env
Start Server