
Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention. #674

Merged: 9 commits, Jul 2, 2024

Conversation

@lipzhu (Contributor) commented Jun 19, 2024

Description

This patch introduces a thread-local storage variable so that each thread updates its own used_memory, and zmalloc_used_memory sums the per-thread values when reading. This removes unnecessary lock add contention on a single atomic variable. As a protection, if too many threads are created and the total thread count grows beyond 132, threads with index >= 132 fall back to an atomic operation.
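A minimal sketch of the per-thread counter idea (illustrative names and simplified logic only; the real patch also handles thread registration, and the fallback for thread index >= 132 is sketched later in the review thread):

```c
#include <stddef.h>

#define MAX_THREADS_NUM 132                        /* assumed limit, per the description above */

static __thread int thread_index;                  /* assigned once when a thread registers */
static size_t used_memory_thread[MAX_THREADS_NUM]; /* one counter per thread */

/* Writer: each thread touches only its own slot, so a plain add is enough (no lock prefix). */
static inline void update_zmalloc_stat_alloc(size_t size) {
    used_memory_thread[thread_index] += size;
}

/* Reader: sum the per-thread counters; the result may be slightly stale, but no
 * update is ever lost because every slot has exactly one writer. */
size_t zmalloc_used_memory(void) {
    size_t um = 0;
    for (int i = 0; i < MAX_THREADS_NUM; i++) um += used_memory_thread[i];
    return um;
}
```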

Problem Statement

The zmalloc and zfree related functions update used_memory atomically on every operation, and they are called very frequently. In the memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml benchmark, the cycle ratios of zmalloc and zfree are high, even though they are thin wrappers around the underlying allocator library and should not cost many cycles. Most of those cycles are contributed by lock add and lock sub, which are expensive instructions. The profiling shows that the metric updates come mainly from the main thread, so using a TLS variable removes a lot of the contention.
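For reference, a sketch of the pre-patch pattern being described (illustrative, not the exact Valkey source): every allocation and free hits one shared atomic, which on x86 compiles into lock add / lock sub.

```c
#include <stdatomic.h>
#include <stddef.h>

static _Atomic size_t used_memory = 0;

static inline void update_zmalloc_stat_alloc(size_t size) {
    atomic_fetch_add_explicit(&used_memory, size, memory_order_relaxed); /* lock add */
}

static inline void update_zmalloc_stat_free(size_t size) {
    atomic_fetch_sub_explicit(&used_memory, size, memory_order_relaxed); /* lock sub */
}
```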

Performance Boost

Note: This optimization should benefit common benchmarks broadly. I chose the two scenarios below to validate the performance boost in my local environment.

Test Suite | Performance Boost
memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10 | 8%
memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 | 4%
Test Env
  • OS: Ubuntu 22.04.4 LTS
  • Platform: Intel Xeon Platinum 8380
  • Server and Client in same socket
Start Server
taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey_1.conf
port 9001
bind * -::*
daemonize yes
protected-mode no
save ""


codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.

Project coverage is 70.11%. Comparing base (4d3d6c0) to head (d271af2).
Report is 15 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #674      +/-   ##
============================================
- Coverage     70.21%   70.11%   -0.11%     
============================================
  Files           110      111       +1     
  Lines         60100    60233     +133     
============================================
+ Hits          42202    42231      +29     
- Misses        17898    18002     +104     
Files Coverage Δ
src/networking.c 85.47% <ø> (+0.04%) ⬆️
src/zmalloc.c 84.58% <78.94%> (-0.47%) ⬇️

... and 17 files with indirect coverage changes

…hen update used_memory metrics.

Signed-off-by: Lipeng Zhu <[email protected]>
@zuiderkwast (Contributor) left a comment

This may be the better alternative, I believe. It achieves the same performance(?) and avoids the errors.

src/zmalloc.c, comment on lines 418 to 422:
assert(total_active_threads < MAX_THREADS_NUM);
size_t um = 0;
for (int i = 0; i < total_active_threads; i++) {
    um += used_memory_thread[i];
}
@PingXie (Member) commented:

I am not sure I get this part. Without atomic operations on both readers and writers, I think we will get stale data?

@lipzhu (Contributor, Author) replied:

@PingXie Yes, the worst case is that the reading thread reads while a writing thread is updating its thread-local storage variable, but the data will not be lost, just stale.

@PingXie (Member) replied:

That makes sense. So we are essentially assuming the staleness is (tightly) bounded, which I think is reasonable.

@PingXie (Member) left a comment

LGTM overall. Will sign off after you address my comments

lipzhu and others added 3 commits June 25, 2024 10:12
Signed-off-by: Lipeng Zhu <[email protected]>
Co-authored-by: Wangyang Guo <[email protected]>
Signed-off-by: Lipeng Zhu <[email protected]>
@ranshid (Member) left a comment

I think reducing atomic updates is great. I just think that we have a very big PR coming which will address similar issues and make lots of related changes, so I suggest waiting for it instead? (@uriyage, @madolson, @zuiderkwast FYI)

@zuiderkwast (Contributor) replied:

I think reducing atomic updates is great. I just think that we have a very big PR coming which will address similar issues and make lots of related changes. so suggest to wait for it instead?

@ranshid For zmalloc? I'm not aware of any such PR. The Async IO threading PRs don't affect zmalloc AFAIK.

@ranshid (Member) commented Jun 25, 2024

I think reducing atomic updates is great. I just think that we have a very big PR coming which will address similar issues and make lots of related changes. so suggest to wait for it instead?

@ranshid For zmalloc? I'm not aware of any such PR. The Async IO threading PRs don't affect zmalloc AFAIK.

I think it might be easier to consider the atomic changes for threads after the Async IO PR (maybe?); all the parts about dynamically adding and removing threads MIGHT have an impact.
I am not sure I am right about it since I just started to review the Async IO PR, but I thought it might be better to build on top of it.

@zuiderkwast no concrete concern at this point

@zuiderkwast (Contributor) replied:

I think reducing atomic updates is great. I just think that we have a very big PR coming which will address similar issues and make lots of related changes. so suggest to wait for it instead?

@ranshid For zmalloc? I'm not aware of any such PR. The Async IO threading PRs don't affect zmalloc AFAIK.

I think it might be easier to consider the atomic changes of threads after the Async IO PR (Maybe?) all the part of dynamically adding and removing threads MIGHT have impact. Not sure I am right about it since I just started to review the Async IO PR. But thought it might be better to fix on top of it.

@zuiderkwast no concrete concern at this point

The only overlap between these two PRs is that the CACHE_LINE_SIZE definition is moved to config.h, so there will be a small merge conflict, but not a serious one. I don't think we should hold off. It slows down the project.

Adjusting the loop to only include the active threads is a minor optimization we can do later.

@ranshid (Member) commented Jun 26, 2024

I think reducing atomic updates is great. I just think that we have a very big PR coming which will address similar issues and make lots of related changes. so suggest to wait for it instead?

@ranshid For zmalloc? I'm not aware of any such PR. The Async IO threading PRs don't affect zmalloc AFAIK.

I think it might be easier to consider the atomic changes of threads after the Async IO PR (Maybe?) all the part of dynamically adding and removing threads MIGHT have impact. Not sure I am right about it since I just started to review the Async IO PR. But thought it might be better to fix on top of it.
@zuiderkwast no concrete concern at this point

The only overlap between these two PRs are that the CACHE_LINE_SIZE definition is moved to config.h, so it will be a small merge conflict but not a serious one. I don't think we should hold off. It slows down the project.

Adjusting the loop to only include the active threads is a minor optimization we can do later.

As I said, I did not observe any specific issue, just a general feeling of dependency. I simply do not want the extra work of adjusting the Async IO PR after merging this one, but maybe I am just paranoid.

I also have some questions/suggestions about this PR:

  1. Why are we only talking about io-threads here? There are also bio threads and potentially more in the future, probably in case we know the thread limits in Valkey. Also, we can maintain a "MAX_THREADS" define, which I also think should be tracked via asserts so we make sure no overflow regression happens in the future.
  2. Do we have updated performance numbers for this specific change? I know in Introduce Thread-local storage variable to reduce atomic contention when update used_memory metrics. #308 we observed a 6% QPS improvement, but from a quick look it seems that implementation was different. For example, in this case we are using a single static array, which might cause the same cache line to keep being invalidated by multi-core cache updates. Not saying this is a "smart" observation and maybe I am wrong, but I would be happy to see updated performance numbers.
  3. I think this is a recurring pattern now that we are using threads more intensively (offloading work to io-threads). Can we consider creating standalone infra for thread-local collection, like it exists in many libraries (C is challenging in that aspect, I guess)?
  4. Nit: I think the top comment should be filled with the relevant data and description so that this PR can stand by itself.

@lipzhu (Contributor, Author) commented Jun 26, 2024

I think reducing atomic updates is great. I just think that we have a very big PR coming which will address similar issues and make lots of related changes. so suggest to wait for it instead?

@ranshid For zmalloc? I'm not aware of any such PR. The Async IO threading PRs don't affect zmalloc AFAIK.

I think it might be easier to consider the atomic changes of threads after the Async IO PR (Maybe?) all the part of dynamically adding and removing threads MIGHT have impact. Not sure I am right about it since I just started to review the Async IO PR. But thought it might be better to fix on top of it.
@zuiderkwast no concrete concern at this point

The only overlap between these two PRs are that the CACHE_LINE_SIZE definition is moved to config.h, so it will be a small merge conflict but not a serious one. I don't think we should hold off. It slows down the project.
Adjusting the loop to only include the active threads is a minor optimization we can do later.

As I said. I did not observe any specific issue just had a general feeling of dependency. I just do not want the extra work of adjusting the Async IO PR after merging on this one, but maybe I am just paranoid.

I also have some questions/suggestions about this PR:

  1. why are we only talking about io-threads here? there are also bio threads and potentially more in the future. probably in case we know the thread limits in Valkey. Also we can maintain a "MAX_THREADS" define which I also think should be tracked via asserts so me make sure no overflow regression happen in the future.

We have a macro #define MAX_THREADS_NUM (IO_THREADS_MAX_NUM + 3 + 1) defined in os.h.

  2. Do we have updated performance numbers of this specific change? I know in Introduce Thread-local storage variable to reduce atomic contention when update used_memory metrics. #308 we observed 6% QPS improvement, but from a quick look it seems that implementation was different. For example in this case we are using a single static array which might cause the same cache line to keep being invalidated due to multi-core cache updates. Not saying this is a "smart" observation and maybe I am wrong, but would be happy to see updated performance numbers

:) The story is a bit long. The initial version of #308 was the same as this patch, and the performance boost listed in the top comment is based on that initial version. After some discussions in #467, we thought we could be more aggressive, and #308 evolved into what it is now. But @soloestoy had concerns about the delta proposal, so, as discussed with @zuiderkwast, we fall back to this thread-local array proposal in this PR to let others vote. I will update the top comment once we have a decision.

  3. I think this is a reoccurring pattern now that we are using threads more intensively (offloading work to io-threads). can we consider creating a standalone infra for thread-local-collect like it exists in many libraries (c is challenging in that aspect I guess)
  4. nit - I think the top comment should be filled with the relevant data and description so that this PR can stand by itself.

Ditto.

@soloestoy (Member):

generally LGTM :)

@ranshid (Member) commented Jun 26, 2024

We have a macro #define MAX_THREADS_NUM (IO_THREADS_MAX_NUM + 3 + 1) defined in os.h.

I am sorry @lipzhu, I probably missed it while scanning the code. This works fine together with the assert.

we fall back to this thread-local array proposal in this PR to let others vote. I will update the top comments once we have decision.

By "thread-local array" you mean used_memory_threa? I did not see how this is a thread local, which is what I meant before by having multiple threads writing to the same memory area. Anyway I guess if you benchmarked it and observed a performance gain, no reason to hold you back.

Thank you for this contribution!

@PingXie (Member) left a comment

Thanks again for your contribution, @lipzhu! I am happy with the changes :-).

I echo @ranshid's earlier suggestion of including the performance number at the top of this issue too. I assume we will get the same 6% QPS boost, which is definitely worth mentioning in the release notes.

@PingXie PingXie added the release-notes This issue should get a line item in the release notes label Jun 27, 2024
Signed-off-by: Lipeng Zhu <[email protected]>
@lipzhu (Contributor, Author) commented Jun 27, 2024

Thanks again for your contribution, @lipzhu! I am happy with the changes :-).

I echo @ranshid's earlier suggestion of including the performance number at the top of this issue too. I assume we will get the same 6% QPS boost, which is definitely worth mentioning in the release notes.

@PingXie I just retested this patch in my local environment and updated the top comments with performance boost info.

@enjoy-binbin (Member) left a comment

The new code LGTM, thanks. The comments and discussions were helpful.

@PingXie (Member) commented Jun 27, 2024

@PingXie I just retested this patch in my local environment and updated the top comments with performance boost info.

Thanks @lipzhu!

Curious: was memtier run on the same machine or over the network? Can you try a random-access, non-pipelined workload with an 8:2 or 7:3 get/set ratio? Please set the value size to 1024 and pre-load the keys so that there are no cache misses. I don't know what client count was used in your earlier tests, but hopefully it was significant enough to keep the server CPUs busy.

At some point, we will need to get real with #368 and standardize on our performance benchmarks ...

@lipzhu (Contributor, Author) commented Jun 27, 2024

@PingXie I just retested this patch in my local environment and updated the top comments with performance boost info.

Thanks @lipzhu!

Curious. Was memtier run on the same machine or over the network?

@PingXie memtier and server are in the same machine over the loopback device.

can you try a random access non-pipelined workload with a 8:2 or 7:3 get/set ratio? Please set the value size to 1024 and pre-load the keys so that there are no cache misses.

I used the memtier command below according to your requirements. I didn't observe an obvious performance boost (~1%); considering the variance, performance should be the same. And the baseline's profiling makes this result sensible: the zmalloc/zfree cycles ratio is low, so there is not much room for improvement in this test scenario.

taskset -c 4-7 memtier_benchmark -s 127.0.0.1 -p 9001 "--data-size" "1024" --ratio 3:7 --key-pattern R:R --key-minimum=1 --key-maximum 1000000 --test-time 60 -c 50 -t 4 --hide-histogram

(screenshot omitted)

I don't know what client count was used in your earlier tests but hopefully significant enough to keep the server CPUs busy.

We always drive server CPU utilization to 100% when setting up the benchmark clients.

@PingXie (Member) commented Jun 27, 2024

Thanks for the additional context.

memtier and server are in the same machine over the loopback device.

This is a very reasonable setup to eliminate the network variance but I think it does mean that the performance boost is amplified compared to a typical setup because network latency plays a significant role too.

because the zmalloc/zfree cycles ratio is low, there is not too much room for this test scenario.

I am not sure how to interpret this. I was under the impression that the zmalloc/zfree count would be proportional to the number of requests, which is both 1M in this test and your earlier tests. I now wonder if pipelining (with a queue depth of 10) further squeezes out other overhead (such as the read/write syscalls) thus again "amplifying" the performance boost? In other words, I think the memory usage tracking overhead or the "numerator" is roughly the same in both cases but with pipelining the total cost or the "denominator" is smaller thanks to the syscall reduction?

@lipzhu (Contributor, Author) commented Jun 27, 2024

This is a very reasonable setup to eliminate the network variance but I think it does mean that the performance boost is amplified compared to a typical setup because network latency plays a significant role too.

I don't have a high-speed network at hand right now, but I guess the behavior should be similar as long as the network is not a bottleneck. Maybe we can standardize the performance testing environment later.

I now wonder if pipelining (with a queue depth of 10) further squeezes out other overhead (such as the read/write syscalls) thus again "amplifying" the performance boost? In other words, I think the memory usage tracking overhead or the "numerator" is roughly the same in both cases but with pipelining the total cost or the "denominator" is smaller thanks to the syscall reduction?

@PingXie Yes, you are right. This kind of optimization strategy is easiest to interpret in the single-threaded (sequential) model. Pipelining reduces the read/write socket syscalls, so we have more CPU resources to handle commands; the call count of zmalloc/zfree related instructions rises and their share of total cycles increases correspondingly. That is when our optimization of replacing the lock add instruction with a plain add pays off. Imagine that the total cycles ratio produced by an instruction is less than 1%: even if we eliminated the instruction entirely, it would improve overall performance by at most that much (maybe 1%?).

static _Atomic size_t used_memory = 0;
/* Register the thread index in start_routine. */
static inline void zmalloc_register_thread_index(void) {
    thread_index = atomic_fetch_add_explicit(&total_active_threads, 1, memory_order_relaxed);
}
(Contributor) commented:

Are you assuming threads are launched once and never terminated and relaunched? Because otherwise this can easily reach the max over time.

(Contributor) replied:

Good point. It's not very good to assume that.

I wonder if we can let this index overflow to zero. Worst case, if two threads use the same index, the implementation is still correct, right?

If we can do that, then zmalloc doesn't need to know the server's exact max number of threads. I prefer if we can avoid that coupling.

(Member) replied:

Good point. It's not very good to assume that.

+1.

I don't think the engine dynamically creates/destroys threads today. That said, agreed we could start doing that in the future, so there is indeed a risk. Moreover, I wonder if this assumption could already be broken by a module allocating/freeing memory from dynamically created and short-lived threads.

(Contributor) replied:

What about HelloBlock_ValkeyCommand (I didn't look at its logic, TBH)? In general, what about modules?

(Contributor) added:

BTW, my own preference would be to wrap the thread-create function and plug in any initialization we want. I can imagine other usages for that; I guess you discussed this already and I'm late to the party.

@zuiderkwast (Contributor) commented Jun 28, 2024:

OK, I understand now why two threads can't add to the same variable: add is not atomic, it is read-modify-write. A simple read or write of one word is atomic, so that's why we can avoid _Atomic when we have only one writer. (@lipzhu explained this to me in a chat.)
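To illustrate that reasoning (a hand-written sketch mirroring the comment above, not code from the PR): a plain += is a load, an add, and a store, so two concurrent writers can lose an update, while one writer plus readers only risks a slightly stale value.

```c
#include <stddef.h>

/* counter += n is really three steps; with two writers an update can vanish:
 *   thread A: load counter (100)      thread B: load counter (100)
 *   thread A: store 100 + 5 = 105     thread B: store 100 + 7 = 107
 * The final value is 105 or 107, not 112. With a single writer this interleaving
 * cannot happen, and a naturally aligned word-sized load/store does not tear on
 * the platforms discussed here, so readers only ever see an older value. */
static size_t counter;

void single_writer_add(size_t n) { counter += n; }   /* safe only from one thread */
size_t any_reader_get(void) { return counter; }      /* may be stale, never lost */
```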

(Contributor) added:

Another idea, a simple protection: If thread index > MAX_THREADS_NUM, just use one special shared atomic variable for those threads, something like _Atomic size_t used_memory_for_additional_threads. These last threads will not benefit from the optimization, but it's a protection. It's used only if some modules create a lot of threads.
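A hedged sketch of that protection (used_memory_for_additional_threads is the name suggested above; the surrounding declarations are illustrative and mirror the sketch in the PR description):

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS_NUM 132                        /* assumed limit */

static __thread int thread_index;                  /* assigned at thread registration */
static size_t used_memory_thread[MAX_THREADS_NUM];
static _Atomic size_t used_memory_for_additional_threads = 0;

static inline void update_zmalloc_stat_alloc(size_t size) {
    if (thread_index < MAX_THREADS_NUM) {
        used_memory_thread[thread_index] += size;  /* fast path: plain add, single writer */
    } else {
        /* Threads beyond the limit (e.g. many module threads) share one atomic counter. */
        atomic_fetch_add_explicit(&used_memory_for_additional_threads, size, memory_order_relaxed);
    }
}
/* zmalloc_used_memory() would then add this shared counter to the per-thread sum. */
```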

(Member) replied:

So a module, or a future dynamic io-threads implementation that spawns and kills threads all the time, will converge to using the atomic? We will probably need to address this limitation in our future work then.
I would like to ask why it is so bad to keep a "free-ids list" and use a mutex to guard it when we allocate a thread id? It is a once-per-thread-lifecycle operation, so I am not sure it would cause that much performance impact.
For the deregistration when a thread is destroyed, I would suggest thinking about using pthread_key_t so we can assign a destructor call to return the id.
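A rough sketch of that alternative (hypothetical helper names, not part of this PR): a mutex-guarded free-id list, with a pthread_key_t destructor so an exiting thread returns its slot.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS_NUM 132                  /* assumed limit */

static __thread int thread_index = -1;
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static int free_ids[MAX_THREADS_NUM];
static int free_id_count;
static pthread_key_t id_key;

/* Destructor runs when a registered thread exits; ids are stored +1 so id 0 is not NULL. */
static void release_thread_index(void *arg) {
    pthread_mutex_lock(&id_lock);
    free_ids[free_id_count++] = (int)(intptr_t)arg - 1;
    pthread_mutex_unlock(&id_lock);
}

/* Called once at startup, before any thread registers. */
static void init_thread_ids(void) {
    for (int i = 0; i < MAX_THREADS_NUM; i++) free_ids[i] = i;
    free_id_count = MAX_THREADS_NUM;
    pthread_key_create(&id_key, release_thread_index);
}

/* Called once per thread, e.g. from its start routine: a once-per-lifecycle mutex hit. */
static void zmalloc_register_thread_index(void) {
    pthread_mutex_lock(&id_lock);
    thread_index = free_ids[--free_id_count];
    pthread_mutex_unlock(&id_lock);
    pthread_setspecific(id_key, (void *)(intptr_t)(thread_index + 1));
}
```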

(Contributor) replied:

Yes, the module threads will use atomic in that case, but the main thread and IO threads use their own variable if they're spawned first and stay alive. We can improve this when/if we see a problem with modules. (Lipeng's idea is to keep a bitmap to flag each entry in the array as used or free.)

(Member) replied:

I think this is fine. Server threads are created early and are long-lived. I don't think they will ever use the overflow atomic counter.

@zuiderkwast (Contributor) left a comment

I'm happy now, with this protection against too many threads.

I just found some typos.

src/config.h, comment on lines 332 to 333:
#define IO_THREADS_MAX_NUM 128
#define MAX_THREADS_NUM (IO_THREADS_MAX_NUM + 3 + 1)
(Contributor) commented:

(Minor) CACHE_LINE_SIZE is good in this file, but IO_THREADS_MAX_NUM is a little misplaced here, IMHO.

Now that zmalloc has a protection against too many threads, can we just define a limit within zmalloc.c that is independent of IO threads? That's my preference, but I will not insist if others disagree.

Signed-off-by: Lipeng Zhu <[email protected]>
@zvi-code (Contributor) commented Jul 1, 2024

(quotes the PR description, problem statement, performance numbers, and test environment from the top comment)

@lipzhu, it sounds like testing with threads is critical to validate the change; can you share results when tested with io-threads?

@zuiderkwast (Contributor) left a comment

OK, LGTM.

A majority of the core team has approved or said they prefer this solution over #308. @lipzhu can you update the PR title and description?

@lipzhu lipzhu changed the title Use thread-local storage variable to reduce atomic contention when update used_memory metrics. Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention when update used_memory metrics. Jul 2, 2024
@lipzhu (Contributor, Author) commented Jul 2, 2024

@lipzhu, sound like testing with threads is critical to validate the change, can you share results when tested with io-threads?

@zvi-code I don't think it's critical to the threads feature; even with a single thread, the atomic contention still exists, and an instruction like lock add is expensive.
Anyway, I also tested with io-threads 4 configured; the performance boost is listed below.

Test Suite | Performance Boost
memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10 | 4.5%
memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 | 3.2%

@lipzhu lipzhu changed the title Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention when update used_memory metrics. Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention. Jul 2, 2024
@lipzhu (Contributor, Author) commented Jul 2, 2024

OK, LGTM.

A majority of the core team has approved or said they prefer this solution over #308. @lipzhu can you update the PR title and description?

@zuiderkwast Thanks, I updated the title and description, PTAL.

@PingXie (Member) left a comment

This looks great. Thanks @lipzhu!

@PingXie PingXie merged commit 3323e42 into valkey-io:unstable Jul 2, 2024
19 checks passed
@lipzhu lipzhu deleted the atomic2 branch July 2, 2024 05:20
Labels
release-notes This issue should get a line item in the release notes