Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement asynchronous snapshot by copying dataTree #247

Merged
merged 6 commits into from
May 8, 2024

Conversation

lzydmxy
Copy link
Contributor

@lzydmxy lzydmxy commented May 7, 2024

Which issues of this PR fixes:

This PR is part of #144

Change log:

  • Creating snapshot asynchronously.

Motivation:

Right now keeper creates snapshot synchronously, during which period, it is unable to handle to user requests for creating snapshot is in the request handling procress.

If a cluser has large amount of data, creating snapshot will lead large blocking time. And because keeper can not handle heartbeat, leader election may happen. Both of them will make keeper unavailable.

Solution:

The main idea is a little like COW.

  1. Copy all nodes of the data tree (copying is parallel) and save them in memory temporarily. This step will block user requests and consume some memory.
  2. Serialize and persist the data, this step is in background.

Optimization result:

We have a keeper cluster which holds metadata of Clickhouse and it has 58659723 nodes.

  1. The blocking (user requests) time reduce from 180 seconds to 4.5 seconds, it is about 2.5% of the orignial.
  2. But there is also a side effect. When creating snapshot memory usage increases from 40GB to 60GB (approximately 1.5 times).

@JackyWoo JackyWoo added this to the Release v2.0.5 milestone May 7, 2024
@JackyWoo JackyWoo added the enhancement Enhancements label May 7, 2024
@JackyWoo JackyWoo self-assigned this May 7, 2024
src/Service/KeeperStore.h Outdated Show resolved Hide resolved
@JackyWoo
Copy link
Contributor

JackyWoo commented May 7, 2024

@lzydmxy nice work, please show us the memory usage when creating snapshot.

@JackyWoo
Copy link
Contributor

JackyWoo commented May 7, 2024

I will carefully review this PR tomorrow.

src/Service/NuRaftLogSnapshot.cpp Outdated Show resolved Hide resolved
src/Service/NuRaftLogSnapshot.h Show resolved Hide resolved
src/Service/KeeperStore.h Outdated Show resolved Hide resolved
src/Service/KeeperStore.h Outdated Show resolved Hide resolved
src/Service/KeeperStore.h Outdated Show resolved Hide resolved
src/Service/NuRaftLogSnapshot.h Show resolved Hide resolved
src/Service/NuRaftLogSnapshot.h Outdated Show resolved Hide resolved
src/Service/NuRaftLogSnapshot.h Outdated Show resolved Hide resolved
src/Service/NuRaftStateMachine.cpp Outdated Show resolved Hide resolved
tests/integration/test_four_word_command/test.py Outdated Show resolved Hide resolved
@JackyWoo
Copy link
Contributor

JackyWoo commented May 8, 2024

@lzydmxy LGTM except some code sytle issues.

@JackyWoo JackyWoo merged commit 031e35e into JDRaftKeeper:master May 8, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants