Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: unregistering topic translates to multiple redundant Raft proposals #979

Conversation

kakao-mark-yun
Copy link
Contributor

What this PR does

The current implementation of topic unregistration relies on proposeWithGuarantee's retry mechanism when deleting topics with multiple log streams. When a topic containing multiple log streams is unregistered, the control flow returns prematurely after deleting each log stream. This premature return prevents cacheCompleteCB from being called, which proposeWithGuarantee interprets as an incomplete operation that needs to be retried causing the unregistration time to scale linearly with the number of log streams (approximately 100ms per stream) and generates unnecessary consensus overhead.

The fix consolidates all log stream unregistrations into a single Raft proposal by removing the premature return.

Anything else

With topics containing over 1 billion log streams, state machine operations could still take longer than 100ms, potentially triggering the same retry behavior addressed in this fix. However, this would not affect correctness as the retry mechanism would continue to function as intended. At such scale, other system constraints would likely manifest before this particular issue would resurface.

Copy link

codecov bot commented Feb 1, 2025

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.25%. Comparing base (cc75160) to head (16093f0).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/metarepos/raft_metadata_repository.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #979      +/-   ##
==========================================
+ Coverage   80.21%   80.25%   +0.04%     
==========================================
  Files         179      179              
  Lines       21550    21548       -2     
==========================================
+ Hits        17286    17294       +8     
  Misses       3479     3479              
+ Partials      785      775      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

During topic unregistration, an early return after each log stream
unregistration prevented cacheCompleteCB from being called. Without this
callback, proposeWithGuarantee interpreted the operation as incomplete
and retried the entire topic unregistration multiple times. This resulted
in linear scaling of unregistration time with the number of log streams
(approximately 100ms per log stream) and additional consensus overhead.

Remove the premature return to ensure cacheCompleteCB is called after all
log streams are processed, allowing the entire operation to be handled in
a single Raft proposal.
@ijsong ijsong force-pushed the reduce_redundant_raft_proposals_in_topic_unregister branch from 1a2a050 to 16093f0 Compare February 3, 2025 13:55
@ijsong ijsong merged commit 8ddf8f5 into kakao:main Feb 3, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants