Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base: decrease store liveness durations #137608

Merged
merged 2 commits into from
Dec 20, 2024

Conversation

miraradeva
Copy link
Contributor

Previously, store liveness used a heartbeat interval and support duration of 3s and 6s, respectively. This matched the lease extension and lease duration, respectively. However, these values were not well aligned with Raft's election timeout (4s) and jitter (up to 2s), so when a follower had to campaign after withdrawing support from the leader, the store liveness durations added up to the Raft timeout and jitter, instead of being subsumed by them.

This commit reduces the store liveness heartbeat interval and support duration to 1s and 3s, respectively.

Fixes: #133613

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@miraradeva miraradeva marked this pull request as ready for review December 17, 2024 16:57
@miraradeva miraradeva requested review from a team as code owners December 17, 2024 16:57
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 11 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)


pkg/kv/kvserver/storeliveness/requester_state.go line 336 at r1 (raw file):

// updateMaxRequested forwards the current MaxRequested timestamp to now +
// interval, where now is the node's clock timestamp and support duration is the

nit: interval is the support duration or, better yet, "now + support duration, where now is the node's clock timestamp".

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 10 of 11 files at r1, 4 of 4 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @miraradeva)


pkg/base/testdata/raft_config_recovery line 3 at r4 (raw file):

# Any changes in this result should be copied to the comment on
# defaultRangeLeaseRaftElectionTimeoutMultiplier, and the corresponding
# reasoning should be adjusted.

We'll want to update the comment that this is referencing as well.

Copy link
Contributor Author

@miraradeva miraradeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nvanbenschoten)


pkg/base/testdata/raft_config_recovery line 3 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We'll want to update the comment that this is referencing as well.

Looks like defaultRangeLeaseRaftElectionTimeoutMultiplier doesn't exist as of 40cb075, so I updated the comment to point to the right place (which is defaultRangeLeaseDuration). I had already updated that one.

@miraradeva
Copy link
Contributor Author

The CI failure (TestRaftCheckQuorum) looks suspicious. I'll try to repro.

@miraradeva
Copy link
Contributor Author

The CI extended failure is TestLogGrowthWhenRefreshingPendingCommands, which should be fixed on master. Will rebase.

The liveness interval indicates the length of each period of requested
(or extended) support in store liveness. It is not really an interval
as much as a duration.

This commit renames "liveness interval" to "support duration", which
also aligns with the notion of a lease duration.

Part of: cockroachdb#133613

Release note: None
Previously, store liveness used a heartbeat interval and support
duration of 3s and 6s, respectively. This matched the lease extension
and lease duration, respectively. However, these values were not well
aligned with Raft's election timeout (4s) and jitter (up to 2s), so
when a follower had to campaign after withdrawing support from the
leader, the store liveness durations added up to the Raft timeout and
jitter, instead of being subsumed by them.

This commit reduces the store liveness heartbeat interval and support
duration to 1s and 3s, respectively.

Fixes: cockroachdb#133613

Release note: None
@miraradeva
Copy link
Contributor Author

CI is finally green. TFTRs!

bors r+

craig bot pushed a commit that referenced this pull request Dec 20, 2024
137608: base: decrease store liveness durations r=miraradeva a=miraradeva

Previously, store liveness used a heartbeat interval and support duration of 3s and 6s, respectively. This matched the lease extension and lease duration, respectively. However, these values were not well aligned with Raft's election timeout (4s) and jitter (up to 2s), so when a follower had to campaign after withdrawing support from the leader, the store liveness durations added up to the Raft timeout and jitter, instead of being subsumed by them.

This commit reduces the store liveness heartbeat interval and support duration to 1s and 3s, respectively.

Fixes: [#133613](#133613)

Release note: None

137845: diagnostics: honor stopper quiescence when reporting diagnostics r=xinhaoz a=dhartunian

Previously, we would block on the HTTP diagnostics request until the preset timeout regardless of stopper quiescence. This change ensures that the `ctx` passed to the diagnostics reporter and the HTTP client is cancelled if the stopper is quiescing.

Resolves: #136739

Release note: None

Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
@craig
Copy link
Contributor

craig bot commented Dec 20, 2024

This PR was included in a batch that was canceled, it will be automatically retried

@craig craig bot merged commit d3d0426 into cockroachdb:master Dec 20, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storeliveness: tune heartbeat and liveness intervals
4 participants