Merge #133860
133860: kvflowcontrol: set kvadmission.flow_control.mode default to apply_to_all r=sumeerbhola a=kvoli

The cluster setting kvadmission.flow_control.mode controls whether replication admission control applies only to elastic work, or to all work that sets an admission priority.
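
The two modes described above can be sketched as a small predicate. This is an illustrative sketch only, not the kvflowcontrol implementation: the `mode` and `priority` types and the `subjectToFlowControl` function are hypothetical names introduced here, and admission priorities are collapsed into two classes for brevity.

```go
package main

import "fmt"

// mode is a hypothetical stand-in for the two setting values.
type mode int

const (
	applyToElastic mode = iota // only elastic work is flow controlled
	applyToAll                 // all work with an admission priority is flow controlled
)

// priority is a hypothetical, two-class simplification of admission priority.
type priority int

const (
	elastic priority = iota
	regular
)

// subjectToFlowControl reports whether work of priority p is subject to
// replication admission control under mode m.
func subjectToFlowControl(m mode, p priority) bool {
	switch m {
	case applyToAll:
		return true
	case applyToElastic:
		return p == elastic
	default:
		return false
	}
}

func main() {
	fmt.Println(subjectToFlowControl(applyToElastic, regular))
	fmt.Println(subjectToFlowControl(applyToAll, regular))
}
```

Under `applyToElastic`, regular work bypasses flow control; under `applyToAll`, both classes are subject to it.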

Prior to #132125, the cluster setting was metamorphic in tests, selecting either value with equal probability. #132125 disabled the metamorphism to prevent the newly introduced send queue / pull mode from running unexpectedly while it was still under testing.

With testing now in place, re-enable the metamorphism, leaving the default value unchanged.

Resolves: #132364

---

With testing now in place, this commit changes the default value of
the `kvadmission.flow_control.mode` cluster setting from
`apply_to_elastic` to `apply_to_all`.

Regular work is now subject to replication admission control by
default: when send tokens are exhausted, a quorum is allowed to
proceed, and entries destined for any replica that is not required
for quorum are queued.

For more details, see #123509.

Resolves: #133838
Release note (performance improvement): Regular writes are now subject
to admission control by default. Replicas that are not required for
quorum may not be sent new writes by the leader if they are unable to
keep up. This brings a large performance improvement when there is a
large backlog of replication work toward a subset of nodes, such as
during node restarts. To revert to the default used in v24.3 and
earlier, set `kvadmission.flow_control.mode` to `apply_to_elastic`.

Co-authored-by: Austen McClernon <[email protected]>
craig[bot] and kvoli committed Dec 19, 2024
2 parents 665bf12 + 71cdabc commit 6087ddf
Showing 5 changed files with 18 additions and 3 deletions.
5 changes: 5 additions & 0 deletions docs/generated/metrics/metrics.html
@@ -19,6 +19,7 @@
<tr><td>STORAGE</td><td>admission.admitted.kv-stores.high-pri</td><td>Number of requests admitted</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.admitted.kv-stores.locking-normal-pri</td><td>Number of requests admitted</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.admitted.kv-stores.normal-pri</td><td>Number of requests admitted</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
+<tr><td>STORAGE</td><td>admission.admitted.kv-stores.user-high-pri</td><td>Number of requests admitted</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.admitted.kv.high-pri</td><td>Number of requests admitted</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.admitted.kv.locking-normal-pri</td><td>Number of requests admitted</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.admitted.kv.normal-pri</td><td>Number of requests admitted</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
@@ -54,6 +55,7 @@
<tr><td>STORAGE</td><td>admission.errored.kv-stores.high-pri</td><td>Number of requests not admitted due to error</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.errored.kv-stores.locking-normal-pri</td><td>Number of requests not admitted due to error</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.errored.kv-stores.normal-pri</td><td>Number of requests not admitted due to error</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
+<tr><td>STORAGE</td><td>admission.errored.kv-stores.user-high-pri</td><td>Number of requests not admitted due to error</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.errored.kv.high-pri</td><td>Number of requests not admitted due to error</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.errored.kv.locking-normal-pri</td><td>Number of requests not admitted due to error</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.errored.kv.normal-pri</td><td>Number of requests not admitted due to error</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
@@ -101,6 +103,7 @@
<tr><td>STORAGE</td><td>admission.requested.kv-stores.high-pri</td><td>Number of requests</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.requested.kv-stores.locking-normal-pri</td><td>Number of requests</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.requested.kv-stores.normal-pri</td><td>Number of requests</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
+<tr><td>STORAGE</td><td>admission.requested.kv-stores.user-high-pri</td><td>Number of requests</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.requested.kv.high-pri</td><td>Number of requests</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.requested.kv.locking-normal-pri</td><td>Number of requests</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>STORAGE</td><td>admission.requested.kv.normal-pri</td><td>Number of requests</td><td>Requests</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
@@ -128,6 +131,7 @@
<tr><td>STORAGE</td><td>admission.wait_durations.kv-stores.high-pri</td><td>Wait time durations for requests that waited</td><td>Wait time Duration</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_durations.kv-stores.locking-normal-pri</td><td>Wait time durations for requests that waited</td><td>Wait time Duration</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_durations.kv-stores.normal-pri</td><td>Wait time durations for requests that waited</td><td>Wait time Duration</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>admission.wait_durations.kv-stores.user-high-pri</td><td>Wait time durations for requests that waited</td><td>Wait time Duration</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_durations.kv.high-pri</td><td>Wait time durations for requests that waited</td><td>Wait time Duration</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_durations.kv.locking-normal-pri</td><td>Wait time durations for requests that waited</td><td>Wait time Duration</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_durations.kv.normal-pri</td><td>Wait time durations for requests that waited</td><td>Wait time Duration</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
@@ -155,6 +159,7 @@
<tr><td>STORAGE</td><td>admission.wait_queue_length.kv-stores.high-pri</td><td>Length of wait queue</td><td>Requests</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_queue_length.kv-stores.locking-normal-pri</td><td>Length of wait queue</td><td>Requests</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_queue_length.kv-stores.normal-pri</td><td>Length of wait queue</td><td>Requests</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
+<tr><td>STORAGE</td><td>admission.wait_queue_length.kv-stores.user-high-pri</td><td>Length of wait queue</td><td>Requests</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_queue_length.kv.high-pri</td><td>Length of wait queue</td><td>Requests</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_queue_length.kv.locking-normal-pri</td><td>Length of wait queue</td><td>Requests</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>admission.wait_queue_length.kv.normal-pri</td><td>Length of wait queue</td><td>Requests</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
4 changes: 3 additions & 1 deletion pkg/kv/kvserver/client_raft_log_queue_test.go
@@ -201,8 +201,10 @@ func TestRaftTracing(t *testing.T) {
expectedMessages := []string{
`replica_proposal_buf.* flushing proposal to Raft`,
`replica_proposal_buf.* registering local trace`,
+// Note that we don't assert that the 1->3 MsgApp goes through, as
+// the ordering may change between 1->2 and 1->3. It should be
+// sufficient to just check one of them for tracing.
`replica_raft.* 1->2 MsgApp`,
-`replica_raft.* 1->3 MsgApp`,
`replica_raft.* AppendThread->1 MsgStorageAppendResp`,
`ack-ing replication success to the client`,
}
1 change: 1 addition & 0 deletions pkg/kv/kvserver/kvflowcontrol/BUILD.bazel
@@ -19,6 +19,7 @@ go_library(
"//pkg/util/admission/admissionpb",
"//pkg/util/ctxutil",
"//pkg/util/humanizeutil",
+"//pkg/util/metamorphic",
"@com_github_cockroachdb_redact//:redact",
],
)
7 changes: 6 additions & 1 deletion pkg/kv/kvserver/kvflowcontrol/kvflowcontrol.go
@@ -20,6 +20,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
"github.com/cockroachdb/cockroach/pkg/util/ctxutil"
"github.com/cockroachdb/cockroach/pkg/util/humanizeutil"
+"github.com/cockroachdb/cockroach/pkg/util/metamorphic"
"github.com/cockroachdb/redact"
)

@@ -41,7 +42,11 @@ var Mode = settings.RegisterEnumSetting(
settings.SystemOnly,
"kvadmission.flow_control.mode",
"determines the 'mode' of flow control we use for replication traffic in KV, if enabled",
-modeDict[ApplyToElastic], /* default value */
+metamorphic.ConstantWithTestChoice(
+"kvadmission.flow_control.mode",
+modeDict[ApplyToAll], /* default value */
+modeDict[ApplyToElastic], /* other value */
+),
modeDict,
)

4 changes: 3 additions & 1 deletion pkg/testutils/testcluster/testcluster.go
@@ -1479,7 +1479,9 @@ func (tc *TestCluster) WaitForFullReplication() error {
// Force upreplication. Otherwise, if we rely on the scanner to do it,
// it'll take a while.
if err := s.ForceReplicationScanAndProcess(); err != nil {
-return err
+log.Infof(context.TODO(), "%v", err)
+notReplicated = true
+return nil
}
if err := s.ComputeMetrics(context.TODO()); err != nil {
// This can sometimes fail since ComputeMetrics calls