disrupt_decommission_streaming_err masks some raft errors through ignore_raft_topology_cmd_failing, for the purpose of invoking start_and_interrupt_decommission_streaming.
In the test run noted above, start_and_interrupt_decommission_streaming rebooted DB Node 1. This caused DB Node 7, which was at the time in a drain RPC with Node 1, to log:
raft_topology - tablets draining failed with std::runtime_error (raft topology: exec_global_command(barrier) failed with seastar::rpc::closed_error (connection is closed)). Aborting the topology operation
Because ignore_raft_topology_cmd_failing did not mask this particular raft error, SCT considered the whole test a failure.
I think ignore_raft_topology_cmd_failing should downgrade the above-noted error, too, to a warning.
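As a minimal sketch of what such a pattern might look like, the snippet below checks a candidate regex against the log line quoted above. The regex and the name CANDIDATE_PATTERN are assumptions made for illustration, modeled on the style of the existing drain rpc patterns; how a pattern is actually registered inside ignore_raft_topology_cmd_failing is not shown here.

```python
import re

# Log line quoted in this report (logged by DB Node 7).
LOG_LINE = (
    "raft_topology - tablets draining failed with std::runtime_error "
    "(raft topology: exec_global_command(barrier) failed with "
    "seastar::rpc::closed_error (connection is closed)). "
    "Aborting the topology operation"
)

# Hypothetical pattern (an assumption, not taken from SCT): it requires both
# the "tablets draining failed" prefix and the "connection is closed" cause.
CANDIDATE_PATTERN = re.compile(
    r".*raft_topology - tablets draining failed.*connection is closed"
)

# A draining failure with a different cause should stay unmasked.
UNRELATED_LINE = (
    "raft_topology - tablets draining failed with std::runtime_error "
    "(some other cause)"
)

matches_reported_line = CANDIDATE_PATTERN.search(LOG_LINE) is not None
matches_unrelated_line = CANDIDATE_PATTERN.search(UNRELATED_LINE) is not None

print(matches_reported_line, matches_unrelated_line)
```

Anchoring on "connection is closed" keeps the pattern narrow, so draining failures caused by something other than the deliberate reboot would still fail the test.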
Notes:
ignore_raft_topology_cmd_failing already masks a related error (drain rpc failed, proceed to fence old writes ... connection is closed); however, the specific error above doesn't seem to be masked.
Independently, the drain rpc failed, proceed to fence old writes ... connection is closed error seems to be masked twice (redundantly). Commit 03eb8b0 and commit 8b9a75f added the following regexes, respectively:
.*raft_topology - drain rpc failed, proceed to fence old writes:.*connection is closed
.*raft_topology - drain rpc failed, proceed to fence old writes.*connection is closed
Note the single-character difference: the first pattern contains a colon (:) after "writes". That makes the first pattern redundant, because the second pattern matches a superset of what the first pattern matches.
Arguably, the first pattern should be cleaned up, in a followup patch to the new log pattern addition (tablets draining failed...).
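The superset claim can be checked with a short sketch. The two regexes are copied from this report; the sample lines are hypothetical, with `<elided>` standing in for the part of the message that the report abbreviates as "...".

```python
import re

# The two patterns quoted above (with and without the colon).
WITH_COLON = re.compile(
    r".*raft_topology - drain rpc failed, proceed to fence old writes:.*connection is closed"
)
WITHOUT_COLON = re.compile(
    r".*raft_topology - drain rpc failed, proceed to fence old writes.*connection is closed"
)

# Hypothetical sample lines; "<elided>" is a placeholder, not real log content.
samples = [
    "raft_topology - drain rpc failed, proceed to fence old writes: <elided> connection is closed",
    "raft_topology - drain rpc failed, proceed to fence old writes <elided> connection is closed",
]

# For each sample: (matched by WITH_COLON, matched by WITHOUT_COLON).
results = [
    (WITH_COLON.search(s) is not None, WITHOUT_COLON.search(s) is not None)
    for s in samples
]

# No sample is matched by the colon variant alone.
colon_only = [s for s, (a, b) in zip(samples, results) if a and not b]

print(results)
print(colon_only)
```

Any line matched by the colon variant is also matched by the other, since the `.*` following "writes" can absorb the colon; the reverse does not hold.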
Report against SCT as of commit 08636e9.
Context: longevity-large-partition-200k-pks-4days-gce-test/6 | argus.