
kv/kvserver: ambiguous errors returned from a cluster with only 1 unhealthy node and no leases #137717

Open
andrewbaptist opened this issue Dec 18, 2024 · 5 comments
Labels: branch-master (Failures and bugs on the master branch), C-bug (Code not up to spec/doc, specs & docs deemed correct; solution expected to change code/behavior), O-perturbation (Bugs found by the perturbation framework), T-kv (KV Team)



andrewbaptist commented Dec 18, 2024

Describe the problem

During a test run where a single node was overloaded and had no leases, a request failed with an ambiguous error. The request should have been retried internally and handled by the DistSender.

To Reproduce

I ran the backfill perturbation test with the following command on a 30-node cluster using expiration leases.

COCKROACH_RANDOM_SEED=-7744531226375341113 roachtest run perturbation/metamorphic/backfill --cpu-quota 1000

However, it didn't even get to the backfill, so this should be reproducible on a 30-node cluster with the following:

ALTER DATABASE kv CONFIGURE ZONE USING constraints='{+node30:1}', lease_preferences='[[-node30]]', num_replicas=3
roachprod run $CLUSTER:1 "./cockroach workload init kv --splits=10000"
roachprod run $CLUSTER:31-32 "./cockroach workload run kv --max-block-bytes=10000 --min-block-bytes=10000 --concurrency=100 {pgurl:1-29}"

Expected behavior
The request should have been retried internally and succeeded. Instead, the kv run failed with this error:

        E241218 20:41:10.295589 1 workload/cli/run.go:599  [-] 4  ERROR: result is ambiguous: error=replica unavailable: (n30,s59):1 unable to serve request to r345:/Table/106/1/{559324921606706082-698531396645887576} [(n30,s59):1, (n7,s14):2, (n12,s24):4, next=6, gen=83]: closed timestamp: 1734554233.229828522,0 (2024-12-18 20:37:13); raft status: {"id":"1","term":10,"vote":"4","commit":30318,"lead":"4","leadEpoch":"0","raftState":"StateFollower","applied":30318,"progress":{},"leadtransferee":"0"}: result is ambiguous: unable to determine whether command was applied via snapshot [exhausted] (last error: failed to send RPC: leaseholder not found in transport; last error: [NotLeaseHolderError] lease held by different store; r345: replica (n12,s23):7 not lease holder; current lease is repl=(n2,s4):8VOTER_INCOMING seq=18 start=1734554469.655156737,0 exp=1734554475.655121007,0 pro=1734554469.655121007,0 acq=Transfer) (SQLSTATE 40003)

Note that at this time there were no leases on n30, both due to the constraints and because it had entered IO overload about 10 minutes earlier.

From the range log perspective, these are the relevant entries for this range:

W241218 20:34:27.392740 361 kv/kvserver/replica_raft.go:1677 ⋮ [T1,Vsystem,n30,s59,r345/1:‹/Table/106/1/{559324…-838488…}›,raft] 2607  have been waiting 60.50s for slow proposal RequestLease [/Table/106/1/‹559324921606706082›]
E241218 20:34:27.393228 147841 kv/kvserver/replica_circuit_breaker.go:163 ⋮ [T1,Vsystem,n30,s59,r345/1:‹/Table/106/1/{559324…-838488…}›] 2608  breaker: tripped with error: replica unavailable: (n30,s59):1 unable to serve request to r345:‹/Table/106/1/{559324921606706082-838488366986797792}› [(n30,s59):1, (n7,s14):2, (n11,s22):3, (n12,s24):4LEARNER, next=5, gen=77]: closed timestamp: 1734553973.417003733,0 (2024-12-18 20:32:53); raft status: {"id":"1","term":8,"vote":"4","commit":12964,"lead":"4","leadEpoch":"0","raftState":"StateFollower","applied":12964,"progress":{},"leadtransferee":"0"}: have been waiting 60.50s for slow proposal RequestLease [/Table/106/1/‹559324921606706082›]

So the breaker tripped at 20:34 and never reset.

On n7:

I241218 20:33:27.386753 629 kv/kvserver/kvserverbase/forced_error.go:87 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-838488…}›,raft] 370  rejected lease request repl=(n30,s59):1 seq=14 start=1734553980.403499013,1 exp=1734554003.223723120,0 pro=1734553997.223723120,0 acq=Request; current lease repl=(n12,s24):4 seq=14 start=1734553997.142837014,0 exp=1734554012.682446861,0 pro=1734554006.682446861,0 acq=Request; err: cannot replace lease repl=(n12,s24):4 seq=14 start=1734553997.142837014,0 exp=1734554012.682446861,0 pro=1734554006.682446861,0 acq=Request with repl=(n30,s59):1 seq=14 start=1734553980.403499013,1 exp=1734554003.223723120,0 pro=1734553997.223723120,0 acq=Request: ‹proposed under invalid lease›
...
I241218 20:34:54.394915 632 kv/kvserver/kvserverbase/forced_error.go:87 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-838488…}›,raft] 464  rejected lease request repl=(n30,s59):1 seq=14 start=1734553980.403499013,1 exp=1734554003.223723120,0 pro=1734553997.223723120,0 acq=Request; current lease repl=(n12,s24):4 seq=14 start=1734553997.142837014,0 exp=1734554098.182451704,0 pro=1734554092.182451704,0 acq=Request; err: cannot replace lease repl=(n12,s24):4 seq=14 start=1734553997.142837014,0 exp=1734554098.182451704,0 pro=1734554092.182451704,0 acq=Request with repl=(n30,s59):1 seq=14 start=1734553980.403499013,1 exp=1734554003.223723120,0 pro=1734553997.223723120,0 acq=Request: ‹proposed under invalid lease›
W241218 20:39:47.427705 227774 13@kv/kvserver/replica_command.go:3080 ⋮ [T1,Vsystem,n7,raftsnapshot,s14,r345/2:‹/Table/106/1/{559324…-698531…}›] 642  attempt 1: delegate snapshot ‹range_id:345 coordinator_replica:<node_id:7 store_id:14 replica_id:2 type:VOTER_FULL > recipient_replica:<node_id:30 store_id:59 replica_id:1 type:VOTER_FULL > delegated_sender:<node_id:7 store_id:14 replica_id:2 type:VOTER_FULL > term:11 first_index:44192 sender_queue_name:RAFT_SNAPSHOT_QUEUE descriptor_generation:85 queue_on_delegate_len:-1 snap_id:4b0e7a00-fc58-4e03-8da0-1cbbd458106a › request failed error sending couldn't accept ‹range_id:345 coordinator_replica:<node_id:7 store_id:14 replica_id:2 type:VOTER_FULL > recipient_replica:<node_id:30 store_id:59 replica_id:1 type:VOTER_FULL > delegated_sender:<node_id:7 store_id:14 replica_id:2 type:VOTER_FULL > term:11 first_index:44192 sender_queue_name:RAFT_SNAPSHOT_QUEUE descriptor_generation:85 queue_on_delegate_len:-1 snap_id:4b0e7a00-fc58-4e03-8da0-1cbbd458106a ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]

On n12:

I241218 20:37:57.498705 128037 kv/kvserver/replica_command.go:1923 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 583  could not successfully add and upreplicate LEARNER replica(s) on [n30,s60], rolling back: error sending couldn't accept ‹range_id:345 coordinator_replica:<node_id:12 store_id:24 replica_id:4 type:VOTER_FULL > recipient_replica:<node_id:30 store_id:60 replica_id:6 type:LEARNER > delegated_sender:<node_id:12 store_id:24 replica_id:4 type:VOTER_FULL > term:10 first_index:30581 sender_queue_name:REPLICATE_QUEUE descriptor_generation:84 queue_on_delegate_len:-1 snap_id:ceb68b06-c2a3-47ea-b957-2542c0cd6d48 ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
I241218 20:37:57.520319 128037 kv/kvserver/replica_command.go:2200 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 585  rolled back LEARNER n30,s60 in r345:‹/Table/106/1/{559324921606706082-698531396645887576}› [(n30,s59):1, (n7,s14):2, (n12,s24):4, (n30,s60):6LEARNER, next=7, gen=84]
I241218 20:38:36.726851 164477 kv/kvserver/raft_log_queue.go:705 ⋮ [T1,Vsystem,n12,raftlog,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 610  should truncate: true [truncate 1662 entries to first index 40812 (chosen via: last index); log too large (16 MiB > 16 MiB); implies 1 Raft snapshot]
I241218 20:40:39.702630 213573 kv/kvserver/replica_raftstorage.go:527 ⋮ [T1,Vsystem,n12,s23,r345/7:‹/Table/106/1/{559324…-629174…}›] 684  applied snapshot 7b06b6a3 from (n7,s14):2 at applied index 54046 (total=22ms data=256 MiB excise=true ingestion=7@3ms)
I241218 20:40:40.312778 647 kv/kvserver/store_remove_replica.go:152 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-629174…}›,raft] 685  removing replica r345/4

Full kv-distribution log across all nodes for r345:
241218 20:32:09.618867 81700 13@kv/kvserver/replica_command.go:481 ⋮ [T1,Vsystem,n7,split,s14,r75/12:‹/Table/106/1/{279496…-838488…}›] 263  initiating a split of this range at key /Table/106/1/‹559324921606706082› [r345] (512 MiB above threshold size 512 MiB)
241218 20:32:15.653856 85194 13@kv/kvserver/lease_queue.go:138 ⋮ [T1,Vsystem,n7,lease,s14,r345/2:‹/Table/106/1/{559324…-838488…}›] 269  transferring lease to s22 usage=[batches/s=114.6 request_cpu/s=15ms raft_cpu/s=15ms write(keys)/s=113.6 write(bytes)/s=1.1 MiB read(keys)/s=367.0 read(bytes)/s=3.5 MiB]
241218 20:32:50.737625 649 13@kv/kvserver/store_rebalancer.go:713 ⋮ [T1,Vsystem,n11,s22,store-rebalancer,obj=‹cpu›] 332  rebalancing r345 load=(queries-per-second=227.1 cpu-per-second=64ms) to better balance load: voters from (n30,s59):1,(n7,s14):2,(n11,s22):3 to [n12,s24 n7,s14 n30,s59]; non-voters from  to []
241218 20:32:50.738349 649 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n11,s22,r345/3:‹/Table/106/1/{559324…-838488…}›,*kvpb.AdminChangeReplicasRequest] 333  change replicas (add [(n12,s24):4LEARNER] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-838488366986797792}› [(n30,s59):1, (n7,s14):2, (n11,s22):3, next=4, gen=76]
241218 20:32:50.742238 649 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n11,s22,r345/3:‹/Table/106/1/{559324…-838488…}›] 334  proposing SIMPLE(l4) [(n12,s24):4LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n11,s22):3 (n12,s24):4LEARNER] next=5
241218 20:33:17.007165 108208 13@kv/kvserver/store_snapshot.go:2144 ⋮ [T1,Vsystem,n11,s22,r345/3:‹/Table/106/1/{559324…-838488…}›] 348  streamed snapshot a40a21a3 at applied index 10758 to (n12,s24):4LEARNER with 358 MiB in 10.82s @ 33 MiB/s: kvs=37412 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 15.44s
241218 20:33:17.008982 61472 13@kv/kvserver/replica_raftstorage.go:505 ⋮ [T1,Vsystem,n12,s24,r345/4:{-}] 342  applying snapshot a40a21a3 from (n11,s22):3 at applied index 10758
241218 20:33:17.012363 66827 13@kv/kvserver/allocator/allocatorimpl/allocator.go:2212 ⋮ [T1,Vsystem,n12,lease,s24,r345/4:‹/Table/106/1/{559324…-838488…}›] 343  expected leaseholder store to be in the slice of existing replicas
241218 20:33:17.022197 649 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n11,s22,r345/3:‹/Table/106/1/{559324…-838488…}›,*kvpb.AdminChangeReplicasRequest] 349  change replicas (add [(n12,s24):4VOTER_INCOMING] remove [(n11,s22):3VOTER_DEMOTING_LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-838488366986797792}› [(n30,s59):1, (n7,s14):2, (n11,s22):3, (n12,s24):4LEARNER, next=5, gen=77]
241218 20:33:17.026573 649 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n11,s22,r345/3:‹/Table/106/1/{559324…-838488…}›] 350  proposing ENTER_JOINT(r3 l3 v4) [(n12,s24):4VOTER_INCOMING], [(n11,s22):3VOTER_DEMOTING_LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n11,s22):3VOTER_DEMOTING_LEARNER (n12,s24):4VOTER_INCOMING] next=5
241218 20:33:17.153141 649 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n11,s22,r345/3:‹/Table/106/1/{559324…-838488…}›,*kvpb.AdminChangeReplicasRequest] 351  change replicas (add [] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-838488366986797792}› [(n30,s59):1, (n7,s14):2, (n11,s22):3VOTER_DEMOTING_LEARNER, (n12,s24):4VOTER_INCOMING, next=5, gen=78]
241218 20:33:17.167314 60177 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-838488…}›] 344  proposing LEAVE_JOINT: after=[(n30,s59):1 (n7,s14):2 (n11,s22):3LEARNER (n12,s24):4] next=5
241218 20:33:17.168758 649 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n11,s22,r345/3:‹/Table/106/1/{559324…-838488…}›,*kvpb.AdminChangeReplicasRequest] 352  change replicas (add [] remove [(n11,s22):3LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-838488366986797792}› [(n30,s59):1, (n7,s14):2, (n11,s22):3LEARNER, (n12,s24):4, next=5, gen=79]
241218 20:33:17.175590 60177 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-838488…}›] 345  proposing SIMPLE(r3) [(n11,s22):3LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4] next=5
241218 20:34:56.657324 75161 13@kv/kvserver/store_snapshot.go:2144 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-838488…}›] 397  streamed snapshot 74ea65c5 at applied index 13874 to (n30,s59):1 with 387 MiB in 14.14s @ 27 MiB/s: kvs=40460 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 32.14s
241218 20:34:56.664864 144208 13@kv/kvserver/replica_raftstorage.go:505 ⋮ [T1,Vsystem,n30,s59,r345/1:‹/Table/106/1/{559324…-838488…}›] 425  applying snapshot 74ea65c5 from (n12,s24):4 at applied index 13874
241218 20:35:16.626064 705 13@kv/kvserver/store_rebalancer.go:281 ⋮ [T1,Vsystem,n12,s24,store-rebalancer,obj=‹cpu›] 402 +       2: r345:‹/Table/106/1/{559324921606706082-838488366986797792}› replicas=[(n30,s59):1,(n7,s14):2,(n12,s24):4] load=[batches/s=114.4 request_cpu/s=19ms raft_cpu/s=16ms write(keys)/s=113.3 write(bytes)/s=1.1 MiB read(keys)/s=1086.4 read(bytes)/s=10 MiB]
241218 20:35:16.864565 100043 13@kv/kvserver/replica_command.go:481 ⋮ [T1,Vsystem,n12,split,s24,r345/4:‹/Table/106/1/{559324…-838488…}›] 404  initiating a split of this range at key /Table/106/1/‹698531396645887576› [r496] (512 MiB above threshold size 512 MiB)
241218 20:35:32.436868 106941 13@kv/kvserver/lease_queue.go:138 ⋮ [T1,Vsystem,n12,lease,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 427  transferring lease to s14 usage=[batches/s=63.7 request_cpu/s=10ms raft_cpu/s=9ms write(keys)/s=63.1 write(bytes)/s=616 KiB read(keys)/s=466.0 read(bytes)/s=4.5 MiB]
241218 20:36:06.743137 158110 13@kv/kvserver/lease_queue.go:138 ⋮ [T1,Vsystem,n7,lease,s14,r345/2:‹/Table/106/1/{559324…-698531…}›] 395  transferring lease to s24 usage=[batches/s=12.2 request_cpu/s=2ms raft_cpu/s=2ms write(keys)/s=12.2 write(bytes)/s=117 KiB read(keys)/s=0.0 read(bytes)/s=0 B]
241218 20:36:40.472616 118306 13@kv/kvserver/allocator/plan/replicate.go:841 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 501  rebalancing voter n30,s59 to n30,s60: [1:0, 2:29134, 4*:29134]
241218 20:36:40.473687 118306 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 502  change replicas (add [(n30,s60):5LEARNER] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-698531396645887576}› [(n30,s59):1, (n7,s14):2, (n12,s24):4, next=5, gen=81]
241218 20:36:40.475322 118306 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 503  proposing SIMPLE(l5) [(n30,s60):5LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4 (n30,s60):5LEARNER] next=6
241218 20:36:40.476929 118306 13@kv/kvserver/replica_command.go:3080 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 504  attempt 1: delegate snapshot ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29136 sender_queue_name:REPLICATE_QUEUE descriptor_generation:82 queue_on_delegate_len:3 snap_id:8e1654e4-75ad-4ff6-aee4-abcc8b24dbca › request failed error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29136 sender_queue_name:REPLICATE_QUEUE descriptor_generation:82 queue_on_delegate_len:3 snap_id:8e1654e4-75ad-4ff6-aee4-abcc8b24dbca ›: [n30,s59,r345/1:‹/Table/106/1/{559324…-838488…}›]: generation has changed since snapshot was generated 80 < 82
241218 20:36:43.774681 114177 13@kv/kvserver/replica_command.go:3080 ⋮ [T1,Vsystem,n12,raftsnapshot,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 506  attempt 1: delegate snapshot ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29116 sender_queue_name:RAFT_SNAPSHOT_QUEUE descriptor_generation:81 queue_on_delegate_len:-1 snap_id:08761d1b-ae9d-450f-b74e-5a1c3051c4b0 › request failed error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29116 sender_queue_name:RAFT_SNAPSHOT_QUEUE descriptor_generation:81 queue_on_delegate_len:-1 snap_id:08761d1b-ae9d-450f-b74e-5a1c3051c4b0 ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:37:14.454488 118306 13@kv/kvserver/replica_command.go:3080 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 519  attempt 2: delegate snapshot ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29136 sender_queue_name:REPLICATE_QUEUE descriptor_generation:82 queue_on_delegate_len:-1 snap_id:8e1654e4-75ad-4ff6-aee4-abcc8b24dbca › request failed error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29136 sender_queue_name:REPLICATE_QUEUE descriptor_generation:82 queue_on_delegate_len:-1 snap_id:8e1654e4-75ad-4ff6-aee4-abcc8b24dbca ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:37:14.454994 118306 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 520  change replicas (add [] remove [(n30,s60):5LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-698531396645887576}› [(n30,s59):1, (n7,s14):2, (n12,s24):4, (n30,s60):5LEARNER, next=6, gen=82]
241218 20:37:14.460237 118306 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 521  proposing SIMPLE(r5) [(n30,s60):5LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4] next=6
241218 20:37:14.461177 118306 13@kv/kvserver/replicate_queue.go:784 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 522  error processing replica: error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29136 sender_queue_name:REPLICATE_QUEUE descriptor_generation:82 queue_on_delegate_len:-1 snap_id:8e1654e4-75ad-4ff6-aee4-abcc8b24dbca ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:37:14.461227 118306 13@kv/kvserver/replicate_queue.go:677 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 523  error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:29136 sender_queue_name:REPLICATE_QUEUE descriptor_generation:82 queue_on_delegate_len:-1 snap_id:8e1654e4-75ad-4ff6-aee4-abcc8b24dbca ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:37:21.495166 128037 13@kv/kvserver/allocator/plan/replicate.go:841 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 525  rebalancing voter n30,s59 to n30,s60: [1:0, 2:30579, 4*:30579]
241218 20:37:21.495465 128037 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 526  change replicas (add [(n30,s60):6LEARNER] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-698531396645887576}› [(n30,s59):1, (n7,s14):2, (n12,s24):4, next=6, gen=83]
241218 20:37:21.497619 128037 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 527  proposing SIMPLE(l6) [(n30,s60):6LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4 (n30,s60):6LEARNER] next=7
241218 20:37:21.499301 128037 13@kv/kvserver/replica_command.go:3080 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 528  attempt 1: delegate snapshot ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:30581 sender_queue_name:REPLICATE_QUEUE descriptor_generation:84 queue_on_delegate_len:3 snap_id:ceb68b06-c2a3-47ea-b957-2542c0cd6d48 › request failed error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:30581 sender_queue_name:REPLICATE_QUEUE descriptor_generation:84 queue_on_delegate_len:3 snap_id:ceb68b06-c2a3-47ea-b957-2542c0cd6d48 ›: [n30,s59,r345/1:‹/Table/106/1/{559324…-838488…}›]: generation has changed since snapshot was generated 80 < 84
241218 20:37:57.498592 128037 13@kv/kvserver/replica_command.go:3080 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 543  attempt 2: delegate snapshot ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:30581 sender_queue_name:REPLICATE_QUEUE descriptor_generation:84 queue_on_delegate_len:-1 snap_id:ceb68b06-c2a3-47ea-b957-2542c0cd6d48 › request failed error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:30581 sender_queue_name:REPLICATE_QUEUE descriptor_generation:84 queue_on_delegate_len:-1 snap_id:ceb68b06-c2a3-47ea-b957-2542c0cd6d48 ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:37:57.499265 128037 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 544  change replicas (add [] remove [(n30,s60):6LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-698531396645887576}› [(n30,s59):1, (n7,s14):2, (n12,s24):4, (n30,s60):6LEARNER, next=7, gen=84]
241218 20:37:57.513025 128037 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 545  proposing SIMPLE(r6) [(n30,s60):6LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4] next=7
241218 20:37:57.520372 128037 13@kv/kvserver/replicate_queue.go:784 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 546  error processing replica: error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:30581 sender_queue_name:REPLICATE_QUEUE descriptor_generation:84 queue_on_delegate_len:-1 snap_id:ceb68b06-c2a3-47ea-b957-2542c0cd6d48 ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:37:57.520507 128037 13@kv/kvserver/replicate_queue.go:677 ⋮ [T1,Vsystem,n12,replicate,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 547  error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:10 first_index:30581 sender_queue_name:REPLICATE_QUEUE descriptor_generation:84 queue_on_delegate_len:-1 snap_id:ceb68b06-c2a3-47ea-b957-2542c0cd6d48 ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:37:58.312560 126416 13@kv/kvserver/store_snapshot.go:2144 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 548  streamed snapshot c5c115cd at applied index 30318 to (n30,s59):1 with 287 MiB in 11.57s @ 25 MiB/s: kvs=29992 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 30.51s
241218 20:37:58.316594 217017 13@kv/kvserver/replica_raftstorage.go:505 ⋮ [T1,Vsystem,n30,s59,r345/1:‹/Table/106/1/{559324…-838488…}›] 489  applying snapshot c5c115cd from (n12,s24):4 at applied index 30318
241218 20:38:39.514661 166739 13@kv/kvserver/lease_queue.go:138 ⋮ [T1,Vsystem,n12,lease,s24,r345/4:‹/Table/106/1/{559324…-698531…}›] 563  transferring lease to s14 usage=[batches/s=80.8 request_cpu/s=11ms raft_cpu/s=12ms write(keys)/s=80.2 write(bytes)/s=783 KiB read(keys)/s=0.0 read(bytes)/s=1 B]
241218 20:39:47.427705 227774 13@kv/kvserver/replica_command.go:3080 ⋮ [T1,Vsystem,n7,raftsnapshot,s14,r345/2:‹/Table/106/1/{559324…-698531…}›] 500  attempt 1: delegate snapshot ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:11 first_index:44192 sender_queue_name:RAFT_SNAPSHOT_QUEUE descriptor_generation:85 queue_on_delegate_len:-1 snap_id:4b0e7a00-fc58-4e03-8da0-1cbbd458106a › request failed error sending couldn't accept ‹range_id:345 coordinator_replica: recipient_replica: delegated_sender: term:11 first_index:44192 sender_queue_name:RAFT_SNAPSHOT_QUEUE descriptor_generation:85 queue_on_delegate_len:-1 snap_id:4b0e7a00-fc58-4e03-8da0-1cbbd458106a ›: grpc: ‹giving up during snapshot reservation due to cluster setting "kv.snapshot_receiver.reservation_queue_timeout_fraction": context deadline exceeded› [code 4/DeadlineExceeded]
241218 20:40:10.008873 268847 13@kv/kvserver/replica_command.go:481 ⋮ [T1,Vsystem,n7,split,s14,r345/2:‹/Table/106/1/{559324…-698531…}›] 511  initiating a split of this range at key /Table/106/1/‹629174745943187222› [r347] (512 MiB above threshold size 512 MiB); r345/1 is waiting for a Raft snapshot
241218 20:40:10.248435 269124 13@kv/kvserver/allocator/plan/replicate.go:841 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 512  rebalancing voter n12,s24 to n12,s23: [1:0, 2*:54044, 4:54044]
241218 20:40:10.248642 269124 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 513  change replicas (add [(n12,s23):7LEARNER] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2, (n12,s24):4, next=7, gen=86]
241218 20:40:10.252566 269124 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 514  proposing SIMPLE(l7) [(n12,s23):7LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4 (n12,s23):7LEARNER] next=8
241218 20:40:39.676848 213441 13@kv/kvserver/store_snapshot.go:2144 ⋮ [T1,Vsystem,n12,s24,r345/4:‹/Table/106/1/{559324…-629174…}›] 633  streamed snapshot 7b06b6a3 at applied index 54046 to (n12,s23):7LEARNER with 256 MiB in 7.74s @ 33 MiB/s: kvs=26780 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 21.68s
241218 20:40:39.680437 213573 13@kv/kvserver/replica_raftstorage.go:505 ⋮ [T1,Vsystem,n12,s23,r345/7:{-}] 634  applying snapshot 7b06b6a3 from (n7,s14):2 at applied index 54046
241218 20:40:39.702726 233192 13@kv/kvserver/allocator/allocatorimpl/allocator.go:2212 ⋮ [T1,Vsystem,n12,lease,s23,r345/7:‹/Table/106/1/{559324…-629174…}›] 635  expected leaseholder store to be in the slice of existing replicas
241218 20:40:39.712637 269124 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 542  change replicas (add [(n12,s23):7VOTER_INCOMING] remove [(n12,s24):4VOTER_DEMOTING_LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2, (n12,s24):4, (n12,s23):7LEARNER, next=8, gen=87]
241218 20:40:39.728268 269124 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 543  proposing ENTER_JOINT(r4 l4 v7) [(n12,s23):7VOTER_INCOMING], [(n12,s24):4VOTER_DEMOTING_LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4VOTER_DEMOTING_LEARNER (n12,s23):7VOTER_INCOMING] next=8
241218 20:40:39.731778 269124 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 544  change replicas (add [] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2, (n12,s24):4VOTER_DEMOTING_LEARNER, (n12,s23):7VOTER_INCOMING, next=8, gen=88]
241218 20:40:40.297619 269124 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 545  proposing LEAVE_JOINT: after=[(n30,s59):1 (n7,s14):2 (n12,s24):4LEARNER (n12,s23):7] next=8
241218 20:40:40.300500 269124 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 546  change replicas (add [] remove [(n12,s24):4LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2, (n12,s24):4LEARNER, (n12,s23):7, next=8, gen=89]
241218 20:40:40.307508 269124 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 547  proposing SIMPLE(r4) [(n12,s24):4LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s23):7] next=8
241218 20:40:52.811653 300727 13@kv/kvserver/allocator/plan/replicate.go:841 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 567  rebalancing voter n7,s14 to n2,s4: [1:0, 2*:57893, 7:57893]
241218 20:40:52.811887 300727 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 568  change replicas (add [(n2,s4):8LEARNER] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2, (n12,s23):7, next=8, gen=90]
241218 20:40:52.814120 300727 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 569  proposing SIMPLE(l8) [(n2,s4):8LEARNER]: after=[(n30,s59):1 (n7,s14):2 (n12,s23):7 (n2,s4):8LEARNER] next=9
241218 20:41:09.612869 300778 13@kv/kvserver/store_snapshot.go:2144 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 601  streamed snapshot 7b4a61b2 at applied index 57896 to (n2,s4):8LEARNER with 292 MiB in 8.84s @ 33 MiB/s: kvs=30595 rangeKVs=0 sharedSSTs=0, externalSSTs=0, rate-limit: 32 MiB/s, queued: 7.95s
241218 20:41:09.614793 306488 13@kv/kvserver/replica_raftstorage.go:505 ⋮ [T1,Vsystem,n2,s4,r345/8:{-}] 722  applying snapshot 7b4a61b2 from (n7,s14):2 at applied index 57896
241218 20:41:09.641242 319464 13@kv/kvserver/allocator/allocatorimpl/allocator.go:2212 ⋮ [T1,Vsystem,n2,lease,s4,r345/8:‹/Table/106/1/{559324…-629174…}›] 723  expected leaseholder store to be in the slice of existing replicas
241218 20:41:09.651447 300727 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 602  change replicas (add [(n2,s4):8VOTER_INCOMING] remove [(n7,s14):2VOTER_DEMOTING_LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2, (n12,s23):7, (n2,s4):8LEARNER, next=9, gen=91]
241218 20:41:09.653575 300727 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n7,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 603  proposing ENTER_JOINT(r2 l2 v8) [(n2,s4):8VOTER_INCOMING], [(n7,s14):2VOTER_DEMOTING_LEARNER]: after=[(n30,s59):1 (n7,s14):2VOTER_DEMOTING_LEARNER (n12,s23):7 (n2,s4):8VOTER_INCOMING] next=9
241218 20:41:10.019476 300727 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 604  change replicas (add [] remove []): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2VOTER_DEMOTING_LEARNER, (n12,s23):7, (n2,s4):8VOTER_INCOMING, next=9, gen=92]
241218 20:41:10.123991 319689 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n2,s4,r345/8:‹/Table/106/1/{559324…-629174…}›] 724  proposing LEAVE_JOINT: after=[(n30,s59):1 (n7,s14):2LEARNER (n12,s23):7 (n2,s4):8] next=9
241218 20:41:10.128251 300727 13@kv/kvserver/replica_command.go:2535 ⋮ [T1,Vsystem,n7,replicate,s14,r345/2:‹/Table/106/1/{559324…-629174…}›] 605  change replicas (add [] remove [(n7,s14):2LEARNER]): existing descriptor r345:‹/Table/106/1/{559324921606706082-629174745943187222}› [(n30,s59):1, (n7,s14):2LEARNER, (n12,s23):7, (n2,s4):8, next=9, gen=93]
241218 20:41:10.165317 319689 13@kv/kvserver/replica_raft.go:379 ⋮ [T1,Vsystem,n2,s4,r345/8:‹/Table/106/1/{559324…-629174…}›] 725  proposing SIMPLE(r2) [(n7,s14):2LEARNER]: after=[(n30,s59):1 (n2,s4):8 (n12,s23):7] next=9


Environment:

  • CockroachDB version: master

Jira issue: CRDB-45717

@arulajmani

This comment was marked as outdated.

@andrewbaptist (Contributor, Author)

> Why do we expect the ambiguous error to be retried internally by the DistSender?

Generally, the contract of DistSender.Send is to retry failures and to return an ambiguous result only in one specific case: we sent a modifying request to a node, have not received a response from it, and cannot retry on other stores without knowing the outcome of that request.

There are two cases in which we decide that we won't get a response back:

  1. The connection was broken
  2. The calling context was cancelled.

In all other cases we should (and mostly do) return a different, non-ambiguous error.
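
To make that contract concrete, here is a minimal, hypothetical sketch of the decision described above. This is not the actual kvcoord code; `classifySendError`, `errConnBroken`, and `isWrite` are illustrative names only:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errConnBroken stands in for a transport failure that leaves the fate
// of an in-flight RPC unknown.
var errConnBroken = errors.New("connection broken mid-RPC")

// classifySendError mirrors the contract described above: a modifying
// request whose outcome is unknown (broken connection or caller
// cancellation) must surface as ambiguous, because retrying it on
// another replica could apply it twice. Every other failure should come
// back as a concrete, possibly retriable error.
func classifySendError(ctx context.Context, isWrite bool, sendErr error) error {
	if sendErr == nil || !isWrite {
		return sendErr
	}
	if errors.Is(sendErr, errConnBroken) || ctx.Err() != nil {
		return fmt.Errorf("result is ambiguous: %w", sendErr)
	}
	// The outcome is known not to have happened; safe to retry elsewhere.
	return sendErr
}

func main() {
	ctx := context.Background()
	fmt.Println(classifySendError(ctx, true, errConnBroken))             // ambiguous
	fmt.Println(classifySendError(ctx, true, errors.New("lease moved"))) // retriable as-is
}
```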

We recently added a case where the connection is still valid but the circuit breaker is tripped on a replica. However, this was intended to handle cases where the leaseholder was stuck, not a non-leaseholder replica.

In the two cases above, an ambiguous response is the correct response, but in any other case we should not return one.

In this specific case, we had a healthy range on two of the three replicas and the leaseholder was on one of the healthy nodes.

The single-replica case is not handled any differently: we still typically return an ambiguous error only in the two cases described above.

The error that happens with this request is specific to the structure of the DistSender and when it retries. For the given request, there is a valid leaseholder and the request would have succeeded if either the client had the correct cache or even no cache at all. It failed because of a specific type of stale cache.

@arulajmani (Collaborator)

Sorry, nevermind my comment -- I misunderstood the number of nodes (and therefore the number of replicas).

@arulajmani (Collaborator)

> For the given request, there is a valid leaseholder and the request would have succeeded if either the client had the correct cache or even no cache at all. It failed because of a specific type of stale cache.

That checks out from the error snippet above. However, I can't think of a way around this -- is there a proposed solution?

@andrewbaptist (Contributor, Author)

The solution would be to handle the retry logic in the case of circuit breakers differently.

The "hole" has to do with when we exit the sendToReplicas and sendPartialBatch methods. (ignore the context cancellation as its not relevant here).

  • sendToReplicas exits if either there is a definite ambiguous error or all replicas have been tried at least once.
  • sendPartialBatch exits if there is a definite ambiguous error.

sendPartialBatch needs to differentiate between the four types of errors that can come out:

  1. An unretriable ambiguous error.
  2. A non-ambiguous but "final" error from a node.
  3. A retriable error where the descriptor changed.
  4. A retriable error where the descriptor has not changed.

The fourth case is the one that is handled incorrectly. The signature of sendToReplicas and the handling of EvictionTokens are messy right now, which makes this fix hard. This is an internal structure problem, not anything fundamental.
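
To make the four-way split concrete, here is a hypothetical sketch; `outcomeKind` and `shouldRetry` are illustrative, not existing kvcoord types:

```go
package main

import "fmt"

// outcomeKind enumerates the four results listed above; it is a
// hypothetical classification, not an existing kvcoord type.
type outcomeKind int

const (
	ambiguousFinal         outcomeKind = iota // 1. unretriable ambiguous
	nonAmbiguousFinal                         // 2. final, non-ambiguous node error
	retriableDescChanged                      // 3. retriable, descriptor changed
	retriableDescUnchanged                    // 4. retriable, descriptor unchanged
)

// shouldRetry makes the desired decision explicit: both retriable kinds
// loop inside sendPartialBatch (case 3 after refreshing the descriptor),
// and only the two final kinds propagate to the caller. Per the
// discussion above, case 4 currently escapes as an ambiguous error
// instead of being retried.
func shouldRetry(k outcomeKind) bool {
	return k == retriableDescChanged || k == retriableDescUnchanged
}

func main() {
	for k := ambiguousFinal; k <= retriableDescUnchanged; k++ {
		fmt.Printf("case %d: retry=%v\n", int(k)+1, shouldRetry(k))
	}
}
```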

The easiest short-term solution would be to look at the eviction token before and after calling sendToReplicas; if it has changed and we have a ReplicaUnavailableError, we should retry. This should handle most of these types of errors.
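
A rough sketch of that short-term check, with `evictionToken` as a simplified stand-in for kvcoord's EvictionToken and `errReplicaUnavailable` as a stand-in sentinel (none of this is the actual implementation):

```go
package main

import (
	"errors"
	"fmt"
)

// evictionToken is a simplified stand-in for the cached routing state
// (descriptor and lease) that an attempt was based on.
type evictionToken struct{ generation int }

// errReplicaUnavailable stands in for the circuit-breaker error seen above.
var errReplicaUnavailable = errors.New("replica unavailable")

// sendPartialBatchSketch retries a sendToReplicas-style call when the
// attempt failed with a replica-unavailable error but the cache entry
// changed underneath it, i.e. the failure was likely based on a stale route.
func sendPartialBatchSketch(
	lookupToken func() evictionToken,
	sendToReplicas func(evictionToken) error,
) error {
	for attempt := 0; attempt < 3; attempt++ { // bounded only for the sketch
		tok := lookupToken()
		err := sendToReplicas(tok)
		if err == nil {
			return nil
		}
		// The proposed check: the token changed during the attempt, so
		// retry with fresh routing info instead of surfacing ambiguity.
		if errors.Is(err, errReplicaUnavailable) && lookupToken() != tok {
			continue
		}
		return err
	}
	return errors.New("retries exhausted")
}

func main() {
	cache := evictionToken{generation: 1}
	lookup := func() evictionToken { return cache }
	send := func(t evictionToken) error {
		if t.generation == 1 {
			cache = evictionToken{generation: 2} // the attempt refreshed the cache
			return errReplicaUnavailable
		}
		return nil
	}
	fmt.Println(sendPartialBatchSketch(lookup, send)) // <nil>: retried on the fresh token
}
```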

The better fix would be to return a struct from sendToReplicas, similar to the parameters of selectBestError, and allow the higher layer to make the retry decision.
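
Sketched as a hypothetical result struct; the field names are illustrative and only loosely modeled on what selectBestError consumes:

```go
package main

import (
	"errors"
	"fmt"
)

// sendResult is a hypothetical return type for sendToReplicas; the
// field names are illustrative, not CRDB API.
type sendResult struct {
	err            error
	ambiguous      bool // an in-flight write's outcome is genuinely unknown
	routingChanged bool // descriptor/lease info was refreshed during the attempt
}

// retryDecision would live in the higher layer (sendPartialBatch): retry
// when the failure is not definitively ambiguous and the routing
// information changed; otherwise surface the error as-is.
func retryDecision(r sendResult) bool {
	return r.err != nil && !r.ambiguous && r.routingChanged
}

func main() {
	r := sendResult{err: errors.New("replica unavailable"), routingChanged: true}
	fmt.Println(retryDecision(r)) // true: retry instead of returning ambiguous
}
```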
