Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: update mt-upgrade test owner to db-server #137067

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

rimadeodhar
Copy link
Collaborator

@rimadeodhar rimadeodhar commented Dec 9, 2024

This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to t-db-server for triage.

Epic: none
Release note: none

This PR updates the test ownership for the
multitenant-upgrade test to the DB Server
team. All future test failures will be routed
to `t-db-server` for triage.

Epic: none
Release note: none
@rimadeodhar rimadeodhar requested a review from a team as a code owner December 9, 2024 23:41
@rimadeodhar rimadeodhar requested review from shailendra-patel and csgourav and removed request for a team December 9, 2024 23:41
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rimadeodhar rimadeodhar requested review from benbardin, cthumuluru-crdb and a team December 9, 2024 23:42
Copy link
Collaborator

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@rimadeodhar
Copy link
Collaborator Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Dec 10, 2024
136648: rpc: reuse gRPC streams across unary BatchRequest RPCs r=tbg a=nvanbenschoten

Closes #136572.

This commit introduces pooling of gRPC streams that are used to send requests and receive corresponding responses in a manner that mimics unary RPC invocation. Pooling these streams allows for reuse of gRPC resources across calls, as opposed to native unary RPCs, which create a new stream and throw it away for each request (see grpc.invoke).

The new pooling mechanism is used for the Internal/Batch RPC method, which is the dominant RPC method used to communicate between the KV client and KV server. A new Internal/BatchStream RPC method is introduced to allow a client to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled stream. A pool of these streams is then maintained alongside each gRPC connection. The pool grows and shrinks dynamically based on demand.

The change demonstrates a large performance improvement in both microbenchmarks and full system benchmarks, which reveals just how expensive the gRPC stream setup on each unary RPC is.

### Microbenchmarks:
```
name                                            old time/op    new time/op    delta
Sysbench/KV/1node_remote/oltp_point_select-10     45.9µs ± 1%    28.8µs ± 2%  -37.31%  (p=0.000 n=9+8)
Sysbench/KV/1node_remote/oltp_read_only-10         958µs ± 6%     709µs ± 1%  -26.00%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10       3.65ms ± 6%    2.81ms ± 8%  -23.06%  (p=0.000 n=8+9)
Sysbench/KV/1node_remote/oltp_read_write-10       1.77ms ± 5%    1.38ms ± 1%  -22.09%  (p=0.000 n=10+8)
Sysbench/KV/1node_remote/oltp_write_only-10        688µs ± 4%     557µs ± 1%  -19.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-10     181µs ± 8%     159µs ± 2%  -12.10%  (p=0.000 n=8+9)
Sysbench/SQL/1node_remote/oltp_write_only-10      2.16ms ± 4%    1.92ms ± 3%  -11.08%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_write-10      5.89ms ± 2%    5.36ms ± 1%   -8.89%  (p=0.000 n=9+9)

name                                            old alloc/op   new alloc/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10     16.3kB ± 0%     6.4kB ± 0%  -60.70%  (p=0.000 n=8+10)
Sysbench/KV/1node_remote/oltp_write_only-10        359kB ± 1%     256kB ± 1%  -28.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-10       748kB ± 0%     548kB ± 1%  -26.78%  (p=0.000 n=8+10)
Sysbench/SQL/1node_remote/oltp_point_select-10    40.9kB ± 0%    30.8kB ± 0%  -24.74%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_write-10       1.11MB ± 1%    0.88MB ± 1%  -21.17%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-10      2.00MB ± 0%    1.65MB ± 0%  -17.60%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_only-10         790kB ± 0%     655kB ± 0%  -17.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10       1.33MB ± 0%    1.19MB ± 0%  -10.97%  (p=0.000 n=10+9)

name                                            old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10        210 ± 0%        61 ± 0%  -70.95%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10         3.98k ± 0%     1.88k ± 0%  -52.68%  (p=0.019 n=6+8)
Sysbench/KV/1node_remote/oltp_read_write-10        7.10k ± 0%     3.47k ± 0%  -51.07%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_write_only-10        3.10k ± 0%     1.58k ± 0%  -48.89%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-10       6.73k ± 0%     3.82k ± 0%  -43.30%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-10       14.4k ± 0%      9.2k ± 0%  -36.29%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_point_select-10       429 ± 0%       277 ± 0%  -35.46%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_only-10        7.52k ± 0%     5.37k ± 0%  -28.60%  (p=0.000 n=10+10)
```

### Roachtests:
```
name                                            old queries/s  new queries/s  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64     17.6k ± 7%     19.2k ± 2%  +9.22%  (p=0.008 n=5+5)

name                                            old avg_ms/op  new avg_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64      72.9 ± 7%      66.6 ± 2%  -8.57%  (p=0.008 n=5+5)

name                                            old p95_ms/op  new p95_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64       116 ± 8%       106 ± 3%  -9.02%  (p=0.016 n=5+5)
```

### Manual tests:
Running in a similar configuration to `sysbench/oltp_read_write/nodes=3/cpu=8/conc=64`, but with a benchmarking related cluster settings (before and after) to reduce variance.
```
-- Before
Mean: 19771.03
Median: 19714.22
Standard Deviation: 282.96
Coefficient of variance: .0143

-- After
Mean: 21908.23
Median: 21923.03
Standard Deviation: 200.88
Coefficient of variance: .0091
```

----

Release note (performance improvement): gRPC streams are now pooled across unary intra-cluster RPCs, allowing for reuse of gRPC resources to reduce the cost of remote key-value layer access. This pooling can be disabled using the `rpc.batch_stream_pool.enabled` cluster setting.

137059: catalog/lease: deflake TestDescriptorRefreshOnRetry r=rafiss a=rafiss

The test was flaky since the background thread to refresh leases could run and cause the acquisition counts to be off.

fixes #137033
Release note: None

137067: roachtest: update mt-upgrade test owner to db-server r=rimadeodhar a=rimadeodhar

This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to `t-db-server` for triage.

Epic: none
Release note: none

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: rimadeodhar <[email protected]>
@rimadeodhar rimadeodhar added backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 backport-24.3.x Flags PRs that need to be backported to 24.3 labels Dec 10, 2024
@craig
Copy link
Contributor

craig bot commented Dec 10, 2024

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Dec 10, 2024
137067: roachtest: update mt-upgrade test owner to db-server r=rimadeodhar a=rimadeodhar

This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to `t-db-server` for triage.

Epic: none
Release note: none

137094: roachtest: better triage schema workload flakes in backup-restore tests r=dt a=msbutler

Informs #136709

Release note: none

137107: release: advance version to 25.1.0-alpha.1 r=celiala a=cockroach-teamcity


Release note: None
Epic: None
Release justification: non-production (release infra) change.


Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Justin Beaver <[email protected]>
@craig
Copy link
Contributor

craig bot commented Dec 10, 2024

This PR was included in a batch that was canceled, it will be automatically retried

@craig craig bot merged commit 46e8e0f into cockroachdb:master Dec 10, 2024
22 of 23 checks passed
Copy link

blathers-crl bot commented Dec 10, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 139bbcd to blathers/backport-release-23.2-137067: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


error creating merge commit from 139bbcd to blathers/backport-release-24.1-137067: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.1.x failed. See errors above.


error creating merge commit from 139bbcd to blathers/backport-release-24.2-137067: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 backport-24.3.x Flags PRs that need to be backported to 24.3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants