-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: update mt-upgrade test owner to db-server #137067
Conversation
This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to `t-db-server` for triage. Epic: none Release note: none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
TFTR! bors r+ |
136648: rpc: reuse gRPC streams across unary BatchRequest RPCs r=tbg a=nvanbenschoten Closes #136572. This commit introduces pooling of gRPC streams that are used to send requests and receive corresponding responses in a manner that mimics unary RPC invocation. Pooling these streams allows for reuse of gRPC resources across calls, as opposed to native unary RPCs, which create a new stream and throw it away for each request (see grpc.invoke). The new pooling mechanism is used for the Internal/Batch RPC method, which is the dominant RPC method used to communicate between the KV client and KV server. A new Internal/BatchStream RPC method is introduced to allow a client to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled stream. A pool of these streams is then maintained alongside each gRPC connection. The pool grows and shrinks dynamically based on demand. The change demonstrates a large performance improvement in both microbenchmarks and full system benchmarks, which reveals just how expensive the gRPC stream setup on each unary RPC is. ### Microbenchmarks: ``` name old time/op new time/op delta Sysbench/KV/1node_remote/oltp_point_select-10 45.9µs ± 1% 28.8µs ± 2% -37.31% (p=0.000 n=9+8) Sysbench/KV/1node_remote/oltp_read_only-10 958µs ± 6% 709µs ± 1% -26.00% (p=0.000 n=9+9) Sysbench/SQL/1node_remote/oltp_read_only-10 3.65ms ± 6% 2.81ms ± 8% -23.06% (p=0.000 n=8+9) Sysbench/KV/1node_remote/oltp_read_write-10 1.77ms ± 5% 1.38ms ± 1% -22.09% (p=0.000 n=10+8) Sysbench/KV/1node_remote/oltp_write_only-10 688µs ± 4% 557µs ± 1% -19.11% (p=0.000 n=9+9) Sysbench/SQL/1node_remote/oltp_point_select-10 181µs ± 8% 159µs ± 2% -12.10% (p=0.000 n=8+9) Sysbench/SQL/1node_remote/oltp_write_only-10 2.16ms ± 4% 1.92ms ± 3% -11.08% (p=0.000 n=9+9) Sysbench/SQL/1node_remote/oltp_read_write-10 5.89ms ± 2% 5.36ms ± 1% -8.89% (p=0.000 n=9+9) name old alloc/op new alloc/op delta Sysbench/KV/1node_remote/oltp_point_select-10 16.3kB ± 0% 6.4kB ± 0% -60.70% (p=0.000 n=8+10) Sysbench/KV/1node_remote/oltp_write_only-10 359kB ± 1% 256kB ± 1% -28.92% (p=0.000 n=10+10) Sysbench/SQL/1node_remote/oltp_write_only-10 748kB ± 0% 548kB ± 1% -26.78% (p=0.000 n=8+10) Sysbench/SQL/1node_remote/oltp_point_select-10 40.9kB ± 0% 30.8kB ± 0% -24.74% (p=0.000 n=9+10) Sysbench/KV/1node_remote/oltp_read_write-10 1.11MB ± 1% 0.88MB ± 1% -21.17% (p=0.000 n=9+10) Sysbench/SQL/1node_remote/oltp_read_write-10 2.00MB ± 0% 1.65MB ± 0% -17.60% (p=0.000 n=9+10) Sysbench/KV/1node_remote/oltp_read_only-10 790kB ± 0% 655kB ± 0% -17.11% (p=0.000 n=9+9) Sysbench/SQL/1node_remote/oltp_read_only-10 1.33MB ± 0% 1.19MB ± 0% -10.97% (p=0.000 n=10+9) name old allocs/op new allocs/op delta Sysbench/KV/1node_remote/oltp_point_select-10 210 ± 0% 61 ± 0% -70.95% (p=0.000 n=10+10) Sysbench/KV/1node_remote/oltp_read_only-10 3.98k ± 0% 1.88k ± 0% -52.68% (p=0.019 n=6+8) Sysbench/KV/1node_remote/oltp_read_write-10 7.10k ± 0% 3.47k ± 0% -51.07% (p=0.000 n=10+9) Sysbench/KV/1node_remote/oltp_write_only-10 3.10k ± 0% 1.58k ± 0% -48.89% (p=0.000 n=10+9) Sysbench/SQL/1node_remote/oltp_write_only-10 6.73k ± 0% 3.82k ± 0% -43.30% (p=0.000 n=10+10) Sysbench/SQL/1node_remote/oltp_read_write-10 14.4k ± 0% 9.2k ± 0% -36.29% (p=0.000 n=9+10) Sysbench/SQL/1node_remote/oltp_point_select-10 429 ± 0% 277 ± 0% -35.46% (p=0.000 n=9+10) Sysbench/SQL/1node_remote/oltp_read_only-10 7.52k ± 0% 5.37k ± 0% -28.60% (p=0.000 n=10+10) ``` ### Roachtests: ``` name old queries/s new queries/s delta sysbench/oltp_read_write/nodes=3/cpu=8/conc=64 17.6k ± 7% 19.2k ± 2% +9.22% (p=0.008 n=5+5) name old avg_ms/op new avg_ms/op delta sysbench/oltp_read_write/nodes=3/cpu=8/conc=64 72.9 ± 7% 66.6 ± 2% -8.57% (p=0.008 n=5+5) name old p95_ms/op new p95_ms/op delta sysbench/oltp_read_write/nodes=3/cpu=8/conc=64 116 ± 8% 106 ± 3% -9.02% (p=0.016 n=5+5) ``` ### Manual tests: Running in a similar configuration to `sysbench/oltp_read_write/nodes=3/cpu=8/conc=64`, but with a benchmarking related cluster settings (before and after) to reduce variance. ``` -- Before Mean: 19771.03 Median: 19714.22 Standard Deviation: 282.96 Coefficient of variance: .0143 -- After Mean: 21908.23 Median: 21923.03 Standard Deviation: 200.88 Coefficient of variance: .0091 ``` ---- Release note (performance improvement): gRPC streams are now pooled across unary intra-cluster RPCs, allowing for reuse of gRPC resources to reduce the cost of remote key-value layer access. This pooling can be disabled using the `rpc.batch_stream_pool.enabled` cluster setting. 137059: catalog/lease: deflake TestDescriptorRefreshOnRetry r=rafiss a=rafiss The test was flaky since the background thread to refresh leases could run and cause the acquisition counts to be off. fixes #137033 Release note: None 137067: roachtest: update mt-upgrade test owner to db-server r=rimadeodhar a=rimadeodhar This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to `t-db-server` for triage. Epic: none Release note: none Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Rafi Shamim <[email protected]> Co-authored-by: rimadeodhar <[email protected]>
Build failed (retrying...): |
137067: roachtest: update mt-upgrade test owner to db-server r=rimadeodhar a=rimadeodhar This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to `t-db-server` for triage. Epic: none Release note: none 137094: roachtest: better triage schema workload flakes in backup-restore tests r=dt a=msbutler Informs #136709 Release note: none 137107: release: advance version to 25.1.0-alpha.1 r=celiala a=cockroach-teamcity Release note: None Epic: None Release justification: non-production (release infra) change. Co-authored-by: rimadeodhar <[email protected]> Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Justin Beaver <[email protected]>
This PR was included in a batch that was canceled, it will be automatically retried |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 139bbcd to blathers/backport-release-23.2-137067: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above. error creating merge commit from 139bbcd to blathers/backport-release-24.1-137067: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.1.x failed. See errors above. error creating merge commit from 139bbcd to blathers/backport-release-24.2-137067: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This PR updates the test ownership for the multitenant-upgrade test to the DB Server team. All future test failures will be routed to
t-db-server
for triage.Epic: none
Release note: none