-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: Optimize replication on target tablets #17166
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
c36e603
to
4371f80
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17166 +/- ##
==========================================
+ Coverage 67.39% 67.41% +0.02%
==========================================
Files 1570 1570
Lines 252892 252887 -5
==========================================
+ Hits 170437 170486 +49
+ Misses 82455 82401 -54 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Matt Lord <[email protected]>
018c936
to
ac27fd5
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
@@ -54,6 +54,7 @@ vttablet \ | |||
--service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' \ | |||
--pid_file $VTDATAROOT/$tablet_dir/vttablet.pid \ | |||
--heartbeat_on_demand_duration=5s \ | |||
--pprof-http \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will be reverted before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about it more, I'd like to keep this enabled for the examples. Seems like a good case for it. I can remove/revert the change though if others disagree. If we do keep it here, I could also add it for vtgate and vtctld.
Signed-off-by: Matt Lord <[email protected]>
a237695
to
2e3334f
Compare
@@ -171,7 +171,7 @@ func (vc *vdbClient) Execute(query string) (*sqltypes.Result, error) { | |||
func (vc *vdbClient) ExecuteWithRetry(ctx context.Context, query string) (*sqltypes.Result, error) { | |||
qr, err := vc.Execute(query) | |||
for err != nil { | |||
if sqlErr, ok := err.(*sqlerror.SQLError); ok && sqlErr.Number() == sqlerror.ERLockDeadlock || sqlErr.Number() == sqlerror.ERLockWaitTimeout { | |||
if sqlErr, ok := err.(*sqlerror.SQLError); ok && (sqlErr.Number() == sqlerror.ERLockDeadlock || sqlErr.Number() == sqlerror.ERLockWaitTimeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unrelated, but I noticed that it was not correct as I was doing testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open a separate PR for this since it's worth a quick backport to v19 to prevent a panic if the error we got back from the query was NOT an SQL error.
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
@@ -353,6 +353,10 @@ message FieldEvent { | |||
repeated query.Field fields = 2; | |||
string keyspace = 3; | |||
string shard = 4; | |||
|
|||
// Field numbers in the gap between shard (4) and enum_set_string_values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments in this file are unrelated to this PR but I wanted to get them into main after the accidental field number gap came up in a discussion.
Putting back in Draft to investigate test failures that seemingly came out of nowhere.... |
This impacted e2e tests that were not previously using VPlayerBatching. The load generator constantly generates INSERTs, which are then effeciently batched in vplayer so we get ~ 7x more throughput than before and thus we need more time for filtered replication to catch up after we've stopped it for the vdiff. ESPECIALLY since we're using the --update-table-stats flag and the ANALYZE TABLE and its locking causes a pause in updates to the table the load generator is inserting into -- in particular for the test clusters that only have PRIMARY tablets as everything is interacting directly. Signed-off-by: Matt Lord <[email protected]>
46d733b
to
f0f61db
Compare
@@ -35,7 +35,7 @@ import ( | |||
) | |||
|
|||
const ( | |||
vdiffTimeout = 120 * time.Second // We can leverage auto retry on error with this longer-than-usual timeout | |||
vdiffTimeout = 180 * time.Second // We can leverage auto retry on error with this longer-than-usual timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see the reason for the vdiff test changes in the commit message: f0f61db
It failed for some materializations Signed-off-by: Matt Lord <[email protected]>
6ff6f84
to
21564fd
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Description
This PR makes two performance optimizations affecting execution time and resource usage:
copy
phase: optimize the generation of row values that are used to build bulkINSERT
statementsrunning
/replicating
phase: enable the VPlayerBatching feature by default. This was added in v18 in VReplication VPlayer: support statement and transaction batching #14502. It has been enabled in a number of relevant endtoend tests ever since and it has been used in PlanetScale for a number of important workflows. Enabling it early in the v22 lifecycle also gives us a good 3-6 months for it to bake further on main.Summary:
The detailed results of the testing can be seen here: https://gist.github.com/mattlord/d6161a1c27f203a497b7bd6a69e7c467
The test setup was created this way:
Related Issue(s)
Checklist