Bug Report: VTOrc is unable to detect errant GTIDs on a recently demoted primary #17254

Closed
GuptaManan100 opened this issue Nov 19, 2024 · 0 comments · Fixed by #17267
Labels
Component: VTorc Vitess Orchestrator integration Type: Bug


Overview of the Issue

We recently observed the following sequence of events:

  1. A primary tablet, say A, is running.
  2. Something goes wrong (a network partition, for example) and an ERS is triggered.
  3. Tablet A ends up with an errant GTID that the new primary doesn't have.
  4. VTOrc detects that A is not connected to any primary and tries to call SetReplicationSource. It does not run errant GTID detection at this point, because A is not replicating from any primary.
  5. The SetReplicationSource call fails because the vttablet sees that it has an errant GTID.

The problem is that VTOrc only learns about errant GTIDs once a tablet is replicating from another tablet, whereas vttablets run the detection when they start replication. As a result, for a tablet that has an errant GTID and is not replicating from any tablet, VTOrc never gets to run its recovery for a detected errant GTID.
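
To make the gap concrete, here is a minimal sketch in Go. It is not Vitess code: `GTIDSet`, `Tablet`, `ErrantGTIDs`, `errantWhenReplicating` and `errantAgainstShardPrimary` are all hypothetical names, and a GTID set is modeled as a plain set of `server_uuid:transaction_id` strings rather than MySQL's interval notation. The first check mirrors the behavior described above (skip detection when the tablet has no replication source). The second shows one possible alternative, comparing against the shard primary regardless of replication state, which is an assumption for illustration and not necessarily how the fix works.

```go
package main

import "sort"

// GTIDSet is a simplified, hypothetical stand-in for a MySQL GTID set.
// Each key is a single "server_uuid:transaction_id" string.
type GTIDSet map[string]struct{}

// NewGTIDSet builds a GTIDSet from individual GTID strings.
func NewGTIDSet(gtids ...string) GTIDSet {
	s := make(GTIDSet, len(gtids))
	for _, g := range gtids {
		s[g] = struct{}{}
	}
	return s
}

// ErrantGTIDs returns, in sorted order, the GTIDs that the candidate has
// executed but the reference has not. It returns nil when there are none.
func ErrantGTIDs(candidate, reference GTIDSet) []string {
	var errant []string
	for g := range candidate {
		if _, ok := reference[g]; !ok {
			errant = append(errant, g)
		}
	}
	sort.Strings(errant)
	return errant
}

// Tablet is a hypothetical, minimal view of a tablet for this sketch.
type Tablet struct {
	Alias       string
	Executed    GTIDSet
	SourceAlias string // empty when the tablet is not replicating from anyone
}

// errantWhenReplicating models the behavior described in this issue:
// detection is skipped entirely when the tablet has no replication source.
func errantWhenReplicating(t, primary Tablet) []string {
	if t.SourceAlias == "" {
		return nil
	}
	return ErrantGTIDs(t.Executed, primary.Executed)
}

// errantAgainstShardPrimary models one possible alternative (an assumption):
// compare against the shard primary even if the tablet is not replicating.
func errantAgainstShardPrimary(t, primary Tablet) []string {
	return ErrantGTIDs(t.Executed, primary.Executed)
}
```

For example, take a demoted tablet A with no replication source and executed GTIDs `uuid-a:1`, `uuid-a:2`, `uuid-a:3`, and a new primary with `uuid-a:1`, `uuid-a:2`, `uuid-b:1`. `errantWhenReplicating` returns nil, so nothing is flagged, while `errantAgainstShardPrimary` returns `uuid-a:3`.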

Reproduction Steps

Described above

Binary Version

v21 and main

Operating System and Environment details

-

Log Fragments

No response
