Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: access data via an interface in vtorc #17190

Open
timvaillancourt opened this issue Nov 8, 2024 · 5 comments
Open

RFC: access data via an interface in vtorc #17190

timvaillancourt opened this issue Nov 8, 2024 · 5 comments
Assignees
Labels
Component: VTorc Vitess Orchestrator integration Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Internal Cleanup

Comments

@timvaillancourt
Copy link
Contributor

timvaillancourt commented Nov 8, 2024

Feature Description

This RFC proposes a new golang interface is added to go/vt/vtorc/db to allow:

  1. Easier mocking of the backend store, for tests, etc
  2. The ability to use a different implementation
  3. A full separation of VTOrc logic vs. data access - right now go/vt/vtorc/logic has some database logic in it

The move to this would involve:

  1. Defining the new interface, I'm thinking type DB interface
  2. Make the existing logic an implementation of the new interface
  3. Update tests

I've done a quick scan over the code, and I probably missed something, but here is a rough draft interface that should support the existing code. There are some small tweaks such as:

  1. Moving long signatures to opts structs
  2. Using real *topodatapb.TabletAlias vs the opinionated tabletAlias string
  3. Adding ctx context.Context to everything

cc @GuptaManan100 for thoughts 🙇

Rough draft:

type DB interface {
	// Discovery
	DeleteDiscoveredTablets(ctx context.Context) error
	GetShardPrimary(ctx context.Context, keyspace string, shard string) (*topodatapb.Tablet, error)
	GetTabletAliasesByCell(ctx context.Context) ([]*topodatapb.TabletAlias, error)
	GetTabletAliasesByKeyspaceShard(ctx context.Context) ([]*topodatapb.TabletAlias, error)

	// Detection
	AttemptFailureDetectionRegistration(ctx context.Context, analysisEntry *inst.ReplicationAnalysis) (bool, error)
	ClearActiveFailureDetections(ctx context.Context) error

	// Analysis
	AuditInstanceAnalysisInChangelog(ctx context.Context, tabletAlias *topodatapb.TabletAlias, analysisCode AnalysisCode) error
	ExpireAuditData(ctx context.Context) error
	ExpireInstanceAnalysisChangelog(ctx context.Context) error
	GetReplicationAnalysis(ctx context.Context, keyspace string, shard string, hints *inst.ReplicationAnalysisHints) ([]*inst.ReplicationAnalysis, error)

	// Audit
	AuditOperation(ctx context.Context, opts *AuditOperationOpts) error

	// Instance
	ExpireStaleInstanceBinlogCoordinates(ctx context.Context) error
	ForgetInstance(ctx context.Context, tabletAlias *topodatapb.TabletAlias) error
	ForgetLongUnseenInstances(ctx context.Context) error
	GetKeyspaceShardName(ctx context.Context, tabletAlias *topodatapb.TabletAlias) (string, string, error) {
	ReadInstanceClusterAttributes(ctx context.Context, primaryHost string, primaryPort int) (*inst.Instance, error)
	ReadInstance(ctx context.Context, tabletAlias *topodatapb.TabletAlias) (*inst.Instance, bool, error) 
	ReadReplicaInstances(ctx context.Context, opts *ReadReplicaInstancesOpts) ([]*inst.Instance, error)
	ReadProblemInstances(ctx context.Context, keyspace string, shard string) ([]*inst.Instance, error)
	ReadOutdatedInstanceKeys(ctx context.Context) ([]*topodatapb.TabletAlias, error)
	ReadInstancesWithErrantGTIDs(ctx context.Context, keyspace string, shard string) ([]*inst.Instance, error)
	RecordStaleInstanceBinlogCoordinates(ctx context.Context, tabletAlias *topodatapb.TabletAlias, binlogCoordinates *inst.BinlogCoordinates) error
	SnapshotTopologies(ctx context.Context) error
	UpdateInstanceLastAttemptedCheck(ctx context.Context, tabletAlias *topodatapb.TabletAlias)
	UpdateInstanceLastChecked(ctx context.Context, tabletAlias *topodatapb.TabletAlias, partialSuccess bool) error
	WriteInstances(ctx context.Context, instances []*inst.Instance, instanceWasActuallyFound, updateLastSeen bool) error

	// Keyspace
	ReadKeyspace(ctx context.Context, keyspace string) (*topo.KeyspaceInfo, error)
	SaveKeyspace(ctx context.Context, keyspace *topo.KeyspaceInfo) error
	GetDurabilityPolicy(ctx context.Context, keyspace string) (reparentutil.Durabler, error)

	// Shard
	ReadShardPrimaryInformation(ctx context.Context, keyspaceName, shardName string) (*topodatapb.TabletAlias, string, error)
	SaveShard(ctx context.Context, shard *topo.ShardInfo) error

	// Tablet
	ReadTablet(ctx context.Context, tabletAlias *topodatapb.TabletAlias) (*topodatapb.Tablet, error)
	SaveTablet(ctx context.Context, tablet *topodatapb.Tablet) error

	// Recovery
	AcknowledgeRecoveries(ctx context.Context, opts *AcknowledgeRecoveriesOpts) (int64, error)
	ClearActiveRecoveries(ctx context.Context) error
	DisableRecovery(ctx context.Context) error
	EnableRecovery(ctx context.Context) error
	ExpireBlockedRecoveries(ctx context.Context) error
	ExpireRecoveries(ctx context.Context) error
	ExpireRecoverySteps(ctx context.Context) error
	IsRecoveryDisabled(ctx context.Context) (bool, error)
	ReadRecoveries(ctx context.Context, opts *ReadRecoveriesOpts) ([]*logic.TopologyRecovery, error)
	RegisterBlockedRecoveries(ctx context.Context, analysisEntry *inst.ReplicationAnalysis, blockingRecoveries []*logic.TopologyRecovery) error
	WriteResolveRecovery(ctx context.Context, topologyRecovery *logic.TopologyRecovery) error
	WriteTopologyRecovery(ctx context.Context, topologyRecovery *logic.TopologyRecovery) (*logic.TopologyRecovery, error)
	WriteTopologyRecoveryStep(ctx context.Context, topologyRecoveryStep *logic.TopologyRecoveryStep) error
}

type AcknowledgeRecoveriesOpts struct {
	Owner           string
	Comment         string
	MarkEndRecovery bool
}

type AuditOperationOpts struct {
	// TODO: define
}

type ReadRecoveriesOpts struct {
	// TODO: define
}

type ReadReplicaInstancesOpts struct {
	PrimaryHost                    string
	PrimaryPort                    int
	IncludeBinlogServerSubReplicas bool
}

Use Case(s)

Developers and indirectly users of vtorc

@timvaillancourt timvaillancourt added Type: Internal Cleanup Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VTorc Vitess Orchestrator integration labels Nov 8, 2024
@timvaillancourt timvaillancourt self-assigned this Nov 8, 2024
@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Nov 8, 2024

A bit unrelated, but maybe time to fix this too:

ReadInstanceClusterAttributes(ctx context.Context, primaryHost string, primaryPort int)

and

ReadReplicaInstances(ctx context.Context, primaryHost string, primaryPort int) ([]*Instance, error)

..should just use a *topodatapb.TabletAlias of the primary, but FullStatus doesn't include it, so VTOrc doesn't have it in the backend database

Related to this, it seems we have no indices on hostname and port (but we sometimes query/join on it) - I'm on the fence if I should add that index or fix it properly now (return/store alias of the primary)

cc @GuptaManan100 for 💡s

@GuptaManan100
Copy link
Member

GuptaManan100 commented Nov 11, 2024

I like the idea of the RFC, just one thing I'd like to correct.

but FullStatus doesn't include it, so VTOrc doesn't have it in the backend database

This is not true. We have all the information that we store in a tablet record in vitess_tablet table. Look at SaveTablet for more information. We store the entire tablet record by marshaling it from proto into string. We also store individual fields that we deem worthy of reading without needing to deserialize the marshalled proto string. We store the alias as a string, and the cell in the table. If we want, we can also store the UID, and that gives us all the infomration to construct the *topodatapb.TabletAlias back. (We can still do that by unmarshaling the info field, but we can store uid explicitly too).

@GuptaManan100
Copy link
Member

A refactor that makes things more testable is always welcome! I have always thought that the way VTOrc DB code is written was a little bit hard to test. I had introduced an interface for testing before as well. It is present in db.go -

type DB interface {
	QueryVTOrc(query string, argsArray []any, onRow func(sqlutils.RowMap) error) error
}

but your proposal is way better.

@timvaillancourt
Copy link
Contributor Author

This is not true. We have all the information that we store in a tablet record in vitess_tablet table. Look at SaveTablet for more information.

@GuptaManan100 great, good to know! Another refactor I'd like to do is stop using the unindexed hostname and port columns in queries. I assume those were indexed but they aren't anymore. It seems hostname/port is used to join to the primary tablet in some cases, so I thought alias just wasn't available and we were stuck

If we JOIN on the primary alias there would be an index

@timvaillancourt
Copy link
Contributor Author

We store the alias as a string, and the cell in the table.

To be clear, I mean storing the alias of each tablet's current primary. I'll review where it's available

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VTorc Vitess Orchestrator integration Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Internal Cleanup
Projects
None yet
Development

No branches or pull requests

2 participants