storcon: safer manual migration API #10540

Open
jcsp opened this issue Jan 28, 2025 · 2 comments
Labels
c/storage/controller Component: Storage Controller t/feature Issue type: feature, for new features or requests

Comments

Collaborator

jcsp commented Jan 28, 2025

It's too easy to make mistakes like:

  • Migrating a tenant somewhere the optimiser will immediately move it away from (e.g. wrong AZ, or the same pageserver as too many other shards in the same tenant).
  • Cutting over to a secondary that isn't warm enough, so that there are lots of on-demand downloads after cutting over.
  • Cutting over to somewhere you don't have a secondary at all.

The storage controller already knows how to do pretty safe migrations when the scheduling optimiser drives it; we should enable humans to access the same routine.

We could have a migration API that:

  • By default, refuses to migrate somewhere that the optimiser would disagree with, and requires a --force to override that.
  • Orchestrates migration by setting a "preferred" pageserver on a shard, and then letting optimize_attachment do its thing (i.e. creating a secondary, waiting for it to warm up, cutting over, removing old secondary).
  • Provides some handy way for a human to monitor progress, e.g. call into a per-shard API that tells you how warm each secondary is (via storcon_cli).
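
The first two bullets above could be sketched roughly as follows. This is only an illustration: `NodeId`, `Candidate`, `Shard`, `optimiser_objection`, `request_migration` and the `MAX_SIBLING_SHARDS` threshold are hypothetical names invented for the example, not the storage controller's actual types or API. The idea is that a request only records a preferred pageserver, and is rejected without `force` when the optimiser would object to the destination.

```rust
// Hypothetical sketch: none of these names come from the storage controller's code.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u64);

/// Stand-in for what the optimiser knows about a candidate destination.
struct Candidate {
    node: NodeId,
    /// Whether the node is in the AZ the tenant should live in.
    same_az_as_tenant: bool,
    /// Number of other shards of the same tenant already on this node.
    sibling_shards: usize,
}

/// Assumed limit on shards of one tenant per pageserver.
const MAX_SIBLING_SHARDS: usize = 1;

#[derive(Debug, PartialEq, Eq)]
enum MigrateError {
    /// The optimiser would immediately move the shard away again.
    OptimiserDisagrees { reason: String },
}

/// Per-shard state: only the field relevant to this sketch.
struct Shard {
    preferred_node: Option<NodeId>,
}

/// Returns a reason if the optimiser would not choose this destination.
fn optimiser_objection(dest: &Candidate) -> Option<String> {
    if !dest.same_az_as_tenant {
        return Some("destination is in the wrong AZ".to_string());
    }
    if dest.sibling_shards > MAX_SIBLING_SHARDS {
        return Some("too many shards of this tenant on the destination".to_string());
    }
    None
}

/// Records the preferred node; the actual cut-over is left to the optimiser.
fn request_migration(
    shard: &mut Shard,
    dest: &Candidate,
    force: bool,
) -> Result<(), MigrateError> {
    if !force {
        if let Some(reason) = optimiser_objection(dest) {
            return Err(MigrateError::OptimiserDisagrees { reason });
        }
    }
    shard.preferred_node = Some(dest.node);
    Ok(())
}
```

With this shape, the handler never performs the cut-over itself: it only sets the preference, so creating the secondary, warming it, and cutting over stay on the optimiser's existing path.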
@jcsp jcsp added c/storage/controller Component: Storage Controller t/feature Issue type: feature, for new features or requests labels Jan 28, 2025
Collaborator Author

jcsp commented Jan 28, 2025

(in case of drains during deploy, we already do not live migrate something if its secondary isn't suitably warm)
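
A warmth gate of this kind might look like the sketch below. It assumes warmth is measured as the fraction of layer bytes the secondary has already downloaded; `SecondaryProgress`, its fields, and the 90% threshold are assumptions made for illustration, not the controller's real definitions.

```rust
// Hypothetical sketch: the progress fields and the threshold are assumptions.

/// Download progress reported for one secondary location.
struct SecondaryProgress {
    bytes_downloaded: u64,
    bytes_total: u64,
}

/// Assumed minimum warmth before a live migration is allowed.
const MIN_WARMTH_PERCENT: u8 = 90;

impl SecondaryProgress {
    /// Warmth as a whole percentage, clamped to 0..=100.
    fn warmth_percent(&self) -> u8 {
        if self.bytes_total == 0 {
            // Nothing to download, so treat the secondary as fully warm.
            return 100;
        }
        // Widen to u128 so the multiplication cannot overflow.
        let pct = (self.bytes_downloaded as u128 * 100) / self.bytes_total as u128;
        pct.min(100) as u8
    }
}

/// True if cutting over should not trigger many on-demand downloads.
fn warm_enough(progress: &SecondaryProgress) -> bool {
    progress.warmth_percent() >= MIN_WARMTH_PERCENT
}
```

The same percentage could also be what a per-shard progress query reports back to a human watching the migration.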

Collaborator Author

jcsp commented Jan 28, 2025

We could add preconditions to the API like "I think it's attached at X and Y, please migrate...", so that automation can have greater faith that it's doing what it thinks it's doing.

  • For the current migration API, refuse the request if the source isn't what the request specifies as the source.
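
A minimal sketch of such a precondition check, assuming the request carries an optional expected source; `MigrateRequest`, `expected_source`, and `check_source` are hypothetical names, not fields or functions of the existing migration API.

```rust
// Hypothetical sketch: the request shape is an assumption for illustration.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u64);

struct MigrateRequest {
    /// Where the caller believes the shard is attached; None means no precondition.
    expected_source: Option<NodeId>,
    /// Where the caller wants the shard to end up.
    destination: NodeId,
}

#[derive(Debug, PartialEq, Eq)]
enum PreconditionError {
    /// The shard is not attached where the caller thought it was.
    SourceMismatch {
        expected: NodeId,
        actual: Option<NodeId>,
    },
}

/// Refuses the request if the caller's view of the source is stale.
fn check_source(
    req: &MigrateRequest,
    attached: Option<NodeId>,
) -> Result<(), PreconditionError> {
    match req.expected_source {
        Some(expected) if attached != Some(expected) => {
            Err(PreconditionError::SourceMismatch {
                expected,
                actual: attached,
            })
        }
        // Either no precondition was given, or it matches the current state.
        _ => Ok(()),
    }
}
```

Making the expected source optional keeps existing callers working, while automation that supplies it gets a hard failure instead of migrating a shard that has already moved.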

@jcsp jcsp self-assigned this Jan 28, 2025