storcon: improve resource rebalancing across Pageservers #10488

Open
erikgrinaker opened this issue Jan 23, 2025 · 0 comments
Labels: c/storage/controller (Component: Storage Controller), t/feature (Issue type: feature, for new features or requests)

@erikgrinaker (Contributor)

During releases, for example, tenant shards are moved around as Pageservers restart. This can lead to load imbalances, where some Pageservers carry much higher load than others and may run into resource exhaustion. Imbalances can also build up gradually as tenant workloads change.

Currently, such imbalances must be resolved manually, by moving tenants around. This is slow, laborious, and often doesn't happen at all.

The storage controller should monitor Pageserver resource usage and attempt to balance resource usage and avoid overload across the following dimensions:

  • CPU usage
  • Memory usage
  • Disk IOPS
  • Disk bandwidth
  • Disk space
  • Tenant counts
  • GetPage request rate
  • WAL ingestion rate
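One way to compare Pageservers across these dimensions is to normalize each one to a fraction of capacity and treat a node as being as loaded as its most-utilized dimension (a dominant-resource style score). This is a hypothetical sketch, not the storage controller's implementation: `NodeLoad`, the dimension names, and the `limit` and `tolerance` thresholds are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical dimension names mirroring the list above; not real storcon fields.
DIMENSIONS = (
    "cpu",
    "memory",
    "disk_iops",
    "disk_bandwidth",
    "disk_space",
    "tenant_count",
    "getpage_rate",
    "wal_ingest_rate",
)


@dataclass
class NodeLoad:
    node_id: int
    usage: dict[str, float]     # observed usage per dimension
    capacity: dict[str, float]  # capacity or configured limit per dimension

    def utilization(self) -> dict[str, float]:
        # Fraction of capacity in use, per dimension.
        return {d: self.usage[d] / self.capacity[d] for d in DIMENSIONS}

    def score(self) -> float:
        # Dominant-resource score: the node is as loaded as its busiest dimension.
        return max(self.utilization().values())


def overloaded(nodes: list[NodeLoad], limit: float = 0.8) -> list[int]:
    # Nodes whose busiest dimension exceeds the (assumed) overload limit.
    return [n.node_id for n in nodes if n.score() > limit]


def imbalanced(nodes: list[NodeLoad], tolerance: float = 0.2) -> bool:
    # Imbalanced if the spread between the most and least loaded node is too wide.
    scores = [n.score() for n in nodes]
    return max(scores) - min(scores) > tolerance
```

Using the maximum rather than an average means a node that is saturated on a single resource, such as disk space, is still flagged even when its other dimensions are mostly idle.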

This must be combined with other placement constraints, such as AZ affinity.
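A rebalancing step could treat AZ affinity as a filter on candidate target Pageservers and then pick the candidate with the lowest peak utilization after the move. This is a hypothetical sketch: `Node`, `Shard`, and `pick_target` are illustrative names, and it assumes AZ affinity is a hard constraint, whereas a real scheduler might treat it as a soft preference.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    az: str
    usage: dict[str, float]     # observed usage per dimension
    capacity: dict[str, float]  # capacity or configured limit per dimension

    def score_with(self, extra: dict[str, float] | None = None) -> float:
        # Peak utilization across dimensions, optionally after adding a shard's load.
        extra = extra or {}
        return max(
            (self.usage[d] + extra.get(d, 0.0)) / self.capacity[d]
            for d in self.capacity
        )


@dataclass
class Shard:
    shard_id: str
    preferred_az: str
    load: dict[str, float]  # load this shard contributes per dimension (assumed measurable)


def pick_target(shard: Shard, source: Node, nodes: list[Node]) -> Node | None:
    # Hard constraint (assumption): only consider other nodes in the shard's preferred AZ.
    candidates = [
        n for n in nodes
        if n.node_id != source.node_id and n.az == shard.preferred_az
    ]
    if not candidates:
        return None
    # Choose the candidate whose peak utilization would be lowest after the move.
    best = min(candidates, key=lambda n: n.score_with(shard.load))
    # Only move if the target ends up below the source's current peak.
    if best.score_with(shard.load) < source.score_with():
        return best
    return None
```

Requiring the target's post-move score to be below the source's current score means a move never raises the higher of the two nodes' scores, which helps keep shards from bouncing back and forth between Pageservers.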

@erikgrinaker erikgrinaker added c/storage/controller Component: Storage Controller t/feature Issue type: feature, for new features or requests labels Jan 23, 2025