
pageserver: support 1 million relations #9516

Open
13 tasks
jcsp opened this issue Oct 25, 2024 · 0 comments

jcsp commented Oct 25, 2024

We do not currently define a maximum number of relations that we support, but it is known that things get dicey beyond about 10k relations. The exact set of issues is unknown, but the primary architectural issue is that we store RelDirectory as a monolithic blob that gets rewritten whenever we add or remove a relation.
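To see why the monolithic blob is the bottleneck, here is a small illustrative sketch (plain Python, not Neon code; the per-entry size is an assumption): rewriting the whole serialized directory on every relation create makes cumulative bytes written grow as O(N^2), whereas writing one key per relation stays O(N).

```python
# Illustrative sketch, not Neon code. ENTRY_BYTES is an assumed
# serialized size per relation entry.
ENTRY_BYTES = 16

def monolithic_bytes_written(n_relations: int) -> int:
    # Each create rewrites the whole blob: 1 entry, then 2, ..., then n.
    return sum(i * ENTRY_BYTES for i in range(1, n_relations + 1))

def per_entry_bytes_written(n_relations: int) -> int:
    # A per-relation key writes exactly one entry per create.
    return n_relations * ENTRY_BYTES

print(monolithic_bytes_written(10_000))  # 800_080_000 (~800 MB rewritten)
print(per_entry_bytes_written(10_000))   # 160_000 (160 KB)
```

Even at only 10k relations the monolithic scheme rewrites hundreds of megabytes cumulatively, which is consistent with things getting "dicey" well before 1 million.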

Postgres itself does not define a practical limit on relations per database: the hard limit is approximately one billion, but the practical limit is well known to be much lower, and depends on hardware and configuration.

To pick an arbitrary but realistic goal, let's support+test 1 million tables. This is realistic because:

  • Something like an array of relation sizes is only single-digit megabytes with a million tables (whereas with a billion tables, such structures would likely need to be disk-based rather than simple in-memory structures)
  • If we can create a few thousand tables per second, then a test that creates a million tables can run in minutes, not hours (i.e. within the envelope of what our CI supports)

A tiny initial step in this direction is #9507, which adds a test that creates 8000 tables (not very many!) to reproduce a specific scaling bug in transaction aborts. That test currently has a relatively long runtime (tens of seconds) because our code for tracking timeline metadata is still very inefficient.

The goal is to make it work "fast enough", in the sense that a database is usable and things don't time out, but not necessarily to implement every possible optimisation. For example, logical size calculations will be expensive with 1 million relations (requiring many megabytes of reads from storage), and that is okay as long as the expense does not cause the system to fail from the user's point of view.

  • Replace RelDir with a more scalable alternative, such as using a sparse keyspace a la aux files
  • Instead of interleaving relation sizes with data, store them in their own region of the keyspace (or even combine them into the new storage for RelDir). Eliminate imitate_logical_size, and update eviction logic to intentionally retain recent layers rather than retaining them accidentally as a side effect of logical size calculation
  • Test creating and dropping 1 million relations with this more scalable store. We should see that runtime is reasonable (minutes), and memory+disk space used is O(N) with table count (not O(N^2))
  • Implement code for rewriting metadata to the new format on startup. This should run very early during startup so that no other parts of the code need to understand the old format: we can then maintain this migration for a long time, rather than carrying two read paths everywhere else.
  • Automated tests that have a snapshot of an old-style tenant and use it to continuously check that our latest code can still load such a tenant (extend test_historic_storage_formats)
  • Asymptotic scaling: measure scaling of relation creation/drop & compute startup with relation count, and ensure it is at most O(N)
  • A much broader test for high relation counts: ensure that compaction runtimes are as expected, and that logical replication works as expected. This will need to be written in collaboration with the compute team to cover the appropriate gamut of Postgres features. If it encounters unforeseen issues, spin those off into other GitHub issues: this Epic is for the storage support for high table counts.
  • pageserver: slow get_rel_exists() during WAL ingestion with many relations #9855
  • Benchmark creating many tables and databases #9986
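The first checklist item, replacing the RelDir blob with per-relation keys in a sparse keyspace, can be sketched as follows. This is a hypothetical model in plain Python, not the pageserver's actual key encoding: the point is that create, drop, and existence checks each touch O(1) keys instead of deserializing and rewriting an O(N) blob.

```python
# Hypothetical sketch of a sparse per-relation keyspace (not Neon's
# real encoding). Each relation gets its own key; a real store would
# back this with layered storage, with drops becoming tombstones.
from typing import Dict, Tuple

RelTag = Tuple[int, int, int, int]  # (spcnode, dbnode, relnode, forknum)

class SparseRelStore:
    def __init__(self) -> None:
        self._kv: Dict[RelTag, int] = {}  # rel tag -> size in blocks

    def create(self, rel: RelTag) -> None:
        self._kv[rel] = 0              # writes one key

    def drop(self, rel: RelTag) -> None:
        self._kv.pop(rel, None)        # deletes one key

    def exists(self, rel: RelTag) -> bool:
        return rel in self._kv         # point lookup, no blob to parse

    def set_size(self, rel: RelTag, blocks: int) -> None:
        self._kv[rel] = blocks

store = SparseRelStore()
rel = (1663, 5, 16384, 0)
store.create(rel)
store.set_size(rel, 128)
print(store.exists(rel))  # True
```

Under this shape, creating N relations performs N single-key writes, so both runtime and space stay O(N) with table count, matching the scaling target in the checklist.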

Out of scope:

  • High database counts (Neon cloud already limits databases per project to 500 by default)
  • Revising pg_stat (Persist pg_stat information in pageserver #6560 ) code to handle large relation counts (current code skips writing pg_stat if the snapshot exceeds a size threshold)
  • Any postgres CLI/tooling issues around high relation counts
@jcsp jcsp added c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests labels Oct 25, 2024
@skyzh skyzh self-assigned this Jan 6, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 13, 2025
## Problem

In preparation for #9516. We need to store rel size and directory data
in the sparse keyspace, but it does not support inheritance yet.

## Summary of changes

Add a new type of keyspace "sparse but inherited" into the system.

On the read path: we don't remove the key range when we descend into the
ancestor. The search stops when (1) the full key range is covered by
image layers (already implemented), or (2) we reach the end of the
ancestor chain.

---------

Signed-off-by: Alex Chi Z <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Jan 20, 2025
## Problem

Part of #9516 per RFC at #10412

## Summary of changes

Adding the necessary config items and index_part items for the large
relation count work.

---------

Signed-off-by: Alex Chi Z <[email protected]>
awarus pushed a commit that referenced this issue Jan 24, 2025