-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persist pg_stat information in pageserver #6560
base: main
Are you sure you want to change the base?
Conversation
7425 tests run: 7066 passed, 1 failed, 358 skipped (full report)Failures on Postgres 17
Flaky tests (7)Postgres 17
Postgres 15
Test coverage report is not availableThe comment gets automatically updated with the latest test results
bccfe9a at 2025-01-30T21:24:53.989Z :recycle: |
036e235
to
1b1fbf1
Compare
Thanks for the ping. Can we enforce a size limit of this WAL write on the postgres side? (i.e. check I'm thinking that since pgstat file is meant to be small, a limit shouldn't affect it, but it will protect us in case this generic "write any file to the WAL" function is misused in future. |
Just check how large statistic file can be: I created 10000 tables, perform some manipulations with them and do |
To factor in the issues we've had with capacity/replay scaling, what do you think about having separate stores for "big values" (such as pgstats) and "small values" (such as logical replication state)? Whatever the implementation on the pageserver evolves to, I think it will be helpful to store the ~< 8KiB items separately from the ~<4MiB items. This doesn't have to be a whole separate API, we just need some recognizable bit that the pageserver can use to store them separately. |
Cross referencing with #6874 (review) -- since that PR is writing image values regularly for the AuxFilesDirectory structure, it's important that we don't put the multi-megabyte stats value in the same structure, to avoid using unreasonable amounts of storage when logical replication is used on small tenants. The scenario described in that PR was 70,000 writes, so if we were writing a 4MB image every 1000 writes, that would be 280MB of storage for a tenant that could be as small as 30MB -- the rule of thumb I'm using here is that logical replication metadata should never account for the majority of storage in a tenant. |
pgstat information is saved only on systems shutdown and unlike snapshots files it is the single key. Please also notice that:
|
It doesn't matter how often the compute is restarted, if its large stats object is written to the same AuxFileDirectory as all the logical replication keys: every 1024 writes to LR slots will trigger a large (>4MiB) image value being written.
Let's design defensively so that we are not relying on good luck to avoid a situation where someone creates a lot of relations (big stats) and the enables logical replication (frequent updates).
dbdir is indeed an issue, but the dbdir key isn't re-written continuously as a result of logical replication slot updates, and each value in the dbdir is only a couple of bytes. The difference is that the AuxFilesDirectory is subject to frequent, continuous writes, so we must not put anything big in that structure because the frequency of writes will lead to repetition of that big object. |
Yes, it is true. Size of each hash map element is only 9 bytes. But still storage size of creation of 10k relations is 4.4Gb. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM for the storage part of the change, but for now, AUX file v2 is not enabled for all regions, so I don't plan to get this merged for now, unless there could be a flag to control whether or not to enable this feature.
I see the Postgres-side of the pull request writes pg_stat data during shutdown checkpoint, so I assume that's not a lot of write traffic and page server should be able to handle that relatively easily. Please correct me if I'm wrong :)
Also, I didn't see pg_stat-related files getting encoded in the aux file v2 mapping? What are the paths of the files produced by pg_stat, and probably we need to assign a prefix ID to that? (see encode_aux_file_key
in aux_file.rs
)
Yes, pgstat data is written on shutdown checkpoint.
Sorry, I missed it. |
229076b
to
ddeb08a
Compare
f6b620b
to
bf5c7fb
Compare
Extracted from #6560, currently we include multiple copies of aux files in the basebackup. ## Summary of changes Fix the loop. Signed-off-by: Alex Chi Z <[email protected]> Co-authored-by: Konstantin Knizhnik <[email protected]>
Extracted from #6560, currently we include multiple copies of aux files in the basebackup. ## Summary of changes Fix the loop. Signed-off-by: Alex Chi Z <[email protected]> Co-authored-by: Konstantin Knizhnik <[email protected]>
Co-authored-by: Heikki Linnakangas <[email protected]>
This is for PR #6560. It implements the scheme I envisioned at #6560 (comment)
…o list of expected errors
Co-authored-by: Heikki Linnakangas <[email protected]>
8c9430c
to
a48aa90
Compare
Problem
Statistic is saved in local file and so lost on compute restart.
Persist in in page server using the same AUX file mechanism used for replication slots
See more about motivation in https://neondb.slack.com/archives/C04DGM6SMTM/p1703077676522789
Summary of changes
Persist postal file using AUX mechanism
Postgres PRs:
neondatabase/postgres#547
neondatabase/postgres#446
neondatabase/postgres#445
Related to #6684 and #6228