-
Notifications
You must be signed in to change notification settings - Fork 498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable on-demand download in WalIngest. #3233
Conversation
34ba28d
to
1ddf626
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow, recent PRs around this code feel better when pure anyhow got replaced with a few enum variant errors.
I see the approach now is to eagerly download all layers inplace.
2 thoughts on that:
-
it would be great to measure timeline download process as a whole, from all such inplace downloads querying different (?) layers simultaneously.
Otherwise, not sure how to assert the efficiency of the approach(es) taken. -
After this PR, only non-test code module that uses
.no_ondemand_download()
is pgdatadir_mapping.rs, and even there there's already 2with_ondemand_download
calls.
I see that recent big PR changes the same module: https://github.com/neondatabase/neon/pull/3228/files#diff-f290f8e7705b225e1704154be2f32b09b36ad95d82f58c5139e04f263bf5a3a6
Without deeply looking in the other PR, I can already say that inplace download approach + async conversion for pgdatadir_mapping.rs
is a great standalone PR, that hopefully strips a few hundred lines off the big one. WDYT?
d12e62c
to
2ce6692
Compare
Hmm, yeah, some kind of a metric or observability would be nice. Got any concrete ideas on how and where to do that?
Yeah, this PR is really a subset of #3228 too. I agree it makes sense to do |
2ce6692
to
33b6121
Compare
33b6121
to
58164f6
Compare
58164f6
to
cfb167f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior reviews are already sufficient. Quickly went through the diff to ensure that no new no_ondemand_download
calls slipped in.
Also, I like that we get rid of lots of try_prr!
wrinkles.
Makes the top-level functions in WalIngest async, and replaces no_ondemand_download calls with with_ondemand_download. This hopefully fixes the problem reported in issue #3230, although I don't have a self-contained test case for it.
cfb167f
to
4441d60
Compare
Test failure because of existing flakiness. I'll re-run it. |
This makes Timeline::get() async, and all functions that call it directly or indirectly with it. The with_ondemand_download() mechanism is gone, Timeline::get() now always downloads files, whether you want it or not. That is what all the current callers want, so even though this loses the capability to get a page only if it's already in the pageserver, without downloading, we were not using that capability. There were some places that used 'no_ondemand_download' in the WAL ingestion code that would error out if a layer file was not found locally, but those were dubious. We do actually want to on-demand download in all of those places. Per discussion at #3233 (comment)
This makes Timeline::get() async, and all functions that call it directly or indirectly with it. The with_ondemand_download() mechanism is gone, Timeline::get() now always downloads files, whether you want it or not. That is what all the current callers want, so even though this loses the capability to get a page only if it's already in the pageserver, without downloading, we were not using that capability. There were some places that used 'no_ondemand_download' in the WAL ingestion code that would error out if a layer file was not found locally, but those were dubious. We do actually want to on-demand download in all of those places. Per discussion at #3233 (comment)
Makes the top-level functions in WalIngest async, and replaces no_ondemand_download calls with with_ondemand_download.
This hopefully fixes the problem reported in issue #3230, although I don't have a self-contained test case for it.
Note: I created this PR over PR #3202, because this ran into problems with trying to hold the sync Tar reference over awaits.