test_layer_map performance test failing #10409
Comments
@VladLazar @problame The test failed because |
## Problem

test_layer_map doesn't log statements and it is not clear how long they take.

## Summary of changes

Do log them.

ref #10409
Before extending the timeout, I'd like to understand why/when it became slower. Since it's in the |
History of failures: The only directly related commit seems to
But I don't think this is very useful. |
Ok, the graph above was wrong, asked to fix it. Here is another one. Seems like it started failing 14.01 |
Bisected, and
Without it, timings in
With it
It is also very variable: in CI I've seen `select count(*)` take 5s, 90s and 2m+ (when the test fails). @erikgrinaker would you like to take over from here? |
Yeah, I'll take this. |
This is just because the test schedules a ton of tiny uploads, and the upload queue is quadratic in the number of operations.
In production, we limit the number of inprogress tasks to the remote concurrency limit to bound the quadratic cost, but the I'll submit a workaround. |
## Problem

It is not very clear how much time the different operations take.

## Summary of changes

Record more timings.

ref #10409
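For illustration, per-operation timings of this kind can be recorded with a small context manager. This is only a sketch, not the code used in test_layer_map: the `timed` helper, the logger name and the `results` dict are all hypothetical names introduced here.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("timing_sketch")  # hypothetical logger name


@contextmanager
def timed(label, results=None):
    """Log how long the wrapped block took; optionally store it in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s took %.3f s", label, elapsed)
        if results is not None:
            results[label] = elapsed


# Usage: wrap each statement whose duration should show up in the test log.
timings = {}
with timed("select count(*)", timings):
    sum(range(1000))  # stand-in for running the real query
```

Storing the durations in a dict as well as logging them makes it easy to compare runs afterwards, for example before and after a suspected regression.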
## Problem

The local filesystem backend for remote storage doesn't set a concurrency limit. While it can't/won't enforce a concurrency limit itself, this also bounds the upload queue concurrency. Some tests create thousands of uploads, which slows down the quadratic scheduling of the upload queue, and there is no point spawning that many Tokio tasks.

Resolves #10409.

## Summary of changes

Set a concurrency limit of 100 for the LocalFS backend.

Before: `test_layer_map[release-pg17].test_query: 68.338 s`
After: `test_layer_map[release-pg17].test_query: 5.209 s`
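To see why capping the number of in-progress tasks bounds the quadratic cost, here is a toy model. It is not the pageserver's upload queue: it simply assumes that admitting an operation requires comparing it against every operation currently in progress, and that, without a limit, operations are queued faster than they complete. The `scheduling_cost` function is hypothetical.

```python
from typing import Optional


def scheduling_cost(num_ops: int, limit: Optional[int] = None) -> int:
    """Count comparisons made by a toy scheduler.

    Assumption: admitting an operation scans every in-progress operation.
    With no limit, nothing completes while the operations are being queued.
    """
    in_progress = 0
    comparisons = 0
    for _ in range(num_ops):
        if limit is not None and in_progress >= limit:
            # At the limit: one task has to finish before the next is admitted.
            in_progress -= 1
        comparisons += in_progress  # scan the in-progress set
        in_progress += 1
    return comparisons


unbounded = scheduling_cost(5000)            # grows as n * (n - 1) / 2
bounded = scheduling_cost(5000, limit=100)   # grows as roughly n * limit
print(unbounded, bounded)
```

In this model the unbounded cost grows with the square of the number of operations, while the bounded cost grows linearly, at most `limit` comparisons per operation. That is the general shape of the effect described above, not a measurement of the real queue.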
https://neon-github-public-dev.s3.amazonaws.com/reports/main/12791644069/index.html#suites/b8bb9797b578168beaf6275d4a5cca96/853734a845c3913b/
This reproduces locally.