pageserver: improve read amp metric #10573

Merged (1 commit) on Jan 30, 2025

erikgrinaker
Contributor

Problem

The global pageserver_layers_visited_per_vectored_read_global metric does not appear to measure read amplification accurately. It divides the number of layers visited by the number of reads in the batch, so a batch of 10 reads that visits 100 L0 layers records a read amp of only 10 per read, while the actual read amp was 100.

While the cost of layer visits is amortized across the batch, and some layers may not intersect with a given key, each visited layer contributes directly to the observed latency of every read in the batch, which is what we care about.
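The arithmetic behind the 10-reads, 100-layers example can be sketched as follows. This is a minimal illustration, not the pageserver code: both function names are hypothetical, and the second function makes the simplifying assumption that every read in the batch is charged the full visited-layer count.

```rust
/// Old behavior as described above: the visited-layer count is divided by
/// the number of reads in the batch. (Hypothetical helper, not a pageserver API.)
fn layers_per_read_averaged(layers_visited: u64, reads_in_batch: u64) -> f64 {
    layers_visited as f64 / reads_in_batch as f64
}

/// Behavior the PR argues for: each read in the batch is charged every layer
/// the batch visited, since each visit adds to that read's latency.
/// Returns one observation per read. (Hypothetical helper.)
fn layers_per_read_counted(layers_visited: u64, reads_in_batch: u64) -> Vec<u64> {
    vec![layers_visited; reads_in_batch as usize]
}
```

For a batch of 10 reads visiting 100 layers, the averaged form yields a single value of 10, while the counted form yields ten observations of 100.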

Touches https://github.com/neondatabase/cloud/issues/23283.
Extracted from #10566.

Summary of changes

  • Count the number of layers visited towards each read in the batch, instead of the average across the batch.
  • Rename pageserver_layers_visited_per_vectored_read_global to pageserver_layers_per_read_global.
  • Lower the read amp log warning threshold from 512 to 100.
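A rough sketch of how the per-read counting and the lowered warning threshold might fit together is shown below. Everything here is an assumption for illustration: the constant name, the `LayersPerReadHistogram` stand-in (a plain vector rather than a real metrics histogram), the `record_batch` helper, and the choice to warn at or above the threshold are not taken from the PR, which only states the threshold values and the renamed metric.

```rust
/// Hypothetical constant name; the PR only says the threshold drops from 512 to 100.
const LAYERS_PER_READ_WARN_THRESHOLD: u64 = 100;

/// Stand-in for a metrics histogram: it simply collects the observed values.
#[derive(Default)]
struct LayersPerReadHistogram {
    observations: Vec<u64>,
}

impl LayersPerReadHistogram {
    fn observe(&mut self, value: u64) {
        self.observations.push(value);
    }
}

/// Records one observation per read in the batch, each carrying the full
/// visited-layer count, and reports whether a read amp warning is due.
fn record_batch(
    hist: &mut LayersPerReadHistogram,
    layers_visited: u64,
    reads_in_batch: usize,
) -> bool {
    for _ in 0..reads_in_batch {
        hist.observe(layers_visited);
    }
    // Assumption: warn when the count reaches the threshold (>=); the PR does
    // not say whether the comparison is inclusive.
    layers_visited >= LAYERS_PER_READ_WARN_THRESHOLD
}
```

With these assumptions, a batch of 10 reads over 100 layers adds ten observations of 100 and triggers the warning, whereas under the old threshold of 512 it would not.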


7414 tests run: 7063 passed, 0 failed, 351 skipped (full report)


Flaky tests (7)

Postgres 17

Postgres 14

Code coverage* (full report)

  • functions: 33.4% (8508 of 25499 functions)
  • lines: 49.1% (71435 of 145499 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
7cfd38c at 2025-01-30T00:15:57.405Z

@erikgrinaker erikgrinaker added this pull request to the merge queue Jan 30, 2025
Merged via the queue into main with commit b247271 Jan 30, 2025
86 checks passed
@erikgrinaker erikgrinaker deleted the erik/layers-per-read-global branch January 30, 2025 09:35
github-merge-queue bot pushed a commit that referenced this pull request Jan 30, 2025
## Problem

We suspect that Postgres checkpoints will limit the number of page
deltas necessary to reconstruct a page, but don't know for certain.

Touches neondatabase/cloud#23283.

## Summary of changes

Add `pageserver_deltas_per_read_global` metric.

This pairs with `pageserver_layers_per_read_global` from #10573.