Skip to content

Commit

Permalink
fix(pageserver): gc-compaction race with read
Browse files Browse the repository at this point in the history
Signed-off-by: Alex Chi Z <[email protected]>
  • Loading branch information
skyzh committed Jan 28, 2025
1 parent c54cd9e commit f371186
Show file tree
Hide file tree
Showing 2 changed files with 21 additions and 0 deletions.
8 changes: 8 additions & 0 deletions pageserver/src/tenant/timeline.rs
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,8 @@ pub struct Timeline {
/// so that e.g. on-demand-download/eviction, and layer spreading, can operate just on `LayerFileManager`.
pub(crate) layers: tokio::sync::RwLock<LayerManager>,

pub(crate) gc_compaction_layer_update_lock: tokio::sync::RwLock<()>,

last_freeze_at: AtomicLsn,
// Atomic would be more appropriate here.
last_freeze_ts: RwLock<Instant>,
Expand Down Expand Up @@ -2423,6 +2425,7 @@ impl Timeline {
shard_identity,
pg_version,
layers: Default::default(),
gc_compaction_layer_update_lock: tokio::sync::RwLock::new(()),

walredo_mgr,
walreceiver: Mutex::new(None),
Expand Down Expand Up @@ -3459,6 +3462,11 @@ impl Timeline {
let mut completed_keyspace = KeySpace::default();
let mut image_covered_keyspace = KeySpaceRandomAccum::new();

// The read layer guard optimization is not correct when the compaction algorithm
// merges the history of a key. To optimize for common cases while avoid correctness
// issues, we need to hold the `gc_compaction_layer_update_lock` on the read path.
let _guard = timeline.gc_compaction_layer_update_lock.read().await;

loop {
if cancel.is_cancelled() {
return Err(GetVectoredError::Cancelled);
Expand Down
13 changes: 13 additions & 0 deletions pageserver/src/tenant/timeline/compaction.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2915,6 +2915,19 @@ impl Timeline {
// Between the sanity check and this compaction update, there could be new layers being flushed, but it should be fine because we only
// operate on L1 layers.
{
// The read guard is used on the read path in `get_vectored_reconstruct_data_timeline`.
// Gc-compaction will consolidate some deltas into a single image. Therefore, in the
// following case, it's incorrect to read without this guard:
// 0. Gc-compaction running in the background.
// 1. A key is in 2 delta layers. The read path processes the first layer and drops the
// layers guard. Both of the layers are below the gc-horizon.
// 2. Gc-compaction finishes consolidating these two layers into a single image and updates
// the layer map. The image is at the gc-horizon.
// 3. The read path continues to read from the end LSN of the second layer, but now it cannot
// find any existing layer.
// By holding the update guard and avoid the read path to process a single read before/after
// gc-compaction, we can avoid this race condition.
let _update_guard = self.gc_compaction_layer_update_lock.write().await;
let mut guard = self.layers.write().await;
guard
.open_mut()?
Expand Down

0 comments on commit f371186

Please sign in to comment.