db: use strict-obsolete sstables when writing to and reading from shared storage #2756
Previously we used pointCollapsingIters to collapse obsolete points in foreign sstables. This was necessary because of snapshots that were open at the time those sstables were written, but it significantly increased code complexity and hurt performance. This change switches to vanilla sstable iterators for foreign sstables, with `hideObsoletePoints` set to true. We return an error if a foreign sstable's format is not recent enough to support the obsolete flag. It also removes all cases in pointCollapsingIter that were not used by ScanInternal (its last remaining user). Part of cockroachdb#2756.
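For readers following along, here is a minimal sketch of what hiding obsolete points at read time amounts to. It is not Pebble's reader code: `tableFormat`, `point`, `visiblePoints`, and `errFormatTooOld` are hypothetical names, and the model assumes each point simply carries a boolean obsolete flag.

```go
package main

import "errors"

// tableFormat is a hypothetical stand-in for an sstable format version.
type tableFormat int

const (
	formatWithoutObsoleteBit tableFormat = iota // hypothetical: too old to carry the obsolete flag
	formatWithObsoleteBit                       // hypothetical: records a per-point obsolete flag
)

// point is a simplified point key. obsolete marks a point that was already
// shadowed by a newer point in the same sstable when the file was written.
type point struct {
	userKey  string
	seqNum   uint64
	obsolete bool
}

// errFormatTooOld is a hypothetical error for tables that cannot carry the flag.
var errFormatTooOld = errors.New("foreign sstable format does not support the obsolete flag")

// visiblePoints returns the points a plain reader would surface. When
// hideObsoletePoints is true, points flagged obsolete are skipped, and a
// table whose format cannot carry the flag is rejected with an error.
func visiblePoints(format tableFormat, points []point, hideObsoletePoints bool) ([]point, error) {
	if !hideObsoletePoints {
		return points, nil
	}
	if format < formatWithObsoleteBit {
		return nil, errFormatTooOld
	}
	var out []point
	for _, p := range points {
		if p.obsolete {
			continue // hidden: a newer point for this key exists in the same file
		}
		out = append(out, p)
	}
	return out, nil
}
```

The filter is a single flag check per point, which is the sense in which a vanilla iterator with `hideObsoletePoints` can stand in for a collapsing wrapper in this simplified model.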
Using strict-obsolete SSTs on the write path is a little more complex, as Pebble is currently unaware of which part of the keyspace is shareable at write time. Maybe we can thread this through to Pebble as an Option, and mark sstables that are completely inside the strict-obsolete keyspace as strict-obsolete when writing them. We currently also put timeseries keys in shared storage, which will contain merges, but that's okay because we will never share them across Pebble instances.
Based on discussion in Storage weekly, the preference was to add compaction guards around the Table keyspace as the "shareable keyspace", which would be passed into Pebble as an Option. Sstable outputs will be split at these boundaries, and any sstables created within this range will be strict-obsolete. This issue now tracks that implementation.
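A rough sketch of the splitting idea described above, assuming plain string keys and a single half-open shareable range. `keyRange`, `outputTable`, and `splitOutputs` are hypothetical names for illustration, not Pebble types.

```go
package main

// keyRange is a hypothetical half-open user-key range [start, end).
type keyRange struct{ start, end string }

// contains reports whether k falls inside the range.
func (r keyRange) contains(k string) bool { return k >= r.start && k < r.end }

// outputTable is a hypothetical compaction output. strictObsolete is set for
// tables that lie entirely inside the shareable range.
type outputTable struct {
	keys           []string
	strictObsolete bool
}

// splitOutputs partitions sorted keys into output tables, starting a new
// table whenever the next key crosses a boundary of the shareable range.
// Every table therefore lies fully inside or fully outside that range.
func splitOutputs(sortedKeys []string, shareable keyRange) []outputTable {
	var outs []outputTable
	for _, k := range sortedKeys {
		in := shareable.contains(k)
		if n := len(outs); n == 0 || outs[n-1].strictObsolete != in {
			outs = append(outs, outputTable{strictObsolete: in})
		}
		last := &outs[len(outs)-1]
		last.keys = append(last.keys, k)
	}
	return outs
}
```

For example, with keys `a, m, n, z` and a shareable range `[m, t)`, this yields three tables: `a` (not strict-obsolete), `m, n` (strict-obsolete), and `z` (not strict-obsolete). Real outputs would also be split by size and other limits, which this sketch ignores.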
…tions SharedLowerUserKeyPrefix, if specified, is an additional lower-bound constraint on key prefixes that should be written to shared files. It applies only when CreateOnShared permits some shared file creation. It will be used by CockroachDB to exclude keys below TableDataMin from shared files, for both correctness (they can contain MERGEs, for which the obsolete bit does not work) and performance reasons (low latency is more important and the data volume is tiny).

WriteSharedWithStrictObsolete, when true, causes shared files to be written with WriterOptions.IsStrictObsolete set to true. This adds an extra measure of configuration protection against accidentally sharing files where the MERGE could become visible (we currently share such files, but file virtualization hides these MERGEs).

The enforcement of SharedLowerUserKeyPrefix changes how PreferSharedStorage is computed during flushes and compactions. It will only be set if the next key to be written by compact.Runner permits writing to shared storage. compact.Runner optimizes this computation for the case where a compaction is fully within the shared or unshared bounds. Additionally, compact.Runner uses the existing OutputSplitter to decide when a split should happen when transitioning from unshared to shared.

While here, we do a tiny optimization in OutputSplitter to remove a key comparison on each iteration key. Fixes cockroachdb#2756
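To make the placement decision concrete, here is a small sketch under stated assumptions. It does not reproduce compact.Runner: `sharedPolicy`, `preferShared`, `placement`, and `classify` are hypothetical names, and its two fields only stand in for CreateOnShared and SharedLowerUserKeyPrefix.

```go
package main

import "bytes"

// sharedPolicy is a hypothetical model of the two options described above.
type sharedPolicy struct {
	createOnShared bool   // stand-in for CreateOnShared permitting shared files
	lowerPrefix    []byte // stand-in for SharedLowerUserKeyPrefix; nil means no extra bound
}

// preferShared reports whether a key with the given prefix may be written to
// shared storage: shared creation must be permitted and the prefix must not
// sort below the lower bound.
func (p sharedPolicy) preferShared(keyPrefix []byte) bool {
	if !p.createOnShared {
		return false
	}
	return p.lowerPrefix == nil || bytes.Compare(keyPrefix, p.lowerPrefix) >= 0
}

// placement summarizes where the outputs of one compaction can go.
type placement int

const (
	allUnshared placement = iota // every key is below the bound, or sharing is off
	allShared                    // every key is at or above the bound
	mixed                        // the bound falls inside the compaction's key span
)

// classify decides placement once from the smallest and largest key prefixes
// of a compaction, so that per-key checks are needed only in the mixed case.
func (p sharedPolicy) classify(smallest, largest []byte) placement {
	switch {
	case !p.preferShared(largest):
		return allUnshared
	case p.preferShared(smallest):
		return allShared
	default:
		return mixed
	}
}
```

Because the bound is a single lower limit, checking the two endpoints is enough to classify the whole compaction. Only in the `mixed` case would an output split be needed at the point where keys move from unshared to shared.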
Strict-obsolete sstables were introduced recently. They permit using the vanilla sstable readers when reading foreign ssts, which is more efficient than wrappers. Documented in pebble/sstable/format.go, lines 32 to 181 in 456a2a2.
@itsbilal
Jira issue: PEBBLE-60
Epic CRDB-40358