The Data Greater than Memory (DGM) performance of moss store's Log-Structured Merge Arrays is limited by the speed of the merge operation during compaction.
Currently this compaction is done in a single level, which follows from the default compaction threshold of 0. This can result in heavy write amplification.
Based on discussion with @steveyen, to mitigate this situation, moss store persistence can follow this simple approach:
Suppose the configured limits are maxSmallSegments=3 and maxBigSegments=2.
Initially, the file contains 0 small segments and 0 big segments.
Persistence appends small segments to the end of the file:
|-seg0-||-seg1-||-seg2-| (3 small segments, 0 big segments)
On the next round of persistence, those 3 small segments can be compacted into a new file:
|====seg0====| (0 small segments, 1 big segment)
Following this, further persistence rounds simply append small segments:
|====seg0====||-seg0-||-seg1-||-seg2-|
On the next round of persistence, only the small segments are compacted, making the file look as follows:
|====seg0====||...seg0...||...seg1...||...seg2...||=====seg1=====|
The rationale behind this is that compacting a small number of segments is faster than rewriting the entire file on every delta.
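A minimal sketch of this policy in Go; the names here (tierConfig, footerState, persistRound) are hypothetical and not part of the moss API, and the counters stand in for the real segment bookkeeping:

```go
package mosssketch

type tierConfig struct {
	maxSmallSegments int // configured limit, e.g. 3
	maxBigSegments   int // configured limit, e.g. 2
}

type footerState struct {
	numSmallSegments int // small segments currently in the file
	numBigSegments   int // big segments currently in the file
}

// persistRound decides what a single round of persistence does:
// append another small segment, merge the small segments into a new
// big segment, or (once the big tier is also full) merge everything.
func persistRound(cfg tierConfig, st *footerState) {
	switch {
	case st.numSmallSegments < cfg.maxSmallSegments:
		// Cheap path: append the incoming delta as a small segment.
		st.numSmallSegments++
	case st.numBigSegments < cfg.maxBigSegments:
		// Merge only the small segments into one new big segment,
		// leaving the existing big segments untouched.
		st.numSmallSegments = 0
		st.numBigSegments++
	default:
		// Both tiers are full: merge all segments into a single big one.
		st.numSmallSegments = 0
		st.numBigSegments = 1
	}
}
```

With the limits above (3 and 2), most rounds take the cheap first branch, which is the point of the scheme.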
Later, to support efficient persistence to disk, we can adopt simple size-tiered, leveled compaction in mossStore by splitting the Footer across multiple levels:
data-L0-0000xx.moss: most recent segments
data-L1-0000xx.moss: segments merged from L0
data-L2-0000xx.moss: segments merged from L1
We can then size-tier the levels to achieve a good tradeoff between space, read, and write efficiency.
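A rough sketch of how the level files might be named and when a level would spill into the next; the data-L&lt;level&gt;-&lt;seq&gt;.moss pattern follows the names above, while growthFactor, the 6-digit sequence width, and the size-budget rule are assumptions for illustration only:

```go
package mosssketch

import "fmt"

// growthFactor is an assumed size ratio between adjacent levels; the
// real value would be a tunable.
const growthFactor = 10

// levelFileName builds a name following the data-L<level>-<seq>.moss
// pattern from the proposal (assuming a 6-digit sequence number).
func levelFileName(level, seq int) string {
	return fmt.Sprintf("data-L%d-%06d.moss", level, seq)
}

// levelIsFull reports whether a level has grown past its size budget,
// meaning its segments should be merged down into the next level.
// The budget grows by growthFactor per level, giving the size tiering.
func levelIsFull(level int, levelBytes, l0BudgetBytes int64) bool {
	budget := l0BudgetBytes
	for i := 0; i < level; i++ {
		budget *= growthFactor
	}
	return levelBytes > budget
}
```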
Hi @hisundar,
One thought is how to make the configuration more general than maxSmallSegments and maxBigSegments, as it leads me to think you'd end up one future day adding things like maxBiggerSegments, maxBiggerThanBiggerSegments, etc. (Unless I misunderstand.)
Also, I took a look at rocksdb, and they seem to have multiple, concurrent files per level, so that's something to consider when weighing the pros/cons of that approach.
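For instance, one hypothetical way to generalize the configuration (not an actual moss option) is a single per-level slice of limits, so deeper tiers don't need new option names:

```go
package mosssketch

// LevelLimits generalizes maxSmallSegments/maxBigSegments into one
// per-level slice: index 0 is the newest (smallest) tier. Two entries,
// e.g. []int{3, 2}, reproduce the scheme above; more entries add deeper
// tiers without inventing names like maxBiggerSegments.
type LevelLimits []int

// firstLevelWithRoom returns the shallowest level whose segment count
// is still under its limit, or -1 if every level is full (i.e. it is
// time for a full compaction). counts must be the same length as limits.
func (limits LevelLimits) firstLevelWithRoom(counts []int) int {
	for level, limit := range limits {
		if counts[level] < limit {
			return level
		}
	}
	return -1
}
```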