feat: avoid oom snapshot #26043
base: main
Conversation
Force-pushed from 56024af to 2213e14
@@ -181,7 +181,7 @@ pub struct Config {
     #[clap(
         long = "gen1-duration",
         env = "INFLUXDB3_GEN1_DURATION",
-        default_value = "10m",
+        default_value = "1m",
Defaulting to 1m means there are 10x more query chunks in `QueryableBuffer` than with 10m, but this hasn't been an issue so far.
-        for chunk in snapshot_chunks {
+        for chunk in snapshot_chunks_iter {
This `snapshot_chunks_iter` produces each `SnapshotChunk` lazily, uses the chunk to create a `PersistJob`, and then moves it into the `TableBuffer`'s `snapshotting_chunks`. Because there's a write lock on this buffer above, it is OK to remove the key and then add it back. Previously `snapshotting_chunks` was cloned; this avoids the cloning.
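For readers outside the PR, here is a minimal sketch of that pattern, with the iterator inlined as a loop that drains one chunk time at a time; `SnapshotChunk`, `PersistJob`, and `TableBuffer` below are simplified stand-ins, not the real influxdb3 types:

```rust
use std::collections::BTreeMap;

struct SnapshotChunk {
    chunk_time: i64,
    rows: Vec<String>, // placeholder for the real record batches
}

struct PersistJob {
    chunk_time: i64,
    path: String,
}

struct TableBuffer {
    chunks: BTreeMap<i64, Vec<String>>,
    snapshotting_chunks: Vec<SnapshotChunk>,
}

/// The caller holds the buffer's write lock, so removing a key and re-adding
/// the chunk to `snapshotting_chunks` is safe; nothing is cloned on the way.
fn make_persist_jobs(buffer: &mut TableBuffer, times: Vec<i64>) -> Vec<PersistJob> {
    let mut jobs = Vec::with_capacity(times.len());
    for t in times {
        // lazily materialise one SnapshotChunk (no up-front clone of the map)
        let Some(rows) = buffer.chunks.remove(&t) else { continue };
        let chunk = SnapshotChunk { chunk_time: t, rows };
        jobs.push(PersistJob { chunk_time: t, path: format!("table/{t}.parquet") });
        // move (not clone) the chunk so queries still see it while it persists
        buffer.snapshotting_chunks.push(chunk);
    }
    jobs
}

fn main() {
    let mut buf = TableBuffer {
        chunks: BTreeMap::from([(0, vec!["a".into()]), (60, vec!["b".into()])]),
        snapshotting_chunks: Vec::new(),
    };
    let jobs = make_persist_jobs(&mut buf, vec![0, 60]);
    assert_eq!(jobs.len(), 2);
    assert_eq!(buf.snapshotting_chunks.len(), 2);
}
```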
    persisted_files: Arc<PersistedFiles>,
    persisted_snapshot: Arc<Mutex<PersistedSnapshot>>,
) {
    let iterator = PersistJobGroupedIterator::new(
This allows chunks to be grouped: with a 1m gen 1 duration, it aggregates up to 10 chunks together and writes a single parquet file covering a 10m window.
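For context, a hand-rolled sketch of the grouping idea; the actual `PersistJobGroupedIterator` in this PR has a different signature, but the shape is an iterator adapter that pulls up to `group_size` items from the inner iterator:

```rust
// Groups items from an inner iterator into batches of at most `group_size`,
// so ten 1m persist jobs can become one parquet file for a 10m window.
struct GroupedIterator<I: Iterator> {
    inner: I,
    group_size: usize,
}

impl<I: Iterator> Iterator for GroupedIterator<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        let group: Vec<I::Item> = self.inner.by_ref().take(self.group_size).collect();
        if group.is_empty() { None } else { Some(group) }
    }
}

fn main() {
    // 25 one-minute chunk times collapse into groups of 10, 10, and 5
    let jobs = (0..25).map(|i| i * 60); // chunk times in seconds
    let grouped = GroupedIterator { inner: jobs, group_size: 10 };
    for group in grouped {
        println!("one parquet file for chunk times {:?}", group);
    }
}
```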
Force-pushed from 3a4a9ab to d85460b
}

#[test_log::test(tokio::test)]
async fn test_snapshot_serially_two_tables_with_varying_throughput() {
@pauldix - this should cover the case we discussed, with 2 tables receiving different amounts of writes.
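As a rough outline of the scenario (the `WriteBuffer` stub below is hypothetical, not the influxdb3 type or this PR's actual test harness):

```rust
use std::collections::HashMap;

#[derive(Default)]
struct WriteBuffer {
    writes: HashMap<&'static str, Vec<String>>,
}

impl WriteBuffer {
    fn write(&mut self, table: &'static str, line: String) {
        self.writes.entry(table).or_default().push(line);
    }

    fn force_snapshot(&mut self) -> usize {
        // stand-in: a real snapshot would sort/dedupe each table's chunks
        // serially and persist them
        self.writes.values().map(Vec::len).sum()
    }
}

#[test]
fn two_tables_varying_throughput_sketch() {
    let mut buffer = WriteBuffer::default();
    // table_a receives 10x the writes of table_b in the same window, so the
    // two tables reach the snapshot with very different chunk sizes
    for i in 0..100 {
        buffer.write("table_a", format!("table_a val={i}"));
        if i % 10 == 0 {
            buffer.write("table_b", format!("table_b val={i}"));
        }
    }
    assert_eq!(buffer.force_snapshot(), 110);
}
```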
Force-pushed from 9aa8fdf to 0760147
This PR addresses the OOM issue (or reduces the chances of running into OOM when snapshotting) by making the following main changes:
- defaults gen 1 duration to 1m (instead of 10m)
- snapshot chunks are built lazily
- the sort/dedupe step itself is done serially (i.e. 1 at a time)

As an optimisation, when _not_ forcing a snapshot it aggregates up to 10m worth of chunks and writes them in parallel; the assumption is that, given it's a normal snapshot, there is enough memory to run it.

closes: #25991
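For illustration, a minimal sketch of the "sort/dedupe serially, write in parallel" shape, assuming tokio and the futures crate; `sort_dedupe` and `write_parquet` here are placeholders, not influxdb3 APIs:

```rust
use futures::future::join_all;

async fn sort_dedupe(mut chunk: Vec<u64>) -> Vec<u64> {
    // the memory-heavy step: run one chunk at a time to cap peak usage
    chunk.sort_unstable();
    chunk.dedup();
    chunk
}

async fn write_parquet(chunk: Vec<u64>) {
    // stand-in for the real object-store write; I/O bound, safe to parallelise
    println!("wrote {} rows", chunk.len());
}

async fn persist_snapshot(chunks: Vec<Vec<u64>>) {
    let mut prepared = Vec::with_capacity(chunks.len());
    for chunk in chunks {
        // serial: only one chunk's sort/dedupe is in memory at once
        prepared.push(sort_dedupe(chunk).await);
    }
    // parallel: all writes run concurrently
    join_all(prepared.into_iter().map(write_parquet)).await;
}

#[tokio::main]
async fn main() {
    persist_snapshot(vec![vec![3, 1, 1, 2], vec![5, 5, 4]]).await;
}
```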
- extra debug logs added
- test fixes
Force-pushed from 0760147 to 19c29ab