
Bad frame decompression performance regression for large data sets in Rust 1.73 #147

Closed
bonsairobo opened this issue Nov 2, 2023 · 9 comments


bonsairobo commented Nov 2, 2023

[line graph: lz4 frame decompression performance, Rust 1.72 vs. 1.73]

Both measurements were taken for lz4_flex version 0.11.1. Only bumping the Rust compiler version caused the regression.

What I find very interesting is that essentially the same regression has occurred for the lz4 crate, and it seems both regressions have only occurred for decompression of the frame format, not the block format.

I think the next step is to investigate which specific scope of code in the frame decompression has gotten slower.

Benchmark:

```rust
use std::{
    fs::File,
    io::Cursor,
    path::{Path, PathBuf},
    time::Instant,
};

fn main() {
    for entry_result in std::fs::read_dir("./data").unwrap() {
        let entry = entry_result.unwrap();
        do_bench(
            "lz4_flex",
            &entry.path(),
            compress_lz4_flex,
            decompress_lz4_flex,
        );
        // do_bench("lz4", &entry.path(), compress_lz4, decompress_lz4);
    }
}

fn do_bench(
    algo_name: &str,
    input_path: &Path,
    compress_fn: impl Fn(&Path, File) -> File,
    decompress_fn: impl Fn(&[u8]) -> Vec<u8>,
) {
    let in_f = std::fs::File::open(input_path).unwrap();
    let start_size = in_f.metadata().unwrap().len();

    let out_path = PathBuf::from(format!(
        "output/out-{}.lz4",
        input_path.file_stem().unwrap().to_str().unwrap()
    ));
    let compress_start = Instant::now();

    let output = compress_fn(&out_path, in_f);

    let compress_elapsed = compress_start.elapsed().as_millis();
    let end_size = output.metadata().unwrap().len();
    println!(
        "[{algo_name} {}] Compression: {:.4}%, Time: {:.2}s",
        input_path.display(),
        100.0 * end_size as f32 / start_size as f32,
        compress_elapsed as f32 / 1000.0,
    );

    let compressed_bytes = std::fs::read(&out_path).unwrap();
    let decode_start = Instant::now();

    let decompressed_buf = decompress_fn(&compressed_bytes);

    println!(
        "Decompress in: {:.5}s",
        decode_start.elapsed().as_millis() as f32 / 1000.0
    );
    let out_path = format!(
        "output/roundtrip-{}.lz4",
        input_path.file_stem().unwrap().to_str().unwrap()
    );
    std::fs::write(out_path, decompressed_buf).unwrap();
}

fn compress_lz4_flex(out_path: &Path, mut in_f: std::fs::File) -> std::fs::File {
    let out_f = std::fs::File::create(out_path).unwrap();
    let mut encoder = lz4_flex::frame::FrameEncoder::new(out_f);
    std::io::copy(&mut in_f, &mut encoder).unwrap();
    encoder.finish().unwrap()
}

fn decompress_lz4_flex(compressed_bytes: &[u8]) -> Vec<u8> {
    let mut decoder = lz4_flex::frame::FrameDecoder::new(Cursor::new(compressed_bytes));
    let mut uncompressed_buf = Vec::new();
    std::io::copy(&mut decoder, &mut uncompressed_buf).unwrap();
    uncompressed_buf
}

// fn compress_lz4(out_path: &Path, mut in_f: File) -> File {
//     let out_f = std::fs::File::create(out_path).unwrap();
//     let mut encoder = lz4::EncoderBuilder::new().build(out_f).unwrap();
//     std::io::copy(&mut in_f, &mut encoder).unwrap();
//     let (out_f, maybe_err) = encoder.finish();
//     maybe_err.unwrap();
//     out_f
// }

// fn decompress_lz4(decompress_bytes: &[u8]) -> Vec<u8> {
//     let mut decoder = lz4::Decoder::new(Cursor::new(decompress_bytes)).unwrap();
//     let mut uncompressed_buf = Vec::new();
//     std::io::copy(&mut decoder, &mut uncompressed_buf).unwrap();
//     uncompressed_buf
// }
```

bonsairobo commented Nov 2, 2023

I just noticed there is a benchmark in this repo using larger files. Here are the decompression results of that:

Rust 1.72

bench_files/dickens - Decompression 2544.32 Mb/s
bench_files/mozilla - Decompression 2188.20 Mb/s
bench_files/mr - Decompression 1765.24 Mb/s
bench_files/nci - Decompression 2934.73 Mb/s
bench_files/ooffice - Decompression 2597.32 Mb/s
bench_files/osdb - Decompression 1508.96 Mb/s
bench_files/reymont - Decompression 2460.89 Mb/s
bench_files/samba - Decompression 2533.86 Mb/s
bench_files/webster - Decompression 2064.81 Mb/s

Rust 1.73

bench_files/dickens - Decompression 1414.35 Mb/s
bench_files/mozilla - Decompression 663.37 Mb/s
bench_files/mr - Decompression 1496.99 Mb/s
bench_files/nci - Decompression 652.04 Mb/s
bench_files/ooffice - Decompression 1887.27 Mb/s
bench_files/osdb - Decompression 1401.62 Mb/s
bench_files/reymont - Decompression 1905.72 Mb/s
bench_files/samba - Decompression 1061.93 Mb/s
bench_files/sao - Decompression 1789.27 Mb/s
bench_files/webster - Decompression 604.82 Mb/s
bench_files/x-ray - Decompression 1446.80 Mb/s
bench_files/xml - Decompression 2472.21 Mb/s


PSeitz commented Nov 3, 2023

I don't see that much change on the benchmark. Some variation is normal, as the tests are quite flaky.

cargo +1.72 bench FrameDecompress/lz4_flex
...
cargo bench FrameDecompress/lz4_flex

FrameDecompress/lz4_flex_rust_indep/725
                        time:   [894.00 ns 911.89 ns 933.52 ns]
                        thrpt:  [740.65 MiB/s 758.22 MiB/s 773.39 MiB/s]
                 change:
                        time:   [+10.272% +12.321% +14.746%] (p = 0.00 < 0.05)
                        thrpt:  [-12.851% -10.969% -9.3148%]
FrameDecompress/lz4_flex_rust_linked/725
                        time:   [903.73 ns 919.16 ns 938.63 ns]
                        thrpt:  [736.62 MiB/s 752.22 MiB/s 765.07 MiB/s]
                 change:
                        time:   [+4.8091% +7.0499% +9.3629%] (p = 0.00 < 0.05)
                        thrpt:  [-8.5613% -6.5856% -4.5884%]
FrameDecompress/lz4_flex_rust_indep/9991663
                        time:   [5.6644 ms 5.6929 ms 5.7234 ms]
                        thrpt:  [1.6259 GiB/s 1.6346 GiB/s 1.6428 GiB/s]
                 change:
                        time:   [-1.2619% -0.2034% +0.8968%] (p = 0.72 > 0.05)
                        thrpt:  [-0.8889% +0.2038% +1.2780%]
FrameDecompress/lz4_flex_rust_linked/9991663
                        time:   [4.2277 ms 4.2424 ms 4.2600 ms]
                        thrpt:  [2.1844 GiB/s 2.1934 GiB/s 2.2011 GiB/s]
                 change:
                        time:   [-11.640% -10.925% -10.235%] (p = 0.00 < 0.05)
                        thrpt:  [+11.401% +12.264% +13.173%]
FrameDecompress/lz4_flex_rust_indep/7023566
                        time:   [1.7817 ms 1.8419 ms 1.9063 ms]
                        thrpt:  [3.4313 GiB/s 3.5513 GiB/s 3.6714 GiB/s]
                 change:
                        time:   [-6.1937% -3.5502% -0.5663%] (p = 0.02 < 0.05)
                        thrpt:  [+0.5695% +3.6808% +6.6027%]
FrameDecompress/lz4_flex_rust_linked/7023566
                        time:   [1.8447 ms 1.8756 ms 1.9146 ms]
                        thrpt:  [3.4164 GiB/s 3.4875 GiB/s 3.5460 GiB/s]
                 change:
                        time:   [-28.078% -25.791% -23.129%] (p = 0.00 < 0.05)
                        thrpt:  [+30.088% +34.754% +39.039%]
FrameDecompress/lz4_flex_rust_indep/6627202
                        time:   [3.9133 ms 3.9359 ms 3.9592 ms]
                        thrpt:  [1.5589 GiB/s 1.5682 GiB/s 1.5772 GiB/s]
                 change:
                        time:   [-0.2018% +0.6744% +1.5541%] (p = 0.13 > 0.05)
                        thrpt:  [-1.5303% -0.6699% +0.2022%]
FrameDecompress/lz4_flex_rust_linked/6627202
                        time:   [3.7983 ms 3.8800 ms 3.9678 ms]
                        thrpt:  [1.5556 GiB/s 1.5908 GiB/s 1.6249 GiB/s]
                 change:
                        time:   [+13.916% +16.699% +19.543%] (p = 0.00 < 0.05)
                        thrpt:  [-16.348% -14.310% -12.216%]
FrameDecompress/lz4_flex_rust_indep/5322575
                        time:   [2.2280 ms 2.2641 ms 2.3020 ms]
                        thrpt:  [2.1534 GiB/s 2.1894 GiB/s 2.2249 GiB/s]
                 change:
                        time:   [+4.0524% +6.5407% +9.2544%] (p = 0.00 < 0.05)
                        thrpt:  [-8.4705% -6.1392% -3.8945%]
FrameDecompress/lz4_flex_rust_linked/5322575
                        time:   [2.0397 ms 2.0732 ms 2.1078 ms]
                        thrpt:  [2.3517 GiB/s 2.3910 GiB/s 2.4303 GiB/s]
                 change:
                        time:   [-12.918% -10.255% -7.4586%] (p = 0.00 < 0.05)
                        thrpt:  [+8.0597% +11.427% +14.834%]

@MichaelBradetLegris

I've hit the same issue. It only shows up for larger files on 1.73; for smaller files the time difference is negligible. The default benchmark files all look to be under 10 MB. I also noticed you used Rust 1.72 for that test above, @PSeitz.


PSeitz commented Nov 3, 2023

Indeed, that is quite unexpected, since lz4_flex operates on chunks of at most 4 MB.

The issue seems to be the new `ensure_init` method on `BorrowedCursor`, which accounts for most of the time.
The method was introduced in this PR: rust-lang/rust#97015, and judging from the graph in the first post, its cost appears to grow superlinearly with input size.

Now: [flamegraph2 image]

Before: [flamegraph image]
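For intuition, here is a small stable-Rust model (hypothetical, not the actual std internals) of the failure mode: if each read call zeroes the entire uninitialized tail of the destination buffer before filling one small chunk, the total bytes zeroed grow quadratically with buffer size.

```rust
/// Counts bytes zeroed if every fill of `chunk` bytes first re-zeroes
/// the whole unfilled tail (an ensure_init-style worst case).
fn total_zeroed(capacity: usize, chunk: usize) -> u64 {
    let mut filled = 0usize;
    let mut zeroed = 0u64;
    while filled < capacity {
        zeroed += (capacity - filled) as u64; // zero the entire tail again
        filled += chunk.min(capacity - filled); // then fill one chunk
    }
    zeroed
}

fn main() {
    // Doubling the buffer roughly quadruples the zeroing work.
    let one_mib = total_zeroed(1 << 20, 4096);
    let two_mib = total_zeroed(1 << 21, 4096);
    println!("1 MiB: {one_mib} bytes zeroed, 2 MiB: {two_mib} bytes zeroed");
    assert!(two_mib > 3 * one_mib && two_mib < 5 * one_mib);
}
```

This would explain why the default sub-10 MB benchmark files barely move while larger inputs regress badly.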


PSeitz commented Nov 3, 2023

I created a ticket in the Rust repo. This seems to be a major performance issue, so it's surprising it wasn't reported before (or I didn't find an existing report). Thanks for the bug report @bonsairobo

@bonsairobo
Author

Looks like a fix was merged: rust-lang/rust#117576

Although it also says milestone 1.75, which is surprising. Wouldn't they want to ship a 1.73 point release to fix this huge regression, given that it affects multiple crates?

@bonsairobo
Author

Apparently 1.74 got the fix. I can try taking a measurement to confirm.

@bonsairobo
Author

Seems to be fixed in 1.74!


PSeitz commented Nov 17, 2023

Thanks for confirming!

@PSeitz PSeitz closed this as completed Nov 17, 2023