Bad frame decompression performance regression for large data sets in Rust 1.73 #147
Comments
I just noticed there is a benchmark in this repo using larger files. Here are the decompression results of that: Rust 1.72
Rust 1.73
|
I don't see that much change on the benchmark. Some variation is normal, as the tests are quite flaky.
cargo +1.72 bench FrameDecompress/lz4_flex
...
cargo bench FrameDecompress/lz4_flex
FrameDecompress/lz4_flex_rust_indep/725
time: [894.00 ns 911.89 ns 933.52 ns]
thrpt: [740.65 MiB/s 758.22 MiB/s 773.39 MiB/s]
change:
time: [+10.272% +12.321% +14.746%] (p = 0.00 < 0.05)
thrpt: [-12.851% -10.969% -9.3148%]
FrameDecompress/lz4_flex_rust_linked/725
time: [903.73 ns 919.16 ns 938.63 ns]
thrpt: [736.62 MiB/s 752.22 MiB/s 765.07 MiB/s]
change:
time: [+4.8091% +7.0499% +9.3629%] (p = 0.00 < 0.05)
thrpt: [-8.5613% -6.5856% -4.5884%]
FrameDecompress/lz4_flex_rust_indep/9991663
time: [5.6644 ms 5.6929 ms 5.7234 ms]
thrpt: [1.6259 GiB/s 1.6346 GiB/s 1.6428 GiB/s]
change:
time: [-1.2619% -0.2034% +0.8968%] (p = 0.72 > 0.05)
thrpt: [-0.8889% +0.2038% +1.2780%]
FrameDecompress/lz4_flex_rust_linked/9991663
time: [4.2277 ms 4.2424 ms 4.2600 ms]
thrpt: [2.1844 GiB/s 2.1934 GiB/s 2.2011 GiB/s]
change:
time: [-11.640% -10.925% -10.235%] (p = 0.00 < 0.05)
thrpt: [+11.401% +12.264% +13.173%]
FrameDecompress/lz4_flex_rust_indep/7023566
time: [1.7817 ms 1.8419 ms 1.9063 ms]
thrpt: [3.4313 GiB/s 3.5513 GiB/s 3.6714 GiB/s]
change:
time: [-6.1937% -3.5502% -0.5663%] (p = 0.02 < 0.05)
thrpt: [+0.5695% +3.6808% +6.6027%]
FrameDecompress/lz4_flex_rust_linked/7023566
time: [1.8447 ms 1.8756 ms 1.9146 ms]
thrpt: [3.4164 GiB/s 3.4875 GiB/s 3.5460 GiB/s]
change:
time: [-28.078% -25.791% -23.129%] (p = 0.00 < 0.05)
thrpt: [+30.088% +34.754% +39.039%]
FrameDecompress/lz4_flex_rust_indep/6627202
time: [3.9133 ms 3.9359 ms 3.9592 ms]
thrpt: [1.5589 GiB/s 1.5682 GiB/s 1.5772 GiB/s]
change:
time: [-0.2018% +0.6744% +1.5541%] (p = 0.13 > 0.05)
thrpt: [-1.5303% -0.6699% +0.2022%]
FrameDecompress/lz4_flex_rust_linked/6627202
time: [3.7983 ms 3.8800 ms 3.9678 ms]
thrpt: [1.5556 GiB/s 1.5908 GiB/s 1.6249 GiB/s]
change:
time: [+13.916% +16.699% +19.543%] (p = 0.00 < 0.05)
thrpt: [-16.348% -14.310% -12.216%]
FrameDecompress/lz4_flex_rust_indep/5322575
time: [2.2280 ms 2.2641 ms 2.3020 ms]
thrpt: [2.1534 GiB/s 2.1894 GiB/s 2.2249 GiB/s]
change:
time: [+4.0524% +6.5407% +9.2544%] (p = 0.00 < 0.05)
thrpt: [-8.4705% -6.1392% -3.8945%]
FrameDecompress/lz4_flex_rust_linked/5322575
time: [2.0397 ms 2.0732 ms 2.1078 ms]
thrpt: [2.3517 GiB/s 2.3910 GiB/s 2.4303 GiB/s]
change:
time: [-12.918% -10.255% -7.4586%] (p = 0.00 < 0.05)
thrpt: [+8.0597% +11.427% +14.834%] |
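For reading the numbers above: the throughput lines follow directly from the time lines, and the two "change" lines are reciprocals of each other. A minimal sketch of that arithmetic, assuming the number after the last slash in each benchmark name is the input size in bytes (the helper names are made up for illustration):

```rust
/// Throughput in MiB/s for `bytes` processed in `nanos` nanoseconds.
fn throughput_mib_s(bytes: u64, nanos: f64) -> f64 {
    let seconds = nanos * 1e-9;
    bytes as f64 / seconds / (1024.0 * 1024.0)
}

/// Relative throughput change implied by a relative time change.
/// Both values are fractions, e.g. 0.12321 for +12.321%.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // FrameDecompress/lz4_flex_rust_indep/725: median time 911.89 ns,
    // assuming 725 is the input size in bytes.
    let thrpt = throughput_mib_s(725, 911.89);
    println!("{thrpt:.2} MiB/s");

    // Median time change of +12.321% for the same benchmark.
    let change = throughput_change(0.12321);
    println!("{:+.3}%", change * 100.0);
}
```

This is why a time change of -25.791% shows up as a throughput change of +34.754% rather than +25.791%: the throughput change is 1 / (1 + time change) - 1.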
I've had the same issue. The problems show up for larger files on 1.73; for smaller files, the time difference isn't much. It looks like the default benchmark files are all <10MB. I noticed you also used Rust 1.72 for that test above @PSeitz |
Indeed, that is quite unexpected, as lz4_flex operates on chunks of at most 4 MB. The issue seems to be the new |
I created a ticket in the Rust repo. This seems to be a major performance issue, so it's surprising this wasn't reported before (or I didn't find it). Thanks for the bug report @bonsairobo |
Looks like a fix was merged: rust-lang/rust#117576 Although it also says "milestone 1.75" which is surprising. Wouldn't they want to make a minor release of 1.73 to fix this huge regression that affects multiple crates? |
Apparently 1.74 got the fix. I can try taking a measurement to confirm. |
Seems to be fixed in 1.74! |
Thanks for confirming! |
Both measurements were taken for lz4_flex version 0.11.1. Only bumping the Rust compiler version caused the regression.

What I find very interesting is that essentially the same regression has occurred for the lz4 crate, and it seems both regressions have only occurred for decompression of the frame format, not the block format.

I think the next step is to investigate which specific scope of code in the frame decompression has gotten slower.
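One way to narrow this down is to time candidate scopes separately under both compiler versions. A minimal sketch using only the standard library: `time_scope` and `copy_in_chunks` are hypothetical stand-ins, not lz4_flex code, and the chunked copy is only an assumed example of the kind of work a frame decoder does around the block decoder.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `work` `iters` times and returns the fastest observed duration.
fn time_scope<F: FnMut()>(iters: u32, mut work: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..iters {
        let start = Instant::now();
        work();
        best = best.min(start.elapsed());
    }
    best
}

/// Hypothetical scope under test: copy `src` into `dst` in `chunk`-sized pieces.
fn copy_in_chunks(src: &[u8], dst: &mut Vec<u8>, chunk: usize) {
    dst.clear();
    for part in src.chunks(chunk) {
        dst.extend_from_slice(part);
    }
}

fn main() {
    // 16 MiB input, larger than the default benchmark files mentioned above.
    let src = vec![0xABu8; 16 * 1024 * 1024];
    let mut dst = Vec::with_capacity(src.len());

    for chunk in [64 * 1024, 4 * 1024 * 1024] {
        let best = time_scope(5, || {
            copy_in_chunks(black_box(&src), &mut dst, chunk);
            black_box(&dst);
        });
        println!("chunk {chunk:>8} B: {best:?}");
    }
}
```

Building this once with `cargo +1.72 run --release` and once with the newer toolchain, then swapping the closure for the real scope being examined, should show whether a given piece of code is responsible.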
Benchmark