Replace check with truncate
In the hot loop, the check that the offset is inside bounds can be changed
into an unconditional truncate.
We don't return an error on corrupted data in that case, but that's fine since
this is a decompressor and not a data checker.
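
To illustrate the trade-off described above, here is a minimal sketch of the two approaches in isolation. The function names, the `available` parameter (standing in for `output_len + ext_dict.len()`), and the `OffsetError` type are hypothetical and are not part of the crate.

```rust
/// Hypothetical error type standing in for the crate's `DecompressError`.
#[derive(Debug, PartialEq)]
pub enum OffsetError {
    OutOfBounds,
}

/// Before: validate the match offset and bail out on corrupted input.
/// This costs a conditional branch with an early return in the hot loop.
pub fn checked_offset(offset: usize, available: usize) -> Result<usize, OffsetError> {
    if offset > available {
        return Err(OffsetError::OutOfBounds);
    }
    Ok(offset)
}

/// After: clamp the offset unconditionally. Corrupted input no longer
/// yields an error, but the offset can never exceed the number of bytes
/// that are available to copy from.
pub fn truncated_offset(offset: usize, available: usize) -> usize {
    offset.min(available)
}
```

With `available = 32`, an offset of 8 passes through both variants unchanged, while an offset of 100 is rejected by `checked_offset` and clamped to 32 by `truncated_offset`. In the clamped case the decompressed output is presumably garbage for corrupted input, which is the behavior the commit message accepts.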

Improves unsafe decompression performance by ~4% and safe decompression by
~2%.

```bash
BlockDecompress/lz4_flex_rust/725
                        time:   [228.55 ns 229.81 ns 231.35 ns]
                        thrpt:  [2.9186 GiB/s 2.9381 GiB/s 2.9543 GiB/s]
                 change:
                        time:   [-2.6443% -2.1496% -1.5867%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6123% +2.1968% +2.7161%]
                        Performance has improved.
BlockDecompress/lz4_flex_rust/34308
                        time:   [17.469 µs 17.484 µs 17.498 µs]
                        thrpt:  [1.8260 GiB/s 1.8275 GiB/s 1.8290 GiB/s]
                 change:
                        time:   [-5.2028% -5.1189% -5.0341%] (p = 0.00 < 0.05)
                        thrpt:  [+5.3010% +5.3951% +5.4883%]
                        Performance has improved.
BlockDecompress/lz4_flex_rust/64723
                        time:   [31.758 µs 31.873 µs 32.033 µs]
                        thrpt:  [1.8817 GiB/s 1.8912 GiB/s 1.8981 GiB/s]
                 change:
                        time:   [-4.9070% -4.7238% -4.4927%] (p = 0.00 < 0.05)
                        thrpt:  [+4.7040% +4.9580% +5.1602%]
                        Performance has improved.
BlockDecompress/lz4_flex_rust/66675
                        time:   [12.819 µs 12.875 µs 12.967 µs]
                        thrpt:  [4.7888 GiB/s 4.8230 GiB/s 4.8440 GiB/s]
                 change:
                        time:   [-0.4198% -0.1229% +0.3894%] (p = 0.67 > 0.05)
                        thrpt:  [-0.3879% +0.1231% +0.4216%]
                        No change in performance detected.
BlockDecompress/lz4_flex_rust/9991663
                        time:   [4.1495 ms 4.1566 ms 4.1648 ms]
                        thrpt:  [2.2343 GiB/s 2.2387 GiB/s 2.2425 GiB/s]
                 change:
                        time:   [-4.4175% -4.2608% -4.0802%] (p = 0.00 < 0.05)
                        thrpt:  [+4.2538% +4.4505% +4.6216%]
                        Performance has improved.
```
PSeitz committed Jun 29, 2023
1 parent 244275a commit 23b05b0
Showing 2 changed files with 2 additions and 10 deletions.
7 changes: 1 addition & 6 deletions src/block/decompress.rs
@@ -283,12 +283,7 @@ pub(crate) fn decompress_internal<const USE_DICT: bool, S: Sink>(
 let offset = read_u16_ptr(&mut input_ptr) as usize;

 let output_len = unsafe { output_ptr.offset_from(output_base) as usize };
-#[cfg(not(feature = "unchecked-decode"))]
-{
-    if offset > output_len + ext_dict.len() {
-        return Err(DecompressError::OffsetOutOfBounds);
-    }
-}
+let offset = offset.min(output_len + ext_dict.len());

 // Check if part of the match is in the external dict
 if USE_DICT && offset > output_len {
5 changes: 1 addition & 4 deletions src/block/decompress_safe.rs
@@ -156,10 +156,7 @@ pub(crate) fn decompress_internal<const USE_DICT: bool, S: Sink>(
 // In this branch we know that match_length is at most 18 (14 + MINMATCH).
 // But the blocks can overlap, so make sure they are at least 18 bytes apart
 // to enable an optimized copy of 18 bytes.
-let (start, did_overflow) = output.pos().overflowing_sub(offset);
-if did_overflow {
-    return Err(DecompressError::OffsetOutOfBounds);
-}
+let start = output.pos().saturating_sub(offset);
 if offset >= match_length {
 output.extend_from_within(start, 18, match_length);
 } else {
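
The difference between the two standard-library calls in the hunk above can be sketched in isolation as follows. The helper names are hypothetical, and `Option` stands in for the crate's error return; only `usize::overflowing_sub` and `usize::saturating_sub` come from the diff itself.

```rust
/// Before: `overflowing_sub` returns the wrapped result plus a flag that
/// says whether `offset` was larger than `pos`, so the caller can reject
/// the input.
pub fn checked_start(pos: usize, offset: usize) -> Option<usize> {
    let (start, did_overflow) = pos.overflowing_sub(offset);
    if did_overflow {
        None
    } else {
        Some(start)
    }
}

/// After: `saturating_sub` clamps the result at 0 instead of wrapping,
/// so there is no flag to test and no early return.
pub fn saturating_start(pos: usize, offset: usize) -> usize {
    pos.saturating_sub(offset)
}
```

For `pos = 10` and `offset = 4` both helpers yield a start of 6. For `pos = 4` and `offset = 10`, `checked_start` returns `None` while `saturating_start` returns 0, so the copy would begin at the start of the output rather than failing.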
