lz4hc compression support #21

roblabla · 2021-08-09T16:08:09Z

Hello, lz4_flex currently only supports the "standard" LZ4 compression, which means it can only compress at levels 0, 1 and 2 (in liblz4, those levels are all the same). To support higher compression levels, LZ4 also has a High Compression (HC) mode, but it is unfortunately not implemented in lz4_flex.

I was wondering if lz4_flex has any plans to implement HC mode in its compressor?


PSeitz · 2021-08-12T09:18:51Z

Hi,

Generally, yes, I would really like to have it in lz4_flex. I did some experiments some time ago, but they are outdated now.

I think conceptually it should be quite simple: instead of having one entry in the hashmap for finding earlier duplicates in the stream, we would have multiple buckets per hash, and all of them would be checked to find the best match.
There may be other strategies, but I think this is the main one for better compression in LZ4.
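A minimal sketch of the "check them all and keep the best one" step, assuming the table already hands back several earlier positions that share the current position's hash. `match_len` and `best_candidate` are hypothetical helpers for illustration, not lz4_flex functions.

```rust
/// Number of equal bytes when comparing `input[a..]` with `input[b..]`.
fn match_len(input: &[u8], a: usize, b: usize) -> usize {
    input[a..].iter().zip(&input[b..]).take_while(|(x, y)| x == y).count()
}

/// Check every earlier position that shares the current hash and keep the one
/// with the longest match. Returns `(position, length)`, or None without candidates.
fn best_candidate(input: &[u8], cur: usize, candidates: &[usize]) -> Option<(usize, usize)> {
    candidates
        .iter()
        .filter(|&&pos| pos < cur) // only look back in the stream
        .map(|&pos| (pos, match_len(input, pos, cur)))
        .max_by_key(|&(_, len)| len)
}

fn main() {
    let input = b"abcdXabcdYabcdX";
    // Assumption: positions 0 and 5 were both stored under the hash of "abcd".
    println!("{:?}", best_candidate(input, 10, &[0, 5])); // Some((0, 5))
}
```

On ties, `max_by_key` keeps the last candidate in the slice; a real implementation might prefer a different tie-break.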

Multiple Positions per Hash in the Hashmap

For example, imagine this byte sequence:

[1, 2, 3, 4, 98, 1, 2, 3, 4, 97, 1, 2, 3, 4, 98]

While walking over the data, we insert the hash of every 4-byte window, together with its position, into the hashmap.

1, 2, 3, 4 -> (hash 999, position 0)
2, 3, 4, 98 -> (hash 555, position 1)

...
When we arrive at the second occurrence, the current implementation overwrites the old entry:
1, 2, 3, 4 -> (hash 999, position 5)

If we kept multiple entries per hash, the last sequence 1, 2, 3, 4, 98 (position 10) could compare both entries and see that the first one (position 0) gives the longer match: 5 bytes instead of 4.
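The example above can be replayed with a small simulation. This sketch keys the table on the raw 4 bytes instead of a hash (so there are no collisions), and `best_match_at` with its `slots` parameter is a hypothetical helper: `slots = 1` mimics the overwrite behaviour, `slots = 2` keeps both earlier positions of 1, 2, 3, 4.

```rust
use std::collections::HashMap;

/// Number of equal bytes when comparing `input[a..]` with `input[b..]`.
fn match_len(input: &[u8], a: usize, b: usize) -> usize {
    input[a..].iter().zip(&input[b..]).take_while(|(x, y)| x == y).count()
}

/// Walk `input` up to `target`, keeping at most `slots` earlier positions per
/// 4-byte window (oldest evicted), and return the best (position, length) at `target`.
fn best_match_at(input: &[u8], target: usize, slots: usize) -> Option<(usize, usize)> {
    // Keyed on the raw 4 bytes instead of a hash, so the demo has no collisions.
    let mut table: HashMap<[u8; 4], Vec<usize>> = HashMap::new();
    for pos in 0..=target {
        let key = [input[pos], input[pos + 1], input[pos + 2], input[pos + 3]];
        let entry = table.entry(key).or_default();
        if pos == target {
            // Check every stored position and keep the longest match.
            return entry
                .iter()
                .map(|&cand| (cand, match_len(input, cand, pos)))
                .max_by_key(|&(_, len)| len);
        }
        entry.push(pos);
        if entry.len() > slots {
            entry.remove(0); // slots == 1 reproduces "overwrite the old entry"
        }
    }
    None
}

fn main() {
    let data = [1u8, 2, 3, 4, 98, 1, 2, 3, 4, 97, 1, 2, 3, 4, 98];
    println!("{:?}", best_match_at(&data, 10, 1)); // Some((5, 4))
    println!("{:?}", best_match_at(&data, 10, 2)); // Some((0, 5))
}
```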

Currently the hashmap is simply a Vec, and the hash is right-shifted to fit within the Vec bounds. It should be possible to double (or quadruple) the Vec and then store multiple positions for one hash.
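One way the enlarged Vec could look, as a sketch only: `BucketTable`, `HASH_LOG = 12`, `BUCKET = 4`, the `u32::MAX` empty marker and the multiplicative hash constant 2654435761 are all assumptions for illustration, not lz4_flex's actual table. Each hash is right-shifted down to `HASH_LOG` bits, and its `BUCKET` slots sit next to each other in one flat `Vec<u32>`.

```rust
// Hypothetical sketch, not lz4_flex's real table: sizes, hash and names are assumptions.
const HASH_LOG: u32 = 12; // 2^12 = 4096 hash values (assumed)
const BUCKET: usize = 4; // positions kept per hash, i.e. a "quadrupled" Vec
const EMPTY: u32 = u32::MAX; // marks an unused slot

/// Read the 4-byte window starting at `pos` as a little-endian u32.
fn read_u32(input: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes([input[pos], input[pos + 1], input[pos + 2], input[pos + 3]])
}

/// Multiplicative hash, right-shifted so the result fits in HASH_LOG bits.
fn hash(window: u32) -> usize {
    (window.wrapping_mul(2_654_435_761) >> (32 - HASH_LOG)) as usize
}

struct BucketTable {
    slots: Vec<u32>, // BUCKET consecutive slots per hash value
}

impl BucketTable {
    fn new() -> Self {
        BucketTable { slots: vec![EMPTY; (1usize << HASH_LOG) * BUCKET] }
    }

    /// Index range of the bucket that belongs to `window`'s hash.
    fn range(window: u32) -> std::ops::Range<usize> {
        let start = hash(window) * BUCKET;
        start..start + BUCKET
    }

    /// Store `pos`; when the bucket is full the oldest position falls off the end.
    fn insert(&mut self, window: u32, pos: u32) {
        let bucket = &mut self.slots[Self::range(window)];
        bucket.rotate_right(1);
        bucket[0] = pos;
    }

    /// Earlier positions stored under `window`'s hash, newest first.
    fn candidates(&self, window: u32) -> Vec<u32> {
        self.slots[Self::range(window)]
            .iter()
            .copied()
            .filter(|&p| p != EMPTY)
            .collect()
    }
}

fn main() {
    let data = [1u8, 2, 3, 4, 98, 1, 2, 3, 4, 97, 1, 2, 3, 4, 98];
    let mut table = BucketTable::new();
    for pos in 0..10 {
        table.insert(read_u32(&data, pos), pos as u32);
    }
    // Both earlier occurrences of 1, 2, 3, 4 are still available.
    println!("{:?}", table.candidates(read_u32(&data, 10))); // [5, 0]
}
```

Evicting the oldest entry keeps the most recent positions, which are the ones most likely to still be within LZ4's maximum match offset of 65535 bytes. Because different windows can share a hash, every candidate still has to be verified by comparing bytes (e.g. by computing the match length) before it is used.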

Awendel mentioned this issue Sep 26, 2022

expose speed / high compression option antoniomuso/lz4-napi#279 (Open)

PSeitz mentioned this issue Jun 6, 2024

How to specify the compression level? #165 (Open)
