Deterministic compression #223

Open
BenMcLean opened this issue Nov 8, 2024 · 8 comments

Comments

@BenMcLean

BenMcLean commented Nov 8, 2024

See this article here: https://dramsch.net/today-i-learned/gzip/today-i-learned-about-deterministic-gzip-compression/

I'd like to do the same thing but it appears EasyCompressor doesn't expose the necessary options to make GZip deterministic.

After looking into this a bit more, it appears that none of the EasyCompressor formats expose the necessary options to make them deterministic. We should be able to set the timestamp to 0 like in that article and get the exact same compressed output for the same decompressed input. If there's randomness, we should be able to control the seed.
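
As a language-neutral illustration of the request above, here is a minimal sketch using Python's standard `gzip` module (not EasyCompressor or any .NET API). It assumes Python 3.8 or later, where `gzip.compress` accepts an `mtime` argument. The gzip header stores a 4-byte modification time at offsets 4 to 7, so pinning it to 0 removes the one field that would otherwise vary between runs on the same runtime.

```python
import gzip

data = b"example payload " * 64

# With the timestamp pinned to 0, repeated runs on the same runtime
# produce byte-identical output.
a = gzip.compress(data, mtime=0)
b = gzip.compress(data, mtime=0)

# With different timestamps, only header bytes 4..7 (MTIME) change;
# everything from offset 8 onward is identical.
c = gzip.compress(data, mtime=1)
d = gzip.compress(data, mtime=2)

print(a == b)          # True
print(c == d)          # False
print(c[8:] == d[8:])  # True
```

This only shows the timestamp effect within a single runtime; it says nothing about whether two different gzip implementations agree with each other.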

@BenMcLean BenMcLean changed the title How to make GZip deterministic? Deterministic compression Nov 8, 2024
@mjebrahimi
Owner

It's irrelevant to this library.
The gzip command's -n parameter tells it not to include the timestamp of the original file.
That is all about compressing files.
But this library compresses and decompresses data (such as byte[] or a stream), not files.
And that data does not have any timestamps.

@BenMcLean
Author

> It's irrelevant to this library. The gzip command's -n parameter tells it not to include the timestamp of the original file. That is all about compressing files. But this library compresses and decompresses data (such as byte[] or a stream), not files. And that data does not have any timestamps.

OK well, I found in practice that EasyCompressor output is non-deterministic. Can any of it be made deterministic?

@mjebrahimi
Owner

Actually, it is deterministic out of the box.
Since this library works with data (not files) and there is no timestamp to include, it is always deterministic.
For example, if you compress the same (unchanged) data many times, the compressed outputs (and their hashes) will be the same.

@BenMcLean
Author

> Actually, it is deterministic out of the box. Since this library works with data (not files) and there is no timestamp to include, it is always deterministic. For example, if you compress the same (unchanged) data many times, the compressed outputs (and their hashes) will be the same.

Oh, I think I know what happened.

I ran one test on Blazor WASM and another on Windows and got different results.

Maybe it's something to do with which platform.

@mjebrahimi
Owner

mjebrahimi commented Nov 13, 2024

I reproduced your example with different compressors and found a weird difference in GZip compressed output between server-side .NET and client-side (Blazor WASM).

Brotli is not supported on Blazor WASM, and the other algorithms (Deflate, LZ4, LZMA, Zstd, and Snappy) work fine (produce the same output) on server and browser.

GZip compressed output (and thereby its hash) is different between server and client.
However, the uncompressed data is equal to the original data before compression.
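
One way to narrow down where two gzip outputs diverge is to compare their headers field by field. The sketch below is in Python purely for illustration and is not taken from the linked repo; the helper names `gzip_header_fields` and `describe_difference` are hypothetical. It relies on the fixed 10-byte header layout from RFC 1952 (ID1, ID2, CM, FLG, MTIME, XFL, OS). Whether the server/browser difference reported here lies in the header or in the deflate data is not established in this thread.

```python
import gzip
import struct


def gzip_header_fields(blob: bytes) -> dict:
    """Decode the fixed 10-byte gzip header (RFC 1952, little-endian)."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", blob[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    return {"cm": cm, "flg": flg, "mtime": mtime, "xfl": xfl, "os": os_code}


def describe_difference(a: bytes, b: bytes) -> dict:
    """Summarise how two gzip streams differ."""
    ha, hb = gzip_header_fields(a), gzip_header_fields(b)
    return {
        # Names of fixed-header fields whose values differ.
        "header_fields": sorted(k for k in ha if ha[k] != hb[k]),
        # Bytes after the fixed header (optional fields, deflate data, trailer).
        "same_body": a[10:] == b[10:],
        # Whether both streams decompress to the same content.
        "same_content": gzip.decompress(a) == gzip.decompress(b),
    }


data = b"hello gzip " * 50
report = describe_difference(
    gzip.compress(data, mtime=1),
    gzip.compress(data, mtime=2),
)
print(report)
# {'header_fields': ['mtime'], 'same_body': True, 'same_content': True}
```

If `header_fields` comes back empty and `same_body` is false for two real outputs, the difference is in the bytes after the fixed header, which would point at the deflate encoder rather than header metadata.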

I need to investigate further to find out whether it's an implementation mistake in this library or a bug in the .NET runtime.

The Repo:
https://github.com/mjebrahimi/BlazorWebAssembly-GZip-Difference

Screenshot

@BenMcLean
Author

Yeah sorry I didn't realize I'd actually done the two tests on different runtimes when I made the initial post. Not a big deal: it just explains why my unit test failed. Thanks. :)
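
For a unit test that has to pass on more than one runtime, one option is to assert on the decompressed content instead of the compressed bytes, since different encoders can legitimately emit different valid streams for the same input. A minimal sketch in Python, for illustration only: `content_digest` is a hypothetical helper, and the two compression levels merely stand in for two runtimes.

```python
import gzip
import hashlib


def content_digest(compressed: bytes) -> str:
    """Hash the decompressed content, which does not depend on the encoder."""
    return hashlib.sha256(gzip.decompress(compressed)).hexdigest()


original = b"unit test payload " * 100

# Two valid encodings of the same data. Different compression levels are used
# here as a stand-in for different runtimes; the bytes may or may not match.
fast = gzip.compress(original, compresslevel=1, mtime=0)
best = gzip.compress(original, compresslevel=9, mtime=0)

expected = hashlib.sha256(original).hexdigest()
assert content_digest(fast) == expected
assert content_digest(best) == expected
```

The trade-off is that such a test no longer detects changes in the compressed representation itself, so it only fits cases where the round trip is what matters.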

@mjebrahimi
Owner

You're welcome.
Anyway, it's an interesting problem you found, and I will post an update in a few days after investigating further.

@BenMcLean
Author

It also seems to affect System.IO.Compression on .NET Standard 2.0, so apparently it isn't specific to EasyCompressor.
