Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mediocre deflate compression compared to Node.js's zlib #77

Open
Fishrock123 opened this issue May 25, 2020 · 5 comments
Open

Mediocre deflate compression compared to Node.js's zlib #77

Fishrock123 opened this issue May 25, 2020 · 5 comments

Comments

@Fishrock123
Copy link

Fishrock123 commented May 25, 2020

From rust-lang/flate2-rs#238

The pure rust implementation of deflate gives a somewhat unsatisfactory compression result, even using max compression level.

Somehow, the zlib binding from here is even worse, which is very strange since my benchmark is from Node.js's zlib, using zlib's default compression level.

Consider the following program - it shouldn't be too much of a stretch to wonder how this library isn't able to produce the same result as COMPRESSED at default, let alone max.

use miniz_oxide::deflate::compress_to_vec_zlib;
use miniz_oxide::deflate::compress_to_vec;

use miniz_oxide::inflate::decompress_to_vec;

const TEXT: &'static str = concat![
    "Chunk one\n",
    "data data\n",
    "\n",
    "Chunk two\n",
    "data data\n",
    "\n",
    "Chunk three\n",
    "data data\n",
];

// Generated in Node.js using it's zlib bindings.
// Zlib default compression level.
//
// Taken from my defunct project:
// https://github.com/Fishrock123/zlib-transform/blob/master/test/test-basic.js
const COMPRESSED: &'static [u8] = &[
    0x73, 0xce, 0x28, 0xcd, 0xcb, 0x56, 0xc8, 0xcf, 0x4b, 0xe5, 0x4a, 0x49, 0x2c, 0x49, 0x54, 0x00, 0x11, 0x5c, 0x5c, 0xce, 0x60, 0xc1, 0x92, 0xf2, 0x7c, 0x2c, 0x82, 0x19, 0x45, 0xa9, 0xc8, 0x6a, 0x01,
];

fn main() {
    let compressed = compress_to_vec(TEXT.as_bytes(), 10);
    println!("miniz_oxide : {:x?}", compressed);

    let compressed = compress_to_vec_zlib(TEXT.as_bytes(), 10);
    println!("zlib        : {:x?}", compressed);

    println!("node.js zlib: {:x?}", COMPRESSED);

    let text = decompress_to_vec(COMPRESSED).unwrap();
    assert_eq!(TEXT.as_bytes(), &text[..]);
    assert_eq!(TEXT, std::str::from_utf8(&text).unwrap());
}

Output

miniz_oxide : [6d, ca, b1, 9, 0, 30, 8, 5, d1, fe, 4f, e1, 2e, 4e, 22, 44, 10, 2, a, c1, 90, f5, 43, 52, 59, d8, 5c, f1, 38, b6, ed, 93, c2, 15, 43, 52, e8, 5, e0, 8f, 79, a2, 41, 5b, 5a, df, b]
zlib        : [78, da, 6d, ca, b1, 9, 0, 30, 8, 5, d1, fe, 4f, e1, 2e, 4e, 22, 44, 10, 2, a, c1, 90, f5, 43, 52, 59, d8, 5c, f1, 38, b6, ed, 93, c2, 15, 43, 52, e8, 5, e0, 8f, 79, a2, 41, 5b, 5a, df, b, ba, 21, 15, 4c]
node.js zlib: [73, ce, 28, cd, cb, 56, c8, cf, 4b, e5, 4a, 49, 2c, 49, 54, 0, 11, 5c, 5c, ce, 60, c1, 92, f2, 7c, 2c, 82, 19, 45, a9, c8, 6a, 1]

(This was noticed when building https://github.com/Fishrock123/tide-compress.)

@oyvindln
Copy link
Collaborator

compress_to_vec_zlib is not a binding to zlib, zlib here refers to the output having a zlib header/footer as noted in the function description.

The compression ratio shouldn't be significantly different to what the zlib library does, so I will look into it.

@Fishrock123
Copy link
Author

zlib here refers to the output having a zlib header/footer as noted in the function description.

Ah, my mistake there. I see that now - [78, da, and ba, 21, 15, 4c]...

@oyvindln
Copy link
Collaborator

oyvindln commented May 27, 2020

It looks like using flate2 with the original C miniz gives the same result, while the zlib (library) back-end for flate2 and deflate-rs gives a similar result to the zlib binding in node.js. (flate2 with zlib backend should in theory be the same since it uses the same library underneath.)

Not all files give the same unexpected bad compression level, but e.g this ends up with more than twice the compressed size with miniz/miniz_oxide than zlib/deflate-rs. Could maybe the block type selection algorithm that needs tweaking (something that can make a large difference on short data if the wrong type is chosen).

@oyvindln
Copy link
Collaborator

oyvindln commented Oct 7, 2020

To expand on this, currently miniz_oxide (and miniz) decides whether to use static or dynamic huffman codes simply based on whether the number of lz-compressed bytes (i.e number of literals and length/distance pairs) the compression of the block resulted in. Static codes are used if the number of lz bytes are less than 48 which may not always be the optimal choice. The optimal choice will depend on how much the overhead of a dynamic block caused by having to include the huffman tables is.

Zlib builds tables no matter the length and then compares the projected length of a static vs dynamic block which gives a more accurate results (though incurring a little processing overhead instead in the case a block ends up being static.) deflate-rs does something similar. In most cases this won't incur any processing overhead since a dynamic block is used anyhow, so I'm not sure if doing it in the way miniz does is really worth it. Maybe it would even be worth having a simplified code path for compressing very short input if there are use cases where there is a lot of very short data being compressed.

Will have to at what the examples results in in code though to see if it's just this, or if there are more things causing issues.

@Fishrock123
Copy link
Author

Maybe it would even be worth having a simplified code path for compressing very short input if there are use cases where there is a lot of very short data being compressed.

A potential case for that: Server Send Events

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants