Replies: 14 comments
-
I wouldn't mind, especially since it performs measurably worse than DataDog's - both in terms of compression ratio and speed ;) Before I add it, I would just like to make sure that this is expected. I have:
I think only
Is that the cost of adding CRC (in terms of speed)?
Using master/
In terms of benchmarking, I am not doing much more than
Oh yeah, the DataDog version I am using is v1.4.0
-
The performance difference looks unexpected.
Probably this is related to different API functions used for stream compression:
I'll try avoiding copying to the internal buffer in
-
I think I found the main issue that explains the performance difference and the compressed size difference: the readahead implements a WriteTo method, which is short-circuited inside
I'm unsure about changing the internal buffer size inside
-
Probably
-
This is very different from
The "recommended" size should be understood in the context of a C caller. It doesn't consider the cost of crossing dynamic language interfaces, such as
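The amortization argument can be made concrete with a toy cost model: each boundary crossing carries a fixed overhead, so fewer, larger calls win. The overhead figure below is an illustrative assumption, not a measured cgo cost:

```go
package main

import "fmt"

// costNanos models only the fixed per-call overhead of crossing a
// language boundary; the per-byte work is the same either way, so it
// is omitted. Larger buffers mean fewer crossings.
func costNanos(total, bufSize, perCallOverhead int) int {
	calls := (total + bufSize - 1) / bufSize
	return calls * perCallOverhead
}

func main() {
	const total = 1 << 20 // 1 MiB payload
	const overhead = 60   // assumed ~60 ns per crossing (illustrative)
	fmt.Println(costNanos(total, 4<<10, overhead))   // 4 KiB buffer: 256 calls
	fmt.Println(costNanos(total, 128<<10, overhead)) // 128 KiB buffer: 8 calls
}
```

Under these assumptions the 128 KiB buffer pays 480 ns of crossing overhead per MiB versus 15,360 ns for the 4 KiB buffer, which is why a size recommended for C callers can be too small once a cgo hop sits in the loop.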
-
Yann, thanks for the explanation! I'll add the ability to configure the internal buffer size for streaming compression by providing a new API in
@klauspost, could you run fuzz tests for
-
I seem to be getting this error when I try to fuzz it:
Before I start digging - do you know how I fix this?
-
It looks like Windows support is broken, since nobody has used it since it was added :(
@zhuyie, could you look into Windows support for
In the meantime, try running
-
I think it is specific to go-fuzz (notice how it moves the sources); the ordinary build process works fine. It is basically running this script: https://github.com/klauspost/compress/blob/master/zstd/testdata/fuzz-decompress.cmd
Here is the fuzzing function: https://github.com/klauspost/compress/blob/master/zstd/fuzz_decompress.go - just replace the import and you should be good to go.
You can add additional fuzz targets here:
https://github.com/facebook/zstd/releases/tag/fuzz-corpora goes in here: https://github.com/klauspost/compress/tree/master/zstd/testdata/fuzz-decompress/corpus
-
Testing the compressor is kind of pointless, as you might as well do that from your own package (and go-fuzz doesn't help much, since it is C code and it cannot mark coverage). Comparing decompressors, there will be differences. Some values are up to the implementation, and I am not as strict on some other values (which I should fix, but it's not super important).
-
@klauspost, could you re-run benchmarks against gozstd v1.5.1? It contains the updated
Additionally, could you provide a guideline on how to run these benchmarks, so next time I can run them myself?
-
@valyala With zskp: this package. Level 1 = Fastest, Level 2 = Default.
I am using this for testing (no warranty): https://gist.github.com/klauspost/d98da4c469ac30c47c3b0405d84e3828
I use it in a script like this:
The first line is to load the file into cache. |
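The script itself didn't survive the scrape; a minimal sketch of the pattern described (the payload and benchmark binary names are my assumptions, not the actual gist):

```shell
#!/bin/sh
# Generate a stand-in payload so the sketch is self-contained.
FILE=testdata.bin
dd if=/dev/zero of="$FILE" bs=1024 count=16 2>/dev/null

# First line of the real script: read the file once so it sits in the
# OS page cache; later timings then measure compression, not disk I/O.
cat "$FILE" > /dev/null

# ./zstdbench -file "$FILE" -level 1   # hypothetical benchmark invocation
rm -f "$FILE"
```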
-
Actually testing out with
This seems to be down to this addition, which removes the
So it seems that when the input doesn't supply io.WriterTo and io.Copy is used, the performance of the DataDog package takes a big hit. So in most cases this package performs similarly to cgo. I will re-run benchmarks on your package and post them below.
Edit: It doesn't seem like your package gains the same speedup. Re-adding the interface doesn't provide any speedup:
Links to test files are here: #134
If you have a test set (preferably a few GB), either as a single stream or as a collection of files, I could test against your specific payload type. You can upload files here: https://goobox.io/#/upload
-
Here's another reason to not use cgo by default for builds - the playground won't work, since gcc isn't installed: https://play.golang.org/p/AYmVJDSWqUV
This is unfortunate, since the playground now has proper module support, so it could be used to share example programs with badger.
-
DataDog/zstd has poor support for streaming, has potential bugs in streaming, and is less optimized than valyala/gozstd. So I'd recommend switching to valyala/gozstd in tests and benchmarks. Also, valyala/gozstd vendors the latest upstream zstd release without any modifications.