Elevated CPI rate in flate/stateless.go #717
-
Hi! You've built an awesome library, which helped to shave 20% off CPU consumption in Parca Agent (parca-dev/parca-agent#1065). Out of curiosity I profiled the agent with Intel VTune and it showed a CPI rate of 124.4 on
I am not sure how to interpret the "store misses"; it's my first time using VTune :).

    type tokens struct {
        extraHist [32]uint16  // codes 256->maxnumlit
        offHist   [32]uint16  // offset codes
        litHist   [256]uint16 // codes 0->255
        nFilled   int
        n         uint16 // Must be able to contain maxStoreBlockSize
        tokens    [maxStoreBlockSize + 1]token
    }
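To get a feel for why zeroing this struct could show up in a profile, here is a rough sketch that measures how large one value is. The `token` type and the value of `maxStoreBlockSize` below are assumptions made for illustration, not copied from the library, so treat the printed number as an estimate.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Assumed values for illustration only; check the flate source
// before relying on them.
const maxStoreBlockSize = 65535

type token uint32

// tokens mirrors the struct quoted above.
type tokens struct {
	extraHist [32]uint16  // codes 256->maxnumlit
	offHist   [32]uint16  // offset codes
	litHist   [256]uint16 // codes 0->255
	nFilled   int
	n         uint16 // Must be able to contain maxStoreBlockSize
	tokens    [maxStoreBlockSize + 1]token
}

// tokensSize reports how many bytes one tokens value occupies.
func tokensSize() uintptr {
	return unsafe.Sizeof(tokens{})
}

// fresh declares a new value with `var`; Go guarantees that every byte
// of it is zero before it can be used, so the whole struct is cleared.
func fresh() *tokens {
	var dst tokens
	return &dst
}

func main() {
	fmt.Printf("sizeof(tokens) = %d bytes\n", tokensSize())
	fmt.Println("last token after declaration:", fresh().tokens[maxStoreBlockSize])
}
```

Under these assumptions the token array alone is 4 bytes times 65536 entries, so each `var dst tokens` has to clear roughly a quarter of a megabyte.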
-
Yes, `var dst tokens` is zeroing.

As the docs state, Stateless compression is only for very specific use cases, and it doesn't seem your use case is one of them. Stateless does not mean no allocations - it means no buffers are kept between writes, so it will allocate more in total, since it has to allocate for every Write.

You are sacrificing both speed and compression by using a stateless compressor. You will likely see a 2-3x speed-up by using a `sync.Pool` with a gzip encoder at level 1. I'd recommend the default compression; you will not lose as much compression as you currently do. It is still likely to be 2x faster than the old.
-
Thank you for the explanation! Stateless compression helped to avoid excessive heap allocs which reduced GC pressure.
Initially I thought that it's possible to reuse a

    le.PutUint32(z.buf[:4], z.digest)
    le.PutUint32(z.buf[4:8], z.size)
    _, z.err = z.w.Write(z.buf[:8])

Do you think it's feasible to decouple footer writing from