Chunker Memory leak #647
Comments
The pool tries to allocate buffers sized to the power of 2 closest to the chunk size. A small file causes a new allocation, and the pool buffer is returned to the pool.

small := make([]byte, n)
copy(small, full)
pool.Put(full)

Returning the buffer to the pool does not necessarily free it and may keep it in the pool. With a large number of files, this extra allocation happens once per file, when reading the last partial chunk. So I do not think it has to do with small files, but with the number of files. It may be better to keep the pool-allocated buffer and avoid the extra allocation and copy:

small := full[:n]

or use the pool to allocate the smaller buffer:

small := pool.Get(n)
copy(small, full)
pool.Put(full)

Having fewer different buffer sizes might help with GC. Will test to see if this makes any difference. |
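To make the trade-off between the first two options concrete, here is a minimal sketch. It assumes a 256 KiB chunk size and uses a hypothetical `bufPool` built on the standard library's `sync.Pool` as a stand-in for the real buffer pool; the names `copyAndReturn` and `resliceInPlace` are made up for illustration and are not part of the chunker.

```go
package main

import "sync"

// chunkSize is an assumed chunk size for this sketch (256 KiB).
const chunkSize = 256 * 1024

// bufPool is a hypothetical stand-in for the real buffer pool.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, chunkSize) },
}

// copyAndReturn mirrors the first snippet above: it allocates a new
// n-byte slice, copies the data into it, and hands the full-size
// buffer back to the pool.
func copyAndReturn(full []byte, n int) []byte {
	small := make([]byte, n)
	copy(small, full)
	bufPool.Put(full)
	return small
}

// resliceInPlace mirrors the second snippet: no new allocation and no
// copy, but the result still references the full-size backing array.
func resliceInPlace(full []byte, n int) []byte {
	return full[:n]
}
```

The reslice avoids the extra allocation, but `cap(small)` stays at the full buffer size, so the whole backing array remains reachable for as long as the small slice is held.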
Accidental close - reopened |
I have reimplemented a Splitter and modified some kubo-related code. In scenarios with a large number of files, memory usage has stabilized and no abnormal growth has been observed.

chunk

// NewAutoSplitter picks a mode from the file size: files at least one
// chunk long use the normal path, smaller files get an exact-size buffer.
func NewAutoSplitter(r io.Reader, size int64, fileSize int64) chunk.Splitter {
	ss := &AutoSplitter{
		r:        r,
		size:     size,
		usePool:  true,
		fileSize: fileSize,
	}
	if fileSize >= size {
		ss.mode = normal
	}
	// Compare against the chunk size (the original compared fileSize with itself).
	if fileSize < size {
		ss.mode = small
		ss.buffer = make([]byte, fileSize)
	}
	return ss
}

kubo, core/coreapi/unixfs.go

// Constructs a node from reader's data, and adds it. Doesn't pin.
func (adder *Adder) add(reader io.Reader, fileSize int64) (ipld.Node, error) {
	chnk := chunker.NewAutoSplitterV2(reader, chunkSize, fileSize)
	params := ihelper.DagBuilderParams{
		Dagserv:    adder.bufferedDS,
		RawLeaves:  adder.RawLeaves,
		Maxlinks:   ihelper.DefaultLinksPerBlock,
		NoCopy:     adder.NoCopy,
		CidBuilder: adder.CidBuilder,
	}
	db, err := params.New(chnk)
	if err != nil {
		return nil, err
	}
	var nd ipld.Node
	if adder.Trickle {
		nd, err = trickle.Layout(db)
	} else {
		nd, err = balanced.Layout(db)
	}
	if err != nil {
		return nil, err
	}
	return nd, adder.bufferedDS.Commit()
}
|
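Only the constructor is shown above, so the read path is left to the reader. The following is a rough sketch of how such a size-aware splitter could read its chunks. It is not the code from the comment: the type `autoSplitter`, its fields, and the helper `splitString` are hypothetical, and it skips the pool entirely for simplicity.

```go
package main

import (
	"io"
	"strings"
)

// autoSplitter is a hypothetical, pool-free splitter for this sketch.
type autoSplitter struct {
	r        io.Reader
	size     int64 // target chunk size
	fileSize int64 // total size of the input, known up front
	done     bool
}

func newAutoSplitter(r io.Reader, size, fileSize int64) *autoSplitter {
	return &autoSplitter{r: r, size: size, fileSize: fileSize}
}

// NextBytes returns the next chunk, or io.EOF when the input is exhausted.
func (s *autoSplitter) NextBytes() ([]byte, error) {
	if s.done {
		return nil, io.EOF
	}
	n := s.size
	if s.fileSize < s.size {
		n = s.fileSize // small file: allocate exactly what is needed
		s.done = true  // the whole file fits in a single chunk
	}
	if n <= 0 {
		s.done = true
		return nil, io.EOF
	}
	buf := make([]byte, n)
	read, err := io.ReadFull(s.r, buf)
	switch err {
	case nil:
		return buf, nil
	case io.ErrUnexpectedEOF:
		// Last, partially filled chunk: reslice instead of copying.
		s.done = true
		return buf[:read], nil
	default:
		// io.EOF (nothing left to read) or a real read error.
		s.done = true
		return nil, err
	}
}

// splitString is a demo helper that chunks an in-memory string.
func splitString(data string, size int64) ([]string, error) {
	s := newAutoSplitter(strings.NewReader(data), size, int64(len(data)))
	var out []string
	for {
		b, err := s.NextBytes()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
}
```

For example, `splitString("abcdefghij", 4)` yields the chunks `abcd`, `efgh`, and `ij`, while an input shorter than the chunk size comes back as a single chunk backed by an exact-size buffer.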
@Xib1uvXi I would be very interested to see what this looks like for you after running for a while: #649. Otherwise, avoiding the pool altogether may end up being more efficient. I think this may be the case for many files, whether large or small, since the extra allocation is caused by a partially filled chunk at the end of the file. That PR includes a benchmark to compare a chunker that uses the go-buffer-pool pool against one that does not. The benchmark simulates your use case of 10000 files with sizes varying between 20K and 60K bytes. |
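As a back-of-the-envelope companion to that benchmark (this is not the benchmark from the PR), the sketch below estimates how many bytes the extra allocation and copy account for across a simulated workload. The helper names `simulateSizes` and `extraCopyBytes` are hypothetical, and the chunk size is whatever the caller passes in; 262144 bytes (256 KiB) is assumed in the note that follows.

```go
package main

import "math/rand"

// simulateSizes returns count pseudo-random file sizes in [min, max].
func simulateSizes(count, min, max int, seed int64) []int {
	r := rand.New(rand.NewSource(seed))
	sizes := make([]int, count)
	for i := range sizes {
		sizes[i] = r.Intn(max-min+1) + min
	}
	return sizes
}

// extraCopyBytes sums the length of the final partial chunk of every
// file. That is the number of bytes the make-and-copy path allocates
// and copies in addition to the pooled buffers.
func extraCopyBytes(fileSizes []int, chunkSize int) int {
	total := 0
	for _, size := range fileSizes {
		total += size % chunkSize // zero when the file ends on a chunk boundary
	}
	return total
}
```

With files between 20K and 60K bytes and a 256 KiB chunk size, every file is a single partial chunk, so the extra bytes equal the total size of all files.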
When I add a folder with many small files to kubo, memory usage always spikes abnormally.
I located the anomaly by analyzing the heap.
Q1: Will using a pool result in abnormal memory consumption when the file size is much smaller than the chunk size?