Add gc garbage collector support for StringViewArray and BinaryViewArray #5513
Comments
BTW @RinChanNOWWW has implemented I think the same basic pattern can be applied to implement |
Another potential idea came up in Discord today, which was to also implement some way of "interning" strings (aka track which strings have been seen before and remove duplicates): https://discord.com/channels/885562378132000778/885562378132000781/1238127788788285521 The arrow dictionary array builder does this already, so we could borrow its implementation.
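To make the interning idea concrete, here is a minimal std-only sketch. It is a toy model and not the arrow dictionary builder: `InterningBuilder`, its fields, and its methods are hypothetical names invented for illustration. Each distinct string is written to the data buffer once, and repeated values reuse the same (offset, length) view.

```rust
use std::collections::HashMap;

/// Hypothetical toy builder: stores each distinct string once and lets
/// repeated values share the same (offset, len) view into the buffer.
pub struct InterningBuilder {
    buffer: Vec<u8>,
    views: Vec<(usize, usize)>,
    seen: HashMap<String, (usize, usize)>,
}

impl InterningBuilder {
    pub fn new() -> Self {
        Self { buffer: Vec::new(), views: Vec::new(), seen: HashMap::new() }
    }

    /// Append a value, reusing existing bytes if the value was seen before.
    pub fn append(&mut self, value: &str) {
        if let Some(&view) = self.seen.get(value) {
            self.views.push(view);
            return;
        }
        let view = (self.buffer.len(), value.len());
        self.buffer.extend_from_slice(value.as_bytes());
        self.seen.insert(value.to_owned(), view);
        self.views.push(view);
    }

    /// Read back the i-th value through its view.
    pub fn value(&self, i: usize) -> &str {
        let (offset, len) = self.views[i];
        std::str::from_utf8(&self.buffer[offset..offset + len]).unwrap()
    }

    /// Number of values appended so far.
    pub fn len(&self) -> usize {
        self.views.len()
    }

    /// Total bytes held in the data buffer.
    pub fn buffer_len(&self) -> usize {
        self.buffer.len()
    }
}
```

The trade-off in this sketch is the extra hash map, which costs memory and a lookup per appended value in exchange for a smaller data buffer when duplicates are common.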
(sorry for jumping into this from nowhere)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is part of the larger project to implement StringViewArray -- see #5374
In #5481 we added support for StringViewArray and ByteViewArray.
This ticket tracks adding a gc method to StringViewArray and ByteViewArray.
After calling filter or take on a StringViewArray or ByteViewArray, the backing variable length buffer may be much larger than necessary to store the results.
So before GC an array may look like the following, with significant "garbage" space
After GC it should look like
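The effect can be sketched with a toy model in plain Rust. This is not the arrow-rs layout: `View`, `build`, `take_views`, and `garbage_bytes` are hypothetical names, everything lives in a single buffer, and the inlining of short strings into the view is ignored. The point is only that a take-style selection copies views and leaves the data buffer untouched, so bytes that no view references remain allocated.

```rust
/// Hypothetical, simplified view: a byte range inside one shared data buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct View {
    pub offset: usize,
    pub len: usize,
}

/// Build one contiguous buffer plus one view per input string.
pub fn build(values: &[&str]) -> (Vec<u8>, Vec<View>) {
    let mut buffer = Vec::new();
    let mut views = Vec::with_capacity(values.len());
    for v in values {
        views.push(View { offset: buffer.len(), len: v.len() });
        buffer.extend_from_slice(v.as_bytes());
    }
    (buffer, views)
}

/// A take-style selection: only the views are copied, the buffer is shared as-is.
pub fn take_views(views: &[View], indices: &[usize]) -> Vec<View> {
    indices.iter().map(|&i| views[i]).collect()
}

/// Count the bytes in the buffer that no remaining view points at.
pub fn garbage_bytes(buffer: &[u8], views: &[View]) -> usize {
    let mut used = vec![false; buffer.len()];
    for v in views {
        used[v.offset..v.offset + v.len].fill(true);
    }
    used.iter().filter(|b| !**b).count()
}
```

For example, building from three strings and then keeping only one of them via `take_views` leaves `garbage_bytes` equal to the combined length of the two dropped strings, even though the selection holds a single value.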
Describe the solution you'd like
I would like to add a method called StringViewArray::gc (and ByteViewArray::gc) that will compact the backing buffers so that they hold only the data the views still reference.
I expect users of the arrow crates to invoke this function, not any of the arrow kernels themselves.
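A rough sketch of what such a compaction could do, again on a toy model and not on the real array types: `ToyViewArray`, `View`, and every method below are hypothetical, and the sketch assumes each view stores a buffer index, an offset, and a length. `gc` copies only the referenced bytes into one fresh buffer and rewrites the views to point at it.

```rust
/// Hypothetical view: which buffer, where in it, and how many bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct View {
    pub buffer: usize,
    pub offset: usize,
    pub len: usize,
}

/// Hypothetical stand-in for a view array: shared data buffers plus views.
#[derive(Clone, Debug, PartialEq)]
pub struct ToyViewArray {
    pub buffers: Vec<Vec<u8>>,
    pub views: Vec<View>,
}

impl ToyViewArray {
    /// Bytes of the i-th value, resolved through its view.
    pub fn value(&self, i: usize) -> &[u8] {
        let v = self.views[i];
        &self.buffers[v.buffer][v.offset..v.offset + v.len]
    }

    /// Total bytes held across all data buffers.
    pub fn buffer_bytes(&self) -> usize {
        self.buffers.iter().map(Vec::len).sum()
    }

    /// Return a compacted copy whose single buffer holds only referenced bytes.
    /// Note: views that share the same bytes are copied once per view here.
    pub fn gc(&self) -> Self {
        let needed: usize = self.views.iter().map(|v| v.len).sum();
        let mut compact = Vec::with_capacity(needed);
        let mut views = Vec::with_capacity(self.views.len());
        for i in 0..self.views.len() {
            let bytes = self.value(i);
            views.push(View { buffer: 0, offset: compact.len(), len: bytes.len() });
            compact.extend_from_slice(bytes);
        }
        Self { buffers: vec![compact], views }
    }
}
```

Returning a new array instead of mutating in place keeps the original buffers valid for any other arrays that still share them, which is one reason a caller, and not a kernel, might decide when the copy is worth paying for.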
Describe alternatives you've considered
We could also add the gc functionality as its own standalone kernel (e.g. kernels::gc) rather than a method on the array.
Additional context
This GC is what is described in https://pola.rs/posts/polars-string-type/