I am using `rowStart` and `rowEnd` to filter rows, which works as advertised, but I am seeing some performance problems. It looks like the library assembles all of the data from the relevant row groups and then slices off the undesired portion after the fact. If I just want a single row but my row group size is relatively high (e.g. 1 GB), the heap size still gets very large. There doesn't seem to be much benefit to using `rowStart` or `rowEnd`.
Looking through the code, it seems like the library could avoid holding onto the rows that fall outside of the requested row window. Does this problem resonate at all? I wonder if there are any plans to make this more efficient. I might be able to get some bandwidth to help with a fix if it seems doable/useful.
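A rough sketch of that idea, for illustration only. This is not the library's actual code: `readWindow`, `pageRowCounts`, and `decodePage` are hypothetical names, and it assumes a flat (non-nested) column where the number of rows in each page is known before the page is decoded. Pages entirely outside the window are never decoded, and only the overlapping slice of each decoded page is retained.

```typescript
// Hypothetical sketch, not the library's real internals.
// Collects only the rows in [rowStart, rowEnd) from one column of a row group.
function readWindow(
  pageRowCounts: number[], // assumed known up front, e.g. from page headers
  decodePage: (pageIndex: number) => number[], // hypothetical page decoder
  rowStart: number,
  rowEnd: number
): number[] {
  const out: number[] = []
  let pageFirstRow = 0 // row index of the first row in the current page
  for (let p = 0; p < pageRowCounts.length; p++) {
    if (pageFirstRow >= rowEnd) break // past the window: stop early
    const pageEndRow = pageFirstRow + pageRowCounts[p]
    if (pageEndRow > rowStart) {
      // Page overlaps the window: decode it and keep only the overlap
      const values = decodePage(p)
      const from = Math.max(rowStart - pageFirstRow, 0)
      const to = Math.min(rowEnd - pageFirstRow, values.length)
      for (let i = from; i < to; i++) out.push(values[i])
    }
    pageFirstRow = pageEndRow
  }
  return out
}

// Four fake pages of three rows each; record which pages get decoded
const decodedPages: number[] = []
const windowRows = readWindow(
  [3, 3, 3, 3],
  (p) => {
    decodedPages.push(p)
    return [p * 3, p * 3 + 1, p * 3 + 2]
  },
  4,
  7
)
console.log(windowRows, decodedPages)
```

With this shape, peak memory is bounded by one decoded page plus the requested window rather than by the whole row group.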
This is absolutely something that I would like to see improved! There is already a `rowLimit` parameter to the `readColumn` function, which helps to stop parsing early if not all the rows are needed. But I agree that it could be improved.
One thing to be careful of is that raw column data may have a different length than the actual row start and end, because it gets assembled into lists and structs. That being said, I'm pretty sure that clever tricks could save significantly on heap size.
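To make the length mismatch concrete, here is a minimal sketch under stated assumptions. In Parquet, a repeated column stores one repetition level per raw entry, and a level of 0 marks the first entry of a new top-level row. The helper `rawRangeForRows` below is hypothetical (not part of the library) and ignores definition levels and nulls; it only shows how a row window maps to a different range of raw entries.

```typescript
// Hypothetical helper, not part of the library.
// Maps a row window [rowStart, rowEnd) to the range of raw entries that
// belong to those rows, using repetition levels (0 starts a new row).
function rawRangeForRows(
  repetitionLevels: number[],
  rowStart: number,
  rowEnd: number
): { start: number; end: number } {
  let row = -1 // index of the row the current entry belongs to
  let start = repetitionLevels.length
  let end = repetitionLevels.length
  for (let i = 0; i < repetitionLevels.length; i++) {
    if (repetitionLevels[i] === 0) {
      row++
      if (row === rowStart) start = i
      if (row === rowEnd) {
        end = i
        break // nothing after this entry is needed
      }
    }
  }
  // Clamp so an empty or inverted window yields an empty range
  return { start: Math.min(start, end), end }
}

// Three rows of a list column: [1, 2, 3], [4], [5, 6]
// Six raw entries, but only three rows.
const levels = [0, 1, 1, 0, 0, 1]
const range = rawRangeForRows(levels, 1, 2) // the row [4] only
console.log(range)
```

Here row 1 corresponds to raw entries 3 up to (but not including) 4, so slicing the raw data by row index alone would return the wrong entries.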
Contributions are most welcome! Happy to further discuss strategies here too.