I'm trying to use this crate to write a parquet file, which seems like the road less traveled. None of the examples in this codebase actually generate a row group. I wonder if anyone's doing that, or if I'm the first one trying?
The API definitely seems like it hasn't been used much for writing, and there are a lot of sharp edges. For example, it looks like the crate wants to enable recycling page buffers - the Compressor reuses buffers from the pages passed to it, and it can theoretically return them again from into_inner, but the data is moved into FileWriter::write and can't be recovered.
Does anyone have examples of successfully using this crate to write a parquet file without doing a lot of allocations?
1. Build a Page by appending primitive values of NativeType or str.
2. Build a Vec of those. (Buffers get moved.)
3. Compress them using Compressor to get an iterator of CompressedPage = a column chunk. (Page buffers get reused.)
4. Build a DynIterator over the columns to build a row group.
5. Pass the row group to the FileWriter.
THIS IS WHERE THE INEFFICIENCY IS: there is no way to recover the original buffers after passing the row group to the FileWriter. To build the next row group you have to start over.
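To make the ownership problem concrete, here is a minimal sketch of what buffer recycling across row groups could look like. Everything in it (`Page`, `BufferPool`, `RecyclingWriter`, `write_all`) is a hypothetical stand-in that only models who owns the buffers; none of it is this crate's real API, and it skips compression entirely. The one point it illustrates is that a writer which hands buffers back after copying them out lets the next row group reuse the same allocations.

```rust
// Hypothetical stand-ins, NOT this crate's types: they only model buffer ownership.

/// A page that owns its byte buffer.
pub struct Page {
    pub buffer: Vec<u8>,
}

/// Keeps spare buffers so their allocations can be reused across row groups.
#[derive(Default)]
pub struct BufferPool {
    free: Vec<Vec<u8>>,
    /// Number of times the pool had to create a brand-new buffer.
    pub fresh_buffers: usize,
}

impl BufferPool {
    /// Hand out an empty buffer, creating a new one only when the pool is empty.
    pub fn take(&mut self) -> Vec<u8> {
        match self.free.pop() {
            Some(mut buf) => {
                buf.clear(); // drops the contents but keeps the capacity
                buf
            }
            None => {
                self.fresh_buffers += 1;
                Vec::new()
            }
        }
    }

    /// Return a buffer so a later page can reuse it.
    pub fn give_back(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}

/// A sink that consumes pages but returns their buffers to the pool.
pub struct RecyclingWriter {
    pub out: Vec<u8>,
}

impl RecyclingWriter {
    pub fn write_row_group(&mut self, pages: Vec<Page>, pool: &mut BufferPool) {
        for page in pages {
            self.out.extend_from_slice(&page.buffer);
            pool.give_back(page.buffer); // the step that is missing today
        }
    }
}

/// Writes `row_groups` groups of `pages_per_group` four-byte pages.
/// Returns (bytes written, number of fresh buffers created).
pub fn write_all(row_groups: usize, pages_per_group: usize) -> (usize, usize) {
    let mut pool = BufferPool::default();
    let mut writer = RecyclingWriter { out: Vec::new() };
    for g in 0..row_groups {
        let mut pages = Vec::with_capacity(pages_per_group);
        for p in 0..pages_per_group {
            let mut buffer = pool.take();
            buffer.extend_from_slice(&[(g * pages_per_group + p) as u8; 4]);
            pages.push(Page { buffer });
        }
        writer.write_row_group(pages, &mut pool);
    }
    (writer.out.len(), pool.fresh_buffers)
}
```

With this shape, `write_all(3, 2)` creates page buffers only for the first row group and reuses them for the other two. The outer `Vec<Page>` is still allocated once per row group in this sketch; a real design would probably want to recycle that as well.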