Can't write parquet of mixed types #159
Comments
I think, more specifically, some parquet encodings cannot be used for both strings and numbers. In the test case, read-write-read works:
parquet-wasm/tests/js/arrow2.ts Lines 41 to 49 in 50ff7b0
parquet-wasm/tests/data/generate_data.py Lines 10 to 13 in 50ff7b0
Plain encoding:
But it makes sense that dictionary encodings won't work on some data types. On the parquet-wasm/src/arrow1/writer_properties.rs Line 187 in 50ff7b0
On the
|
Oh interesting! It looks like Arrow is inferring dictionaries for strings in I think I can work around this! Although of course it would be great to have support for arrow dictionaries turning into one of the parquet dictionary encodings by default |
Ah interesting. My test case isn't testing writing a table initially created in arrow.js, because it loaded the table saved from pyarrow. Regardless, as I mentioned above, I think it would be good to support column-specific encodings. They're already supported in |
Agreed! FWIW, working around this by inferring the type |
(Thank you so much for this project! I was about to write the very same thing and was fortunate enough to stumble upon your version)
The following code fails in version v0.3.1
In particular, it throws the following error:
External format error: Invalid argument error: The datatype Float64 cannot be encoded by PlainDictionary
If I comment out the encoding line above, I instead get the following error:
External format error: Not yet implemented: Dictionary arrays only support dictionary encoding
I don't think any single parquet encoding works for both strings and numbers. Instead, encodings need to be settable or inferred per field, which requires an API change of some sort to WriterPropertiesBuilder.
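To make the per-field idea concrete, here is a rough sketch of what "settable or inferred per field" could look like. Everything in it is hypothetical: `PerColumnWriterProperties`, `setColumnEncoding`, `inferEncoding`, and the type and encoding names are illustrative stand-ins, not parquet-wasm's actual API. The inferred defaults are assumptions chosen only to be consistent with the two errors quoted above.

```typescript
// Hypothetical sketch: none of these names are part of parquet-wasm.
type ArrowType = "Utf8" | "Float64" | "Int32" | "Dictionary";
type Encoding = "PLAIN" | "PLAIN_DICTIONARY";

interface Field {
  name: string;
  type: ArrowType;
}

// Assumed default: dictionary arrays get a dictionary encoding,
// everything else falls back to PLAIN.
function inferEncoding(type: ArrowType): Encoding {
  return type === "Dictionary" ? "PLAIN_DICTIONARY" : "PLAIN";
}

class PerColumnWriterProperties {
  private overrides: Record<string, Encoding> = {};

  // Set the encoding for a single column; other columns stay inferred.
  setColumnEncoding(column: string, encoding: Encoding): this {
    this.overrides[column] = encoding;
    return this;
  }

  // Resolve one encoding per field: explicit override first,
  // inferred default otherwise.
  resolve(schema: Field[]): Record<string, Encoding> {
    const resolved: Record<string, Encoding> = {};
    for (const field of schema) {
      resolved[field.name] =
        this.overrides[field.name] ?? inferEncoding(field.type);
    }
    return resolved;
  }
}

// A mixed table: one dictionary-encoded string column, one numeric column.
const schema: Field[] = [
  { name: "label", type: "Dictionary" },
  { name: "value", type: "Float64" },
];

const encodings = new PerColumnWriterProperties().resolve(schema);
```

With something along these lines, a table mixing dictionary strings and floats would no longer be forced through a single global encoding, and an explicit call such as `setColumnEncoding("value", "PLAIN")` would still let the caller override the inferred choice for one column.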