[BUG] Parsing Exception for FIXED_LEN_BYTE_ARRAY Data in Parquet File #1107

Open
CrazyBug-11 opened this issue Nov 19, 2024 · 1 comment
Labels
bug Something isn't working

In my dataset, there is a shape_area field defined as follows:
[image: screenshot of the shape_area field definition, not preserved]

During parsing, the values of this field come out far too large. For example, the original value 173.24927660400 is parsed as 1.73249276604E24.

After investigating the code, I found an issue in the ParquetPrimitiveConverter class on line 102, where the scale is negated:

int scale = -decimal.getScale();
When I changed that line to int scale = decimal.getScale(); the parsed values were correct.
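
The arithmetic behind those two numbers can be reproduced with a minimal sketch using only java.math.BigDecimal. It assumes the stored unscaled integer is 17324927660400 and the declared scale is 11 (inferred from the 11 fractional digits of the original value, not read from the file), and it assumes the converter ends up building the value as a BigDecimal from the unscaled integer and the scale. The class and method names are hypothetical and are not taken from the project.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class DecimalScaleDemo {
  // java.math.BigDecimal defines its value as unscaledValue * 10^(-scale),
  // which is the same convention the Parquet DECIMAL annotation uses.
  static BigDecimal withScale(long unscaled, int scale) {
    return new BigDecimal(BigInteger.valueOf(unscaled), scale);
  }

  public static void main(String[] args) {
    long unscaled = 17324927660400L; // assumed stored integer for 173.24927660400
    int scale = 11;                  // assumed scale (11 fractional digits)

    // Scale passed through unchanged: 17324927660400 * 10^-11
    System.out.println(withScale(unscaled, scale));

    // Scale negated: 17324927660400 * 10^11, i.e. 24 orders of magnitude too large
    System.out.println(withScale(unscaled, -scale).doubleValue());
  }
}
```

Under these assumptions the un-negated scale gives back 173.24927660400, and the negated scale gives a value numerically equal to the reported 1.73249276604E24. Negating would only be appropriate if the scale were later used as an exponent in a multiplication such as unscaled * Math.pow(10, scale).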

Is there a specific reason for negating the scale (-decimal.getScale())? Does it serve a special purpose, or is it a mistake?

CrazyBug-11 added the bug label Nov 19, 2024

msbarry (Contributor) commented Nov 23, 2024

int scale = -decimal.getScale();

Comes from this section of the spec:

https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

It might be reversed, though. It would be good to confirm how another tool interprets that field to be sure.
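
For reference, the linked spec section defines a DECIMAL value as unscaledValue * 10^(-scale), and for FIXED_LEN_BYTE_ARRAY it stores the unscaled value as a big-endian two's complement integer. A rough sketch of decoding such a value by hand, for comparison against whatever another tool reports, could look like the following. The class, the helper names, and the 16-byte width are assumptions for illustration, not the project's code or the actual column definition.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

class FixedLenDecimalSketch {
  // Decode big-endian two's complement bytes into unscaled * 10^(-scale).
  // BigInteger(byte[]) already expects big-endian two's complement input.
  static BigDecimal decode(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  // Hypothetical helper that builds a fixed-width test value by sign-extending
  // the minimal two's complement representation to the requested length.
  static byte[] encode(BigInteger unscaled, int length) {
    byte[] minimal = unscaled.toByteArray();
    if (minimal.length > length) {
      throw new IllegalArgumentException("value does not fit in " + length + " bytes");
    }
    byte[] out = new byte[length];
    Arrays.fill(out, (byte) (unscaled.signum() < 0 ? 0xFF : 0x00));
    System.arraycopy(minimal, 0, out, length - minimal.length, minimal.length);
    return out;
  }

  public static void main(String[] args) {
    // Assumed example: 16-byte array, scale 11, unscaled 17324927660400
    byte[] stored = encode(BigInteger.valueOf(17324927660400L), 16);
    System.out.println(decode(stored, 11));
  }
}
```

If a second reader (for example a Parquet command-line tool or another language binding) prints the same value as decode with the un-negated scale, that would support the reading that the negation is reversed for this code path.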
