Fix reading MAP_KEY_VALUE Parquet SchemaElement
In Parquet, the map type is annotated with the MAP converted type. For backward compatibility, however, a group annotated with MAP_KEY_VALUE that is not contained by a MAP-annotated group should be handled as a MAP-annotated group.

The previous code had two mistakes:

1. If a group is MAP_KEY_VALUE, its parent type needs to be checked to see whether it is a MAP group, but the code did not check the parent. This commit adds a parentSchemaIdx parameter to the Parquet reader's getParquetColumnInfo() function to pass the parent schema.
2. The previous code mistakenly treated MAP_KEY_VALUE as a child of a MAP-annotated group and assumed it must be repeated and have 2 children. In fact, it can be optional instead of repeated, and it has only 1 child. This commit moves the handling of this type ahead of the LIST and MAP types and takes the type of its only child, which is a repeated group called "key_value".

For more information, see https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps
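The parent check described in item 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit: `ConvertedType`, `SchemaElement`, and `isMapGroup` are hypothetical, simplified stand-ins for the reader's real types, and only the name `parentSchemaIdx` comes from the commit message.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for Parquet's flattened schema.
// These are NOT the actual types used by the reader.
enum class ConvertedType { NONE, MAP, MAP_KEY_VALUE, LIST };

struct SchemaElement {
  ConvertedType convertedType;
};

// Returns true if the group at schemaIdx should be read as a map.
// parentSchemaIdx is std::nullopt when the element has no parent (the root).
bool isMapGroup(
    const std::vector<SchemaElement>& schema,
    std::size_t schemaIdx,
    std::optional<std::size_t> parentSchemaIdx) {
  const SchemaElement& element = schema[schemaIdx];
  if (element.convertedType == ConvertedType::MAP) {
    return true;
  }
  if (element.convertedType != ConvertedType::MAP_KEY_VALUE) {
    return false;
  }
  // Backward-compatibility rule: a MAP_KEY_VALUE group counts as a map
  // only when its parent is not itself annotated with MAP.
  const bool parentIsMap = parentSchemaIdx.has_value() &&
      schema[*parentSchemaIdx].convertedType == ConvertedType::MAP;
  return !parentIsMap;
}
```

Passing the parent index explicitly keeps the check local: the function does not need to walk the flattened schema to find the enclosing group.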
Showing 3 changed files with 81 additions and 28 deletions.