Fix reading MAP_KEY_VALUE Parquet SchemaElement
In Parquet, the map type is normally annotated with the MAP converted type. It should contain a repeated group annotated with MAP_KEY_VALUE, which in turn contains two children, key and value:

    <map-repetition> group <name> (MAP) {
      repeated group key_value (MAP_KEY_VALUE) {
        required <key-type> key;
        <value-repetition> <value-type> value;
      }
    }

But sometimes a group annotated with MAP_KEY_VALUE was incorrectly used in place of MAP:

    <map-repetition> group my_map (MAP_KEY_VALUE) {
      repeated group map {
        required binary key (UTF8);
        optional int32 value;
      }
    }

For backward compatibility, a MAP_KEY_VALUE that is not contained by a MAP should be treated as a MAP. This commit makes the following changes:

1. Adds a parentSchemaIdx parameter to the Parquet reader's getParquetColumnInfo() function to pass the parent schema.
2. Differentiates between the cases where a MAP_KEY_VALUE's parent is or is not a MAP. If it is, the MAP_KEY_VALUE group is the repeated group that contains the key and value. If it is not, it is treated the same as a MAP.

For more information, see https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps
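The parent check described in this message can be sketched roughly as follows. This is an illustrative Python model, not the reader's actual code: the SchemaElement class, the classify_group() helper, and the string labels it returns are all hypothetical names invented for the example. Only the rule itself (a MAP_KEY_VALUE whose parent is not a MAP is treated as a MAP) comes from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaElement:
    """Hypothetical, simplified stand-in for a Parquet SchemaElement."""
    name: str
    converted_type: Optional[str] = None  # e.g. "MAP", "MAP_KEY_VALUE", "UTF8"
    repetition: str = "required"


def classify_group(schema: List[SchemaElement], idx: int,
                   parent_idx: Optional[int]) -> str:
    """Decide how the element at idx should be interpreted.

    parent_idx plays the role of the parentSchemaIdx mentioned in the
    commit message; None means the element has no parent in this sketch.
    """
    node = schema[idx]
    parent = schema[parent_idx] if parent_idx is not None else None

    if node.converted_type == "MAP":
        return "map"
    if node.converted_type == "MAP_KEY_VALUE":
        # Inside a MAP: this is the repeated group holding key and value.
        if parent is not None and parent.converted_type == "MAP":
            return "key_value_group"
        # Not inside a MAP: legacy writer, treat it the same as MAP.
        return "map"
    return "other"


# Standard layout: MAP containing a MAP_KEY_VALUE repeated group.
standard = [
    SchemaElement("my_map", "MAP", "optional"),
    SchemaElement("key_value", "MAP_KEY_VALUE", "repeated"),
]

# Legacy layout: MAP_KEY_VALUE used in place of MAP.
legacy = [
    SchemaElement("my_map", "MAP_KEY_VALUE", "optional"),
    SchemaElement("map", None, "repeated"),
]

standard_outer = classify_group(standard, 0, None)
standard_inner = classify_group(standard, 1, 0)
legacy_outer = classify_group(legacy, 0, None)
legacy_inner = classify_group(legacy, 1, 0)
```

In this sketch both outer groups are classified as maps, while only the standard layout's inner group is recognized through its own MAP_KEY_VALUE annotation. That is why the parent has to be passed down: the annotation alone cannot distinguish the two layouts.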
Showing 3 changed files with 105 additions and 28 deletions.