Skip to content

Commit

Permalink
Remove row group filter for map and list (facebookincubator#10510)
Browse files Browse the repository at this point in the history
Summary:
There is a fast path for filter all null columns in `ScanSpec` based on `statistics` by assuming all null happens when `NumberOfValues` equals `0`. But when do `filterRowGroups` against one line file with null `Map`, this rule will apply to the `Key` column reader, which has `testNull()` as `false` and finally decide this rowgroup should be skipped. It means this rule doesn't apply for the columns like `Map::Key` . But limited to the info that statistics can provide, we don't have a better way to refine this fast path, so we remove the `filterRowGroups` in `MapColumnReader` in this PR.

Pull Request resolved: facebookincubator#10510

Reviewed By: Yuhta

Differential Revision: D60835181

Pulled By: kagamiori

fbshipit-source-id: 9284ec94800ebe038a6ec1bbc99be0b994fd38cd
  • Loading branch information
yma11 authored and facebook-github-bot committed Aug 8, 2024
1 parent 79129df commit e7a6ffc
Show file tree
Hide file tree
Showing 3 changed files with 15 additions and 3 deletions.
5 changes: 2 additions & 3 deletions velox/dwio/parquet/reader/RepeatedColumnReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -215,8 +215,7 @@ void MapColumnReader::filterRowGroups(
uint64_t rowGroupSize,
const dwio::common::StatsContext& context,
dwio::common::FormatData::FilterRowGroupsResult& result) const {
keyReader_->filterRowGroups(rowGroupSize, context, result);
elementReader_->filterRowGroups(rowGroupSize, context, result);
// empty placeholder to avoid incorrect calling on parent's impl
}

ListColumnReader::ListColumnReader(
Expand Down Expand Up @@ -318,7 +317,7 @@ void ListColumnReader::filterRowGroups(
uint64_t rowGroupSize,
const dwio::common::StatsContext& context,
dwio::common::FormatData::FilterRowGroupsResult& result) const {
child_->filterRowGroups(rowGroupSize, context, result);
// empty placeholder to avoid incorrect calling on parent's impl
}

} // namespace facebook::velox::parquet
Binary file added velox/dwio/parquet/tests/examples/null_map.parquet
Binary file not shown.
13 changes: 13 additions & 0 deletions velox/dwio/parquet/tests/reader/ParquetTableScanTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -384,6 +384,19 @@ TEST_F(ParquetTableScanTest, map) {
assertSelectWithFilter({"map"}, {}, "", "SELECT map FROM tmp");
}

TEST_F(ParquetTableScanTest, nullMap) {
auto path = getExampleFilePath("null_map.parquet");
loadData(
path,
ROW({"i", "c"}, {VARCHAR(), MAP(VARCHAR(), VARCHAR())}),
makeRowVector(
{"i", "c"},
{makeConstant<std::string>("1", 1),
makeNullableMapVector<std::string, std::string>({std::nullopt})}));

assertSelectWithFilter({"i", "c"}, {}, "", "SELECT i, c FROM tmp");
}

// Core dump is fixed.
TEST_F(ParquetTableScanTest, singleRowStruct) {
auto vector = makeArrayVector<int32_t>({{}});
Expand Down

0 comments on commit e7a6ffc

Please sign in to comment.