-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-5953][VL] Prevent pushdown filters with unsupported data types to scan node #5954
Changes from all commits
19668f1
76ba8ab
8006e67
d176862
588c539
c055147
ad927fe
df3eb3f
4c6ebc7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1600,17 +1600,26 @@ bool SubstraitToVeloxPlanConverter::childrenFunctionsOnSameField( | |
bool SubstraitToVeloxPlanConverter::canPushdownFunction( | ||
const ::substrait::Expression_ScalarFunction& scalarFunction, | ||
const std::string& filterName, | ||
uint32_t& fieldIdx) { | ||
// Condtions can be pushed down. | ||
uint32_t& fieldIdx, | ||
const std::vector<TypePtr>& veloxTypeList) { | ||
// Conditions can be pushed down. | ||
static const std::unordered_set<std::string> supportedFunctions = {sIsNotNull, sIsNull, sGte, sGt, sLte, sLt, sEqual}; | ||
|
||
bool canPushdown = false; | ||
if (supportedFunctions.find(filterName) != supportedFunctions.end() && | ||
fieldOrWithLiteral(scalarFunction.arguments(), fieldIdx)) { | ||
// The arg should be field or field with literal. | ||
canPushdown = true; | ||
if (supportedFunctions.find(filterName) == supportedFunctions.end()) { | ||
return false; | ||
} | ||
|
||
// The arg should be field or field with literal. | ||
if (!fieldOrWithLiteral(scalarFunction.arguments(), fieldIdx)) { | ||
return false; | ||
} | ||
return canPushdown; | ||
|
||
// Check whether data type is supported or not | ||
if (!veloxTypeList.empty() && fieldIdx < veloxTypeList.size() && !isPushdownSupported(veloxTypeList.at(fieldIdx))) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it a bug if 'fieldIdx >= veloxTypeList.size()'? Shall we just add a check to ensure it does not happen? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok |
||
return false; | ||
} | ||
|
||
return true; | ||
} | ||
|
||
bool SubstraitToVeloxPlanConverter::canPushdownNot( | ||
|
@@ -1686,6 +1695,18 @@ bool SubstraitToVeloxPlanConverter::canPushdownOr( | |
return true; | ||
} | ||
|
||
bool SubstraitToVeloxPlanConverter::isPushdownSupported(TypePtr inputType) { | ||
// Keep the same with mapToFilters | ||
switch (inputType->kind()) { | ||
case TypeKind::TIMESTAMP: | ||
case TypeKind::VARBINARY: | ||
case TypeKind::HUGEINT: | ||
return false; | ||
default: | ||
return true; | ||
} | ||
} | ||
|
||
void SubstraitToVeloxPlanConverter::separateFilters( | ||
std::vector<RangeRecorder>& rangeRecorders, | ||
const std::vector<::substrait::Expression_ScalarFunction>& scalarFunctions, | ||
|
@@ -1712,19 +1733,6 @@ void SubstraitToVeloxPlanConverter::separateFilters( | |
for (const auto& scalarFunction : scalarFunctions) { | ||
auto filterNameSpec = SubstraitParser::findFunctionSpec(functionMap_, scalarFunction.function_reference()); | ||
auto filterName = SubstraitParser::getNameBeforeDelimiter(filterNameSpec); | ||
// Add all decimal filters to remaining functions because their pushdown are not supported. | ||
if (format == dwio::common::FileFormat::ORC && scalarFunction.arguments().size() > 0) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like previous logic only excludes decimal type for ORC format, but now we are excluding HUGEINT for all file formats. Could you confirm if it is expected? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The ORC format already supports short-decimal predicate pushdown. @jiangjiangtian Can you help test this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The main idea of these three types ( The reason why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That means actually we don't support the pushdown of long decimal in Parquet and ORC. Is that correct? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@rui-mo Currently yes. But to support a |
||
auto value = scalarFunction.arguments().at(0).value(); | ||
if (value.has_selection()) { | ||
uint32_t fieldIndex; | ||
bool parsed = SubstraitParser::parseReferenceSegment(value.selection().direct_reference(), fieldIndex); | ||
if (!parsed || (!veloxTypeList.empty() && veloxTypeList.at(fieldIndex)->isDecimal())) { | ||
remainingFunctions.emplace_back(scalarFunction); | ||
continue; | ||
} | ||
} | ||
} | ||
|
||
// Check whether NOT and OR functions can be pushed down. | ||
// If yes, the scalar function will be added into the subfield functions. | ||
if (filterName == sNot) { | ||
|
@@ -1742,7 +1750,7 @@ void SubstraitToVeloxPlanConverter::separateFilters( | |
} else { | ||
// Check if the condition is supported to be pushed down. | ||
uint32_t fieldIdx; | ||
if (canPushdownFunction(scalarFunction, filterName, fieldIdx) && | ||
if (canPushdownFunction(scalarFunction, filterName, fieldIdx, veloxTypeList) && | ||
rangeRecorders.at(fieldIdx).setCertainRangeForFunction(filterName)) { | ||
subfieldFunctions.emplace_back(scalarFunction); | ||
} else { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have made some updates on the timestamp support. Could you help check if the pushdown of timestamp is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's already supported in velox side. But as for now, Gluten can still fallback since
mapToFilters
doesn't support yet. I think we can support it in next PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Would you like to open an issue to track its support? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the issue #6642
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.