-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL] Fix function input_file_name()
outputs empty string in certain query plan patterns
#7124
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
cc @zhztheplayer Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good to me. Thank you @zml1206.
cc @liuneng1994, if you are considering switching to this way, perhaps we need a PR for CH individually to add the rules to CH rule list after this PR gets landed.
ProjectExec(newProjectList, newChild) | ||
} | ||
|
||
private def rewriteExpr(expr: Expression, replacedExprs: Map[String, Alias]): Expression = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be replacedExprs: mutable.Map[String, Alias]
} | ||
} | ||
|
||
def addMetadataCol(plan: SparkPlan, replacedExprs: Map[String, Alias]): SparkPlan = plan match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be replacedExprs: mutable.Map[String, Alias]
@@ -52,6 +52,7 @@ private object VeloxRuleApi { | |||
def injectLegacy(injector: LegacyInjector): Unit = { | |||
// Gluten columnar: Transform rules. | |||
injector.injectTransform(_ => RemoveTransitions) | |||
injector.injectTransform(_ => PushDownInputFileExpressionBeforeLeaf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do a rename for the two rules. Since they seem to always be used together as a rule pair.
injector.injectTransform(_ => PushDownInputFileExpressionBeforeLeaf) | |
injector.injectTransform(_ => PushDownInputFileExpression.PreOffload) |
@@ -64,6 +65,7 @@ private object VeloxRuleApi { | |||
injector.injectTransform(_ => TransformPreOverrides()) | |||
injector.injectTransform(_ => RemoveNativeWriteFilesSortAndProject()) | |||
injector.injectTransform(c => RewriteTransformer.apply(c.session)) | |||
injector.injectTransform(_ => PushDownInputFileExpressionToScan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly,
injector.injectTransform(_ => PushDownInputFileExpressionToScan) | |
injector.injectTransform(_ => PushDownInputFileExpression.PostOffload) |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
input_file_name()
outputs empty string in certain query plan patterns
… query plan patterns (apache#7124)
… query plan patterns (apache#7124)
… query plan patterns (apache#7124)
… query plan patterns (apache#7124)
What changes were proposed in this pull request?
The Spark implementations of input_file_name/input_file_block_start/input_file_block_length uses a thread local to stash the file name and retrieve it from the function. If there is a transformer node between project input_file_function and scan, the result of input_file_name is an empty string. So we should push down input_file_function to transformer scan or add fallback project of input_file_function before fallback scan.
The processing logic is as follows:
1.Before offload, add new project before leaf node and push down input file expression to the new project
2.Normal offload project and scan
3.After offload, if scan be offloaded, push down input file expression into scan and remove project
How was this patch tested?
UT