Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Fix function input_file_name() outputs empty string in certain query plan patterns #7124

Merged
merged 4 commits into from
Sep 6, 2024

Conversation

zml1206
Copy link
Contributor

@zml1206 zml1206 commented Sep 5, 2024

What changes were proposed in this pull request?

The Spark implementations of input_file_name/input_file_block_start/input_file_block_length uses a thread local to stash the file name and retrieve it from the function. If there is a transformer node between project input_file_function and scan, the result of input_file_name is an empty string. So we should push down input_file_function to transformer scan or add fallback project of input_file_function before fallback scan.

The processing logic is as follows:
1.Before offload, add new project before leaf node and push down input file expression to the new project
2.Normal offload project and scan
3.After offload, if scan be offloaded, push down input file expression into scan and remove project

How was this patch tested?

UT

@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Sep 5, 2024
Copy link

github-actions bot commented Sep 5, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

github-actions bot commented Sep 5, 2024

Run Gluten Clickhouse CI

@zml1206 zml1206 marked this pull request as draft September 5, 2024 07:06
Copy link

github-actions bot commented Sep 5, 2024

Run Gluten Clickhouse CI

@zml1206 zml1206 marked this pull request as ready for review September 5, 2024 10:14
@zml1206
Copy link
Contributor Author

zml1206 commented Sep 5, 2024

cc @zhztheplayer Thank you.

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good to me. Thank you @zml1206.

cc @liuneng1994, if you are considering switching to this way, perhaps we need a PR for CH individually to add the rules to CH rule list after this PR gets landed.

ProjectExec(newProjectList, newChild)
}

private def rewriteExpr(expr: Expression, replacedExprs: Map[String, Alias]): Expression = {
Copy link
Member

@zhztheplayer zhztheplayer Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be replacedExprs: mutable.Map[String, Alias]

}
}

def addMetadataCol(plan: SparkPlan, replacedExprs: Map[String, Alias]): SparkPlan = plan match {
Copy link
Member

@zhztheplayer zhztheplayer Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be replacedExprs: mutable.Map[String, Alias]

@@ -52,6 +52,7 @@ private object VeloxRuleApi {
def injectLegacy(injector: LegacyInjector): Unit = {
// Gluten columnar: Transform rules.
injector.injectTransform(_ => RemoveTransitions)
injector.injectTransform(_ => PushDownInputFileExpressionBeforeLeaf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a rename for the two rules. Since they seem to always be used together as a rule pair.

Suggested change
injector.injectTransform(_ => PushDownInputFileExpressionBeforeLeaf)
injector.injectTransform(_ => PushDownInputFileExpression.PreOffload)

@@ -64,6 +65,7 @@ private object VeloxRuleApi {
injector.injectTransform(_ => TransformPreOverrides())
injector.injectTransform(_ => RemoveNativeWriteFilesSortAndProject())
injector.injectTransform(c => RewriteTransformer.apply(c.session))
injector.injectTransform(_ => PushDownInputFileExpressionToScan)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly,

Suggested change
injector.injectTransform(_ => PushDownInputFileExpressionToScan)
injector.injectTransform(_ => PushDownInputFileExpression.PostOffload)

Copy link

github-actions bot commented Sep 6, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Sep 6, 2024

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer changed the title [VL] Fix input_file_name results in empty string [VL] Fix function input_file_name() outputs empty string in certain query plan patterns Sep 6, 2024
@zhztheplayer zhztheplayer merged commit 8078f24 into apache:main Sep 6, 2024
43 of 44 checks passed
dcoliversun pushed a commit to dcoliversun/gluten that referenced this pull request Sep 11, 2024
zml1206 added a commit to zml1206/incubator-gluten that referenced this pull request Sep 19, 2024
zml1206 added a commit to zml1206/incubator-gluten that referenced this pull request Sep 20, 2024
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CORE works for Gluten Core VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants