[CORE] Only materialize subquery before doing transform #5862
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
// the first column in first row from `query`.
val rows = query.plan.executeCollect()
if (rows.length > 1) {
  throw new IllegalStateException(
    s"more than one row returned by a subquery used as an expression:\n${query.plan}")
}
val result: AnyRef = if (rows.length == 1) {
  assert(
    rows(0).numFields == 1,
    s"Expects 1 field, but got ${rows(0).numFields}; something went wrong in analysis")
  rows(0).get(0, query.dataType)
} else {
  // If there is no rows returned, the result should be null.
  null
}
val result = query.eval(InternalRow.empty)
After this PR, we no longer need to execute the subquery manually, so the exception behavior is the same as in vanilla Spark. Note that this code change is only a simplification: the subquery has already been materialized before the transform is done.
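To illustrate the point above, here is a minimal sketch of why a plain `eval` is enough once materialization has already happened. `ToyScalarSubquery` and its `runQuery` parameter are hypothetical stand-ins written for this example; they are not the actual Spark or Gluten classes, and the real code evaluates against an `InternalRow`.

```scala
// Toy stand-in for a scalar subquery expression (an assumption for illustration,
// not Spark's or Gluten's real class).
final class ToyScalarSubquery(runQuery: () => Seq[Seq[Any]]) {
  @volatile private var result: Any = null
  @volatile private var updated: Boolean = false

  // Materialize once: run the query and enforce "at most one row, exactly one column".
  def updateResult(): Unit = {
    val rows = runQuery()
    if (rows.length > 1) {
      throw new IllegalStateException(
        "more than one row returned by a subquery used as an expression")
    }
    result = if (rows.length == 1) {
      assert(rows.head.length == 1, s"Expects 1 field, but got ${rows.head.length}")
      rows.head.head
    } else {
      // No rows returned: the result is null.
      null
    }
    updated = true
  }

  // After materialization, evaluation only reads the cached value.
  def eval(): Any = {
    require(updated, "subquery has not been materialized yet")
    result
  }
}
```

With this shape, the row-count check and its exception live in one place (the materialization step), so a caller that runs after materialization only needs `eval`.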
@ulysses-you, it would be nice to add these comments to the code.
cc @zhztheplayer @PHILO-HE @zzcclp thank you
Thanks for your work! Please reword the PR description to help readers understand the motivation.
Can we use "transform" to replace "executeTransform"? I guess there is a convention: the concrete implementation `doXXX` is generally called inside `XXX`.
@PHILO-HE yes, I agree the convention in general.. The reason using |
I note Spark's calling path is |
@PHILO-HE I see, addressed.
Just one trivial comment.
cc @zhichao, @zhztheplayer, please review further.
Thanks!
@PHILO-HE @zhztheplayer I'm not sure whether the TPCDS benchmark was triggered and ran successfully. Can you help check the internal state? Thank you!
LGTM
/Benchmark Velox |
It was triggered, but did not execute successfully due to a download failure for some deps. Just fixed. It's running now.
===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====
What changes were proposed in this pull request?
We transform subqueries (e.g., DPP) during the columnar rules, at which point the plan is not actually being executed, so we should not materialize a subquery while replacing expressions, because that is not done concurrently. This PR wraps `doTransform` with `transform` so that subqueries are always materialized before `doTransform`, which allows the subqueries to be submitted concurrently.
How was this patch tested?
Pass CI (also no regression)
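
The wrapping described in this PR can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Gluten's implementation: `ToySubquery`, `ToyTransformNode`, `submit`, and `awaitResult` are hypothetical names, subquery results are plain `Int` values, and plain Scala futures stand in for Spark's subquery execution.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for a subquery whose execution can be submitted asynchronously.
final class ToySubquery(compute: () => Int)(implicit ec: ExecutionContext) {
  @volatile private var future: Option[Future[Int]] = None

  // Start the computation at most once.
  def submit(): Unit = synchronized {
    if (future.isEmpty) future = Some(Future(compute()))
  }

  // Block until the value is available (submitting first if nobody did).
  def awaitResult(): Int = {
    submit()
    Await.result(future.get, 30.seconds)
  }
}

abstract class ToyTransformNode(subqueries: Seq[ToySubquery]) {
  // Public entry point: materialize every subquery first, then run the node-specific logic.
  final def transform(): String = {
    subqueries.foreach(_.submit())               // submit all of them so they run concurrently
    val values = subqueries.map(_.awaitResult()) // only then wait for each result
    doTransform(values)
  }

  // Concrete nodes implement only this; they can assume subqueries are already materialized.
  protected def doTransform(subqueryValues: Seq[Int]): String
}
```

The design choice mirrors the `XXX` / `doXXX` convention raised in review: callers always go through `transform`, so no implementation of `doTransform` has to trigger subquery execution itself, one subquery at a time.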