-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE] Remove local sort for TopNRowNumber #6381
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
@GlutenPerfBot benchmark |
ACK, will benchmark TPCH/DS on this pull request |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
_, | ||
SortExecTransformer(_, false, p2: ProjectExecTransformer, _)) | ||
if p1.outputSet == p2.child.outputSet => | ||
Some((p1, p2.child)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any more
case p1 @ ProjectExec(_, SortExec(_, false, p2: ProjectExec, _))
if p1.outputSet == p2.child.outputSet =>
Some((p1, p2.child))
case p1 @ProjectExecTransformer(
_,
SortExec(_, false, p2: ProjectExecTransformer, _))
if p1.outputSet == p2.child.outputSet =>
Some((p1, p2.child))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it won't happen. We only pull out project from sort if the sort is offloaded, so the pre/post project should only happen with SortExecTransformer.
// from pre/post project-pulling | ||
case ProjectLike(PartialSortLike(ProjectLike(child))) if plan.outputSet == child.outputSet => | ||
child | ||
case ProjectLike(PartialSortLike(child)) => plan.withNewChildren(Seq(child)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is lost. Generated by PullOutPostProject
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @zml1206 for the reminder
Rebase velox 2023-12-21 ``` 2ec4b5d Refactor Unnest::generateOutput method (apache#8077) 4ab8c0d Add error messages when malloc allocator hits capacity limit (apache#8075) 298260f Track the file ages for AyncDataCache and SsdCache (apache#6381) 4fe3738 Change enable_sorted_aggregations gflag to true by default (apache#8112) 992d5a6 Fix flaky SharedArbitrationTest.failedToReclaimFromHashJoinBuildersIn… (apache#8120) cdbd430 Fix links in October 2023 monthly update (apache#8116) ```
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
@GlutenPerfBot benchmark |
1 similar comment
@GlutenPerfBot benchmark |
Run Gluten Clickhouse CI |
ACK, will benchmark TPCH/DS on this pull request |
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====
|
cc @zhztheplayer @JkSelf @zml1206 thank you |
} else { | ||
Seq(Nil) | ||
Seq(partitionSpec.map(SortOrder(_, Ascending)) ++ orderSpec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super.requiredChildOrdering Implemented differently in different spark versions. In addition, according to the original logic, ck backend should not need child order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for 1, it's a good point, it seems the WindowExecBase
has implemented these methods since 3.3
for 2, the ck backend requires child ordering and before we did not remove local sort before window
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should retain requiredChildOrderingForWindow instead of using velox configuration to control ck backend
although it has no effect by default value.
* - Offload WindowGroupLimit to native TopNRowNumber | ||
* - The columnar window type is `sort` | ||
*/ | ||
object EliminateLocalSort extends Rule[SparkPlan] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulysses-you Do we have chance to eliminate the local sort for VeloxColumnarWriteFilesExec here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some limitation to remove sort before write files, e.g., we can not remove the sort which is created by user (to improve compression ratio). We can only remove the sort which is added from Spark, but it's hard to find that sort in this new rule only using require child ordering framework, we need to check if all sort fields are partition columns.
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
LGTM |
Run Gluten Clickhouse CI |
This reverts commit 0448115.
What changes were proposed in this pull request?
This pr adds a new rule
EliminateLocalSort
to unify the existed drop local sort code, and also correctTopNRowNumber
operator require child ordering to help remove unneceesary local sort.For now, the unnecessary local sort would happen if:
sort
How was this patch tested?
add test & pass tests