Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE] Remove local sort for TopNRowNumber #6381

Merged
merged 2 commits into from
Jul 11, 2024
Merged

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Jul 9, 2024

What changes were proposed in this pull request?

This pr adds a new rule EliminateLocalSort to unify the existed drop local sort code, and also correct TopNRowNumber operator require child ordering to help remove unneceesary local sort.

For now, the unnecessary local sort would happen if:

  • Convert sort merge join to shuffled hash join
  • Offload SortAggregate to native hash aggregate
  • Offload WindowGroupLimit to native TopNRowNumber
  • The columnar window type is sort

How was this patch tested?

add test & pass tests

Copy link

github-actions bot commented Jul 9, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

github-actions bot commented Jul 9, 2024

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@ulysses-you ulysses-you marked this pull request as draft July 10, 2024 03:09
Copy link

Run Gluten Clickhouse CI

@ulysses-you
Copy link
Contributor Author

@GlutenPerfBot benchmark

@GlutenPerfBot
Copy link
Contributor

ACK, will benchmark TPCH/DS on this pull request

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

_,
SortExecTransformer(_, false, p2: ProjectExecTransformer, _))
if p1.outputSet == p2.child.outputSet =>
Some((p1, p2.child))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any more

      case p1 @ ProjectExec(_, SortExec(_, false, p2: ProjectExec, _))
          if p1.outputSet == p2.child.outputSet =>
        Some((p1, p2.child))
      case p1 @ProjectExecTransformer(
            _,
            SortExec(_, false, p2: ProjectExecTransformer, _))
          if p1.outputSet == p2.child.outputSet =>
        Some((p1, p2.child))

Copy link
Contributor Author

@ulysses-you ulysses-you Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it won't happen. We only pull out project from sort if the sort is offloaded, so the pre/post project should only happen with SortExecTransformer.

// from pre/post project-pulling
case ProjectLike(PartialSortLike(ProjectLike(child))) if plan.outputSet == child.outputSet =>
child
case ProjectLike(PartialSortLike(child)) => plan.withNewChildren(Seq(child))
Copy link
Contributor

@zml1206 zml1206 Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is lost. Generated by PullOutPostProject.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @zml1206 for the reminder

weixiuli pushed a commit to weixiuli/gluten that referenced this pull request Jul 10, 2024
Rebase velox 2023-12-21
```
2ec4b5d Refactor Unnest::generateOutput method (apache#8077)
4ab8c0d Add error messages when malloc allocator hits capacity limit (apache#8075)
298260f Track the file ages for AyncDataCache and SsdCache (apache#6381)
4fe3738 Change enable_sorted_aggregations gflag to true by default (apache#8112)
992d5a6 Fix flaky SharedArbitrationTest.failedToReclaimFromHashJoinBuildersIn… (apache#8120)
cdbd430 Fix links in October 2023 monthly update (apache#8116)
```
Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

@ulysses-you ulysses-you marked this pull request as ready for review July 11, 2024 01:06
@ulysses-you
Copy link
Contributor Author

@GlutenPerfBot benchmark

1 similar comment
@ulysses-you
Copy link
Contributor Author

@GlutenPerfBot benchmark

Copy link

Run Gluten Clickhouse CI

@GlutenPerfBot
Copy link
Contributor

ACK, will benchmark TPCH/DS on this pull request

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_6381_time.csv log/native_master_07_10_2024_e8e93e73c_time.csv difference percentage
q1 37.14 33.46 -3.681 90.09%
q2 23.68 28.57 4.888 120.64%
q3 39.05 38.44 -0.613 98.43%
q4 32.72 35.14 2.417 107.39%
q5 70.04 71.06 1.014 101.45%
q6 7.67 6.45 -1.216 84.14%
q7 85.35 83.24 -2.103 97.54%
q8 83.91 88.10 4.187 104.99%
q9 119.57 119.97 0.405 100.34%
q10 45.10 43.77 -1.335 97.04%
q11 21.70 20.45 -1.247 94.25%
q12 24.11 26.84 2.730 111.32%
q13 40.69 38.42 -2.275 94.41%
q14 20.15 20.94 0.785 103.90%
q15 33.08 30.00 -3.085 90.68%
q16 14.26 15.85 1.581 111.08%
q17 102.91 105.14 2.225 102.16%
q18 147.48 145.23 -2.253 98.47%
q19 13.78 14.76 0.978 107.10%
q20 27.38 26.50 -0.873 96.81%
q21 262.52 267.51 4.994 101.90%
q22 12.19 12.07 -0.123 98.99%
total 1264.48 1271.88 7.400 100.59%

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_6381_time.csv log/native_master_07_10_2024_e8e93e73c9_time.csv difference percentage
q1 14.60 15.29 0.685 104.69%
q2 15.06 13.95 -1.119 92.57%
q3 4.65 4.22 -0.424 90.87%
q4 70.18 63.44 -6.738 90.40%
q5 10.47 7.70 -2.766 73.59%
q6 2.44 3.27 0.831 134.08%
q7 4.19 6.43 2.246 153.65%
q8 5.05 3.61 -1.447 71.36%
q9 16.81 17.20 0.382 102.27%
q10 10.96 11.04 0.081 100.74%
q11 35.67 35.23 -0.438 98.77%
q12 1.50 2.44 0.943 162.83%
q13 5.34 5.60 0.264 104.95%
q14a 43.46 40.98 -2.481 94.29%
q14b 39.49 43.17 3.683 109.33%
q15 2.50 2.78 0.276 111.02%
q16 39.62 38.06 -1.567 96.05%
q17 4.72 4.83 0.111 102.35%
q18 6.45 6.24 -0.207 96.78%
q19 2.21 2.24 0.033 101.48%
q20 2.53 1.54 -0.992 60.84%
q21 1.04 4.56 3.522 440.28%
q22 9.86 9.19 -0.666 93.25%
q23a 80.17 83.39 3.229 104.03%
q23b 103.37 106.18 2.813 102.72%
q24a 74.67 73.27 -1.405 98.12%
q24b 79.54 66.71 -12.825 83.88%
q25 4.35 4.42 0.070 101.62%
q26 3.05 4.40 1.356 144.51%
q27 10.45 3.32 -7.125 31.80%
q28 22.84 20.93 -1.913 91.63%
q29 9.72 6.97 -2.741 71.79%
q30 5.14 7.19 2.052 139.96%
q31 7.38 6.56 -0.820 88.89%
q32 2.10 1.23 -0.873 58.45%
q33 4.97 4.82 -0.147 97.03%
q34 3.93 4.72 0.794 120.23%
q35 11.73 8.03 -3.698 68.48%
q36 4.01 3.32 -0.683 82.95%
q37 4.20 3.68 -0.521 87.59%
q38 12.11 11.62 -0.483 96.01%
q39a 3.09 3.62 0.524 116.95%
q39b 2.85 3.08 0.230 108.10%
q40 5.20 3.65 -1.552 70.17%
q41 0.63 0.62 -0.011 98.23%
q42 0.96 0.97 0.003 100.27%
q43 3.75 8.28 4.534 220.96%
q44 9.09 8.68 -0.404 95.55%
q45 3.44 3.29 -0.159 95.39%
q46 3.68 3.42 -0.251 93.16%
q47 19.95 14.48 -5.468 72.59%
q48 4.66 4.48 -0.174 96.26%
q49 9.64 9.63 -0.010 99.90%
q50 19.35 22.95 3.603 118.62%
q51 7.94 8.78 0.840 110.58%
q52 1.01 1.05 0.041 104.02%
q53 2.01 1.95 -0.057 97.16%
q54 3.35 3.43 0.080 102.38%
q55 1.92 1.06 -0.853 55.51%
q56 4.52 4.64 0.121 102.69%
q57 8.92 8.77 -0.152 98.29%
q58 2.64 2.60 -0.041 98.43%
q59 14.10 13.81 -0.289 97.95%
q60 4.64 4.72 0.081 101.75%
q61 5.38 5.48 0.108 102.00%
q62 3.68 4.79 1.111 130.22%
q63 2.13 2.20 0.064 102.99%
q64 50.48 50.06 -0.415 99.18%
q65 13.55 15.60 2.053 115.15%
q66 5.86 3.61 -2.248 61.65%
q67 354.09 352.35 -1.737 99.51%
q68 3.62 3.77 0.151 104.18%
q69 6.47 6.53 0.064 100.99%
q70 9.01 12.80 3.788 142.04%
q71 2.57 3.93 1.362 153.06%
q72 184.61 188.46 3.845 102.08%
q73 2.14 3.51 1.370 164.12%
q74 21.09 21.34 0.245 101.16%
q75 23.29 22.96 -0.325 98.61%
q76 9.50 9.18 -0.317 96.66%
q77 2.11 2.15 0.040 101.92%
q78 38.45 41.75 3.297 108.57%
q79 3.51 3.67 0.156 104.44%
q80 14.25 11.12 -3.130 78.04%
q81 6.07 5.10 -0.972 84.01%
q82 6.55 9.58 3.032 146.28%
q83 1.69 1.48 -0.209 87.65%
q84 3.03 3.09 0.056 101.84%
q85 6.80 6.79 -0.012 99.83%
q86 3.31 3.36 0.043 101.31%
q87 12.08 12.14 0.066 100.55%
q88 28.44 25.04 -3.401 88.04%
q89 3.26 4.74 1.476 145.23%
q90 4.37 9.35 4.979 213.81%
q91 2.72 2.70 -0.020 99.26%
q92 1.28 1.45 0.172 113.45%
q93 27.84 27.59 -0.247 99.11%
q94 24.30 21.18 -3.114 87.18%
q9 82.39 85.59 3.201 103.89%
q5 3.83 3.57 -0.261 93.17%
q96 12.11 12.10 -0.013 99.89%
q97 2.08 2.10 0.022 101.05%
q98 9.17 9.68 0.508 105.54%
q99 9.17 9.68 0.508 105.54%
total 1926.89 1913.61 -13.282 99.31%

@ulysses-you
Copy link
Contributor Author

cc @zhztheplayer @JkSelf @zml1206 thank you

} else {
Seq(Nil)
Seq(partitionSpec.map(SortOrder(_, Ascending)) ++ orderSpec)
Copy link
Contributor

@zml1206 zml1206 Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super.requiredChildOrdering Implemented differently in different spark versions. In addition, according to the original logic, ck backend should not need child order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 1, it's a good point, it seems the WindowExecBase has implemented these methods since 3.3
for 2, the ck backend requires child ordering and before we did not remove local sort before window

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should retain requiredChildOrderingForWindow instead of using velox configuration to control ck backend
although it has no effect by default value.

* - Offload WindowGroupLimit to native TopNRowNumber
* - The columnar window type is `sort`
*/
object EliminateLocalSort extends Rule[SparkPlan] {
Copy link
Contributor

@JkSelf JkSelf Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ulysses-you Do we have chance to eliminate the local sort for VeloxColumnarWriteFilesExec here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some limitation to remove sort before write files, e.g., we can not remove the sort which is created by user (to improve compression ratio). We can only remove the sort which is added from Spark, but it's hard to find that sort in this new rule only using require child ordering framework, we need to check if all sort fields are partition columns.

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@ulysses-you
Copy link
Contributor Author

The ck failure is irrelevant. @zml1206 @JkSelf any more comments ?

@zml1206
Copy link
Contributor

zml1206 commented Jul 11, 2024

LGTM

@yaooqinn yaooqinn merged commit 0448115 into apache:main Jul 11, 2024
6 checks passed
@ulysses-you ulysses-you deleted the sort branch July 11, 2024 12:08
Copy link

Run Gluten Clickhouse CI

yma11 added a commit to yma11/gluten that referenced this pull request Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants