Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE] Use SortShuffleManager instance in ColumnarShuffleManager #6022

Merged
merged 9 commits into from
Jun 13, 2024

Conversation

acvictor
Copy link
Contributor

@acvictor acvictor commented Jun 7, 2024

What changes were proposed in this pull request?

This PR makes a change to introduce an object of SortShuffleManager in ColumnarShuffleManager to reduce code overlap.

How was this patch tested?

SortShuffleManagerSuite has already been enabled against ColumnarShuffleManager #5816

Copy link

github-actions bot commented Jun 7, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

github-actions bot commented Jun 7, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Jun 7, 2024

Run Gluten Clickhouse CI

@acvictor
Copy link
Contributor Author

acvictor commented Jun 8, 2024

/Benchmark Velox

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_6022_time.csv log/native_master_06_07_2024_85e8619f12_time.csv difference percentage
q1 34.16 34.74 0.584 101.71%
q2 24.59 23.82 -0.776 96.84%
q3 36.75 37.27 0.519 101.41%
q4 33.86 30.13 -3.731 88.98%
q5 69.01 70.45 1.446 102.10%
q6 7.55 7.45 -0.101 98.66%
q7 81.89 83.57 1.676 102.05%
q8 87.32 86.74 -0.576 99.34%
q9 121.00 121.95 0.949 100.78%
q10 45.90 45.15 -0.754 98.36%
q11 23.86 19.51 -4.357 81.74%
q12 26.61 26.76 0.152 100.57%
q13 37.99 39.39 1.408 103.71%
q14 21.04 20.54 -0.499 97.63%
q15 31.63 31.52 -0.105 99.67%
q16 12.95 13.65 0.703 105.43%
q17 104.35 103.56 -0.797 99.24%
q18 144.05 145.21 1.157 100.80%
q19 14.83 14.70 -0.137 99.08%
q20 26.64 28.92 2.279 108.56%
q21 257.09 262.04 4.950 101.93%
q22 13.05 14.56 1.506 111.54%
total 1256.13 1261.63 5.497 100.44%

@acvictor acvictor marked this pull request as ready for review June 8, 2024 06:56
@acvictor
Copy link
Contributor Author

acvictor commented Jun 8, 2024

@marin-ma can you please review this? Thank you!

@acvictor acvictor changed the title Use SortShuffleManager instance in ColumnarShuffleManager [CORE] Use SortShuffleManager instance in ColumnarShuffleManager Jun 8, 2024
@acvictor
Copy link
Contributor Author

@rui-mo can you too please review this? Thanks!

Copy link

Run Gluten Clickhouse CI

marin-ma
marin-ma previously approved these changes Jun 11, 2024
Copy link
Contributor

@marin-ma marin-ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

class ColumnarShuffleManager(conf: SparkConf) extends ShuffleManager with Logging {

import ColumnarShuffleManager._

private lazy val shuffleExecutorComponents = loadShuffleExecutorComponents(conf)
private[this] val sortShuffleManager: SortShuffleManager = new SortShuffleManager(conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lazy ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines 158 to 123
Option(taskIdMapsForShuffle.remove(shuffleId)).foreach {
mapTaskIds =>
mapTaskIds.iterator.foreach {
mapId => shuffleBlockResolver.removeDataByMap(shuffleId, mapId)
}
}
true
sortShuffleManager.unregisterShuffle(shuffleId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it same with before ? The taskIdMapsForShuffle is maintained by ColumnarShuffleManager so sortShuffleManager. unregisterShuffle will do nothing since the taskIdMapsForShuffle is empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this thanks, updated shuffleBlockResolver related flow.

}

/** Shut down this ShuffleManager. */
override def stop(): Unit = {
shuffleBlockResolver.stop()
sortShuffleManager.stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shuffleBlockResolver is hold in ColumnarShuffleManager, how can SortShuffleManager stop it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@marin-ma marin-ma dismissed their stale review June 11, 2024 01:53

pending

Copy link

Run Gluten Clickhouse CI

@acvictor
Copy link
Contributor Author

@ulysses-you can you please review again? Thank you!

Comment on lines 64 to 66
mapTaskIds.synchronized {
mapTaskIds.add(context.taskAttemptId())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move this code in case columnarShuffleHandle: ColumnarShuffleHandle ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to case columnarShuffleHandle: ColumnarShuffleHandle

@ulysses-you
Copy link
Contributor

ulysses-you commented Jun 11, 2024

shall we call unregisterShuffle and stop if it is delegated to sort shuffle manager ? e.g.,

if (taskIdMapsForShuffle.contains(shuffleId)) {
  this.stop
} else {
  delegated.stop
}

Copy link

Run Gluten Clickhouse CI

@acvictor
Copy link
Contributor Author

shall we call unregisterShuffle and stop if it is delegated to sort shuffle manager ? e.g.,

if (taskIdMapsForShuffle.contains(shuffleId)) {
  this.stop
} else {
  delegated.stop
}

Updated unregisterShuffle and stop to this pattern.

@acvictor
Copy link
Contributor Author

Thank you @ulysses-you for the comments! Updated the PR.

ulysses-you
ulysses-you previously approved these changes Jun 12, 2024
Copy link
Contributor

@ulysses-you ulysses-you left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thank you @acvictor , cc @marin-ma

@marin-ma
Copy link
Contributor

@ulysses-you Thanks for review!

@acvictor Could you help to add a UT to test the fallback behavior for columnar shuffle? Perhaps, add one test case in FallbackSuite.scala and set spark.gluten.sql.columnar.shuffle=false force the Exchange operator to fall back

Copy link

Run Gluten Clickhouse CI

df =>
val plan = df.queryExecution.executedPlan
val columnarShuffle = find(plan) {
case _: ColumnarShuffledJoin => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check the number ofColumnarShuffleExchangeExec and ShuffleExchangeExec. You can use repartition hint to construct the sql https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html#partitioning-hints

e.g. select /*+ REPARTITION(3, c1) */ * FROM tmp1;

Then the check should be like:

assert(collectColumnarShuffleExchange(plan) == 0)
assert(collectShuffleExchange(plan) == 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current query this is the plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRowExec
   +- ^(6) HashAggregateTransformer(keys=[c1#14], functions=[count(1)], output=[c1#14, count(1)#26L])
      +- ^(6) InputIteratorTransformer[c1#14, count#30L]
         +- ^(6) InputAdapter
            +- ^(6) RowToVeloxColumnar
               +- ^(6) AQEShuffleRead coalesced
                  +- ^(6) ShuffleQueryStage 0
                     +- Exchange hashpartitioning(c1#14, 5), ENSURE_REQUIREMENTS, [plan_id=281]
                        +- VeloxColumnarToRowExec
                           +- ^(5) HashAggregateTransformer(keys=[c1#14], functions=[partial_count(1)], output=[c1#14, count#30L])
                              +- ^(5) NativeFileScan parquet spark_catalog.default.tmp11111[c1#14] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/anvicto/src/Gluten/spark-warehouse/org.apache.gluten.execut..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>

Can I retain it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to me. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have raised one small PR here #6055 to enable fallback for shuffle.

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@acvictor
Copy link
Contributor Author

@marin-ma added a test.

Copy link
Contributor

@marin-ma marin-ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@marin-ma marin-ma merged commit 2ae80af into apache:main Jun 13, 2024
40 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_6022_time.csv log/native_master_06_12_2024_7445f02c9_time.csv difference percentage
q1 35.67 33.87 -1.796 94.96%
q2 23.60 26.51 2.905 112.31%
q3 36.84 36.94 0.106 100.29%
q4 32.41 34.65 2.244 106.93%
q5 70.83 69.47 -1.358 98.08%
q6 7.61 5.84 -1.772 76.72%
q7 84.08 82.18 -1.901 97.74%
q8 83.77 85.19 1.414 101.69%
q9 118.64 120.19 1.543 101.30%
q10 44.72 43.54 -1.180 97.36%
q11 22.67 20.47 -2.197 90.31%
q12 26.37 24.09 -2.280 91.35%
q13 37.56 37.22 -0.342 99.09%
q14 19.24 23.00 3.754 119.51%
q15 30.13 30.20 0.073 100.24%
q16 13.74 13.79 0.047 100.34%
q17 103.62 102.85 -0.770 99.26%
q18 143.92 145.02 1.098 100.76%
q19 13.62 16.14 2.518 118.49%
q20 29.19 27.34 -1.848 93.67%
q21 262.53 263.30 0.765 100.29%
q22 12.33 12.08 -0.247 98.00%
total 1253.09 1253.86 0.775 100.06%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants