[VL] Set s.g.s.c.b.v.coalesceBatchesBeforeShuffle=true by default #6056
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename the commit message and pull request title in the following format?
df1138e to 36ecb49
Run Gluten Clickhouse CI
/Benchmark TPCDS
/Benchmark Velox TPCDS
===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
bac26ca to 46cde2c
@zhztheplayer @marin-ma Similarly, if the batch size is too large, so that each column exceeds the L2 cache size, performance will likely be very poor as well. Can you submit a similar PR to fix this, splitting large batches into smaller ones during split?
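To make the L2-cache argument concrete, here is a rough Python sketch of how a row cap could be derived so that one fixed-width column buffer fits in L2, and how an oversized batch could then be cut into slices. The 1 MiB L2 size, the 8-byte value width, and the helpers `max_rows_for_l2` and `split_rows` are hypothetical illustrations, not Gluten or Velox APIs; the model also ignores null buffers and variable-width columns.

```python
def max_rows_for_l2(l2_bytes: int, value_width_bytes: int) -> int:
    """Largest row count whose column buffer (rows * width) fits in L2.

    Hypothetical helper: assumes a single fixed-width column and ignores
    null bitmaps, alignment, and other columns competing for the cache.
    """
    if l2_bytes <= 0 or value_width_bytes <= 0:
        raise ValueError("sizes must be positive")
    return max(1, l2_bytes // value_width_bytes)


def split_rows(num_rows: int, max_rows: int) -> list[tuple[int, int]]:
    """Split rows [0, num_rows) into (offset, length) slices of at most max_rows."""
    return [(off, min(max_rows, num_rows - off)) for off in range(0, num_rows, max_rows)]


# Hypothetical numbers: a 1 MiB L2 cache and an int64 column.
cap = max_rows_for_l2(l2_bytes=1 << 20, value_width_bytes=8)
print(cap)                       # 131072
print(split_rows(300_000, cap))  # [(0, 131072), (131072, 131072), (262144, 37856)]
```

Real cache sizes vary per CPU, and a batch usually has several columns, so a practical cap would be smaller than this single-column bound.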
@FelixYBW Do we need a new operator for it? If it's only for split, we already have it: #5536
Perhaps we can tweak the code of VeloxAppendBatches (and rename it) so that it can do both appending and slicing when given a target batch size range. That way, the two operations would happen more consistently.
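The idea of one operator that both appends and slices against a target range could look roughly like the Python sketch below. It is not the VeloxAppendBatches implementation: batches are modeled as plain lists of rows, `resize_batches` is a hypothetical name, and rows are copied rather than sliced zero-copy as a columnar engine would do.

```python
from typing import Iterable, Iterator, List

Batch = List[int]  # hypothetical stand-in: a batch is just a list of rows


def resize_batches(batches: Iterable[Batch], min_rows: int, max_rows: int) -> Iterator[Batch]:
    """Yield batches whose sizes fall in [min_rows, max_rows] where possible.

    Small inputs are appended together until at least min_rows are buffered;
    oversized buffers are sliced into max_rows pieces. Only the final
    flushed batch may be smaller than min_rows.
    """
    if not 0 < min_rows <= max_rows:
        raise ValueError("require 0 < min_rows <= max_rows")
    buffer: Batch = []
    for batch in batches:
        buffer.extend(batch)
        # Slice off full-size pieces while the buffer exceeds the upper bound.
        while len(buffer) > max_rows:
            yield buffer[:max_rows]
            buffer = buffer[max_rows:]
        # Here len(buffer) <= max_rows; emit once the lower bound is reached.
        if len(buffer) >= min_rows:
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # trailing remainder, possibly below min_rows


# Three tiny batches get appended, a 12-row batch gets sliced.
batches = [[0], [1], [2], list(range(3, 15)), [15, 16]]
print([len(b) for b in resize_batches(batches, min_rows=3, max_rows=5)])  # [3, 5, 5, 4]
```

Handling both directions in one loop means a single pair of bounds decides when data is merged and when it is cut, which is the consistency the comment is after.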
LGTM. Thanks!
Related to #6009
Set spark.gluten.sql.columnar.backend.velox.coalesceBatchesBeforeShuffle=true by default so that shuffle performance is never impacted by this issue.
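With this change the option is on by default, so setting it explicitly is mainly useful for pinning or disabling it (for example, to compare shuffle performance with and without coalescing). A small Python sketch that assembles the `spark-submit --conf key=value` flags is shown below; `spark_submit_args` and `my_app.py` are hypothetical, and only the configuration key is taken from this PR.

```python
import shlex

# Key from this PR; after the change its default is true.
COALESCE_KEY = "spark.gluten.sql.columnar.backend.velox.coalesceBatchesBeforeShuffle"


def spark_submit_args(app: str, conf: dict[str, str]) -> list[str]:
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]


# Use "false" instead to opt out, e.g. for an A/B performance comparison.
cmd = spark_submit_args("my_app.py", {COALESCE_KEY: "true"})
print(shlex.join(cmd))
```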
The default batch size is changed to `0.8 * GLUTEN_MAX_BATCH_SIZE` to allow some flexibility and avoid unexpectedly combining buffers that are already large enough.
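The effect of the 0.8 factor can be illustrated with a tiny Python sketch. It assumes `GLUTEN_MAX_BATCH_SIZE` corresponds to `spark.gluten.sql.columnar.maxBatchSize` with a value of 4096 rows, which is an assumption made for illustration; `coalesce_target` and `needs_combining` are hypothetical helpers, not Gluten code.

```python
import math

# Assumption: GLUTEN_MAX_BATCH_SIZE stands for the configured max batch size,
# taken to be 4096 rows here purely for illustration.
GLUTEN_MAX_BATCH_SIZE = 4096


def coalesce_target(max_batch_size: int, ratio: float = 0.8) -> int:
    """Row count at which a buffer is considered large enough to emit as-is."""
    return max(1, math.floor(ratio * max_batch_size))


def needs_combining(num_rows: int, target: int) -> bool:
    """Only buffers below the target are appended to their neighbours."""
    return num_rows < target


target = coalesce_target(GLUTEN_MAX_BATCH_SIZE)
print(target)                                        # 3276
print(needs_combining(4000, target))                 # False: passed through unchanged
print(needs_combining(4000, GLUTEN_MAX_BATCH_SIZE))  # True if the target were the full max
```

With the target at the full maximum, a 4000-row batch would still be merged with its neighbour even though it is nearly full; lowering the target to 80% lets such batches pass through untouched.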