[GLUTEN-4763][VL] Add RewriteTypedImperativeAggregate rule for collect_list #4764

liujiayi771 · 2024-02-23T13:53:18Z

What changes were proposed in this pull request?

For some typed imperative aggregate functions, like collect_list/collect_set, Spark will make agg buffer stored as BinaryType in partial/ partial merge phase, even though these two functions' output is ArrayType. Gluten has no such special handling. For Velox backend, agg function's raw data type is used for agg buffer in partial/partial merge. Thus, there will be a type mismatch issue in the latter shuffle operator, as Gluten plan inherits attributes from its corresponding Spark plan. For collect_list/collect_set, shuffle expects BinaryType, but gets ArrayType.

So, we need a rule RewriteTypedImperativeAggregate to rewrite collect_list and collect_set for Velox backend. Referenced the implementation in #2669.

Velox does not register the companion functions for collect_set, to enable collect_set, modifications need to be made in Velox.

Currently, there are still some issues, such as the semantics of null, where Spark and Velox have many differences. At present, collect_list supports ignoring null values, but when all inputs are null, Spark returns an empty array, while Velox returns null. collect_set does not yet support ignoring null values, and there is the same problem when all inputs are null. Adaptations on the Velox side are still needed.

How was this patch tested?

Exists CI.
Add collect_list in VeloxAggregateFunctionsSuite.distinct functions
Add a new test case VeloxAggregateFunctionsSuite.collect_list null inputs

github-actions · 2024-02-23T13:53:36Z

#4763

github-actions · 2024-02-23T13:53:50Z

Run Gluten Clickhouse CI

github-actions · 2024-02-23T13:54:40Z

Run Gluten Clickhouse CI

github-actions · 2024-02-23T13:56:39Z

Run Gluten Clickhouse CI

liujiayi771 · 2024-02-23T13:57:59Z

cc @PHILO-HE, thanks.

github-actions · 2024-02-24T03:02:34Z

Run Gluten Clickhouse CI

github-actions · 2024-02-24T05:15:46Z

Run Gluten Clickhouse CI

PHILO-HE

Looks good to me. Thanks so much for your work!
I note collect_list/set were excluded for window operator. Maybe, they can be removed now. It can be done in a separate pr. Thanks!
https://github.com/oap-project/gluten/blob/bda60be4c44621c3722ecfbe55e98fd22f06b04a/cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc#L645

liujiayi771 force-pushed the rewrite-typed-imperative-agg branch from dbaabba to 744dd73 Compare February 23, 2024 13:54

liujiayi771 force-pushed the rewrite-typed-imperative-agg branch from 744dd73 to f630c5b Compare February 23, 2024 13:56

liujiayi771 force-pushed the rewrite-typed-imperative-agg branch from f630c5b to 0bfd92b Compare February 24, 2024 03:01

Add RewriteTypedImperativeAggregate rule for collect_list/collect_set

908cfa2

liujiayi771 force-pushed the rewrite-typed-imperative-agg branch from 0bfd92b to 908cfa2 Compare February 24, 2024 05:15

PHILO-HE approved these changes Feb 26, 2024

View reviewed changes

PHILO-HE merged commit 427cb40 into apache:main Feb 26, 2024
19 checks passed

liujiayi771 deleted the rewrite-typed-imperative-agg branch February 26, 2024 09:44

zhztheplayer mentioned this pull request Mar 8, 2024

[VL] Add support of percentile_approx / approx_percentile #4889

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GLUTEN-4763][VL] Add RewriteTypedImperativeAggregate rule for collect_list #4764

[GLUTEN-4763][VL] Add RewriteTypedImperativeAggregate rule for collect_list #4764

liujiayi771 commented Feb 23, 2024

github-actions bot commented Feb 23, 2024

github-actions bot commented Feb 23, 2024

github-actions bot commented Feb 23, 2024

github-actions bot commented Feb 23, 2024

liujiayi771 commented Feb 23, 2024

github-actions bot commented Feb 24, 2024

github-actions bot commented Feb 24, 2024

PHILO-HE left a comment

[GLUTEN-4763][VL] Add RewriteTypedImperativeAggregate rule for collect_list #4764

[GLUTEN-4763][VL] Add RewriteTypedImperativeAggregate rule for collect_list #4764

Conversation

liujiayi771 commented Feb 23, 2024

What changes were proposed in this pull request?

How was this patch tested?

github-actions bot commented Feb 23, 2024

github-actions bot commented Feb 23, 2024

github-actions bot commented Feb 23, 2024

github-actions bot commented Feb 23, 2024

liujiayi771 commented Feb 23, 2024

github-actions bot commented Feb 24, 2024

github-actions bot commented Feb 24, 2024

PHILO-HE left a comment

Choose a reason for hiding this comment