Replies: 3 comments 27 replies
-
@JkSelf Thank you for the examples. It is very helpful. I'm hesitant to introduce this operator to Velox since it feels legacy. GroupId is more elegant, easier to reason about and less verbose solution for ROLLUP queries. It feels like it should be possible to identify a pattern of Expand followed by Aggregation and convert it to GroupId followed by Aggregation. Is this something you would consider implementing in Gluten? |
Beta Was this translation helpful? Give feedback.
-
@JkSelf Hi, #6566 and #6568 . How much will performance improve after merging these two PRs? |
Beta Was this translation helpful? Give feedback.
-
Spark utilizes the Expand operator to handle ROLLUP functions, yet it differs from GroupID. The Expand operator focuses on
projections
, includingaggregation
expressions,grouping
expressions, andspark_grouping_id
. Its purpose is to demonstrate how a single row can be expanded into multiple rows. However, unlike GroupID, the Expand operator in Spark cannot explicitly distinguish between theaggregation
key and thegrouping
key based onprojections
alone.Take the lineitem table in TPCH as an example. And the query is
select sum(l_suppkey) from lineitem where group by ROLLUP(l_orderkey, l_partkey)
. The physical plan in spark is below.The input and output for Expand and HashAggregate operators are illustrated below. It is hard to distinguish the aggregation key and grouping key in Expand operator. So we added a new Expand operator in Velox to align with Spark. See #5403
Beta Was this translation helpful? Give feedback.
All reactions