
[VL] Remaining issues for typed imperative aggregate #4763

Open
liujiayi771 opened this issue Feb 23, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@liujiayi771
Contributor

Description

  • collect_list
    • If all input values are null, vanilla Spark returns an empty array, but Velox returns null (see the sketch after this list).
  • collect_set
    • Velox does not register the companion functions of set_agg.
    • If all input values are null, vanilla Spark returns an empty array, but Velox returns null.
    • If some (but not all) of the input values are null, vanilla Spark ignores the null inputs, but Velox does not.
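A minimal sketch of the null-handling differences described above, assuming a spark-shell session with Gluten + Velox enabled (the column names and data are illustrative, not from the original report):

import org.apache.spark.sql.functions._
import spark.implicits._

// Group 1: all values are null. Vanilla Spark's collect_list returns [],
// while Velox's array_agg returns null for the same input.
val allNull = Seq[(Int, Option[Int])]((1, None), (1, None)).toDF("k", "v")
allNull.groupBy("k").agg(collect_list("v")).show()

// Group 1: a mix of null and non-null values. Vanilla Spark's collect_set
// ignores the null and returns [10]; Velox's set_agg does not ignore it.
val mixed = Seq[(Int, Option[Int])]((1, Some(10)), (1, None)).toDF("k", "v")
mixed.groupBy("k").agg(collect_set("v")).show()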

Exclude UTs:

  1. SPARK-31993: concat_ws in agg function with plenty of string/array types columns in GlutenStringFunctionsSuite
     Reason: if all input values are null, collect_list in vanilla Spark returns an empty array, but array_agg in Velox returns null.
@felipepessoto
Contributor

felipepessoto commented May 29, 2024

@liujiayi771 do you know if collect_set is not expected to work with complex types when a value is null? For example, this works with vanilla Spark but fails when Gluten is enabled:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._

// The nested lastUpdated field is null, which triggers the Velox error below.
val jsonStr = """{"txn":{"appId":"txnId","version":0,"lastUpdated":null}}"""
val jsonSchema = StructType(Seq(StructField("txn",
  StructType(Seq(
    StructField("appId", StringType, true),
    StructField("lastUpdated", LongType, true),
    StructField("version", LongType, true))),
  true)))
val df = spark.read.schema(jsonSchema).json(Seq(jsonStr).toDS).select(collect_set(col("txn")))
df.head

Error:

[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (c7f5 executor driver): org.apache.gluten.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxUserError
[info] Error Source: USER
[info] Error Code: INVALID_ARGUMENT
[info] Reason: ROW comparison not supported for values that contain nulls
[info] Retriable: False
[info] Expression: !decoded.base()->containsNullAt(indices[index])
[info] Function: checkNestedNulls
[info] File: /__w/1/s/Velox/velox/functions/lib/CheckNestedNulls.cpp
[info] Line: 34

@liujiayi771
Contributor Author

@felipepessoto This is a known issue; the Velox backend does not yet support this case.
