
[CORE] The fallback check for Scan should not be skipped when DPP is present #7078

Closed
wang-zhun opened this issue Sep 2, 2024 · 6 comments · Fixed by #7080
Labels: bug (Something isn't working), triage

Comments

wang-zhun (Contributor) commented Sep 2, 2024

Backend

VL (Velox)

Bug description

Exception information

ERROR

Context: Split [Hive: hdfs://xxx/xxx.parquet 4 - 133633304] Task Gluten_Stage_2_TID_2_VTID_2
Additional Context: Operator: TableScan[0] 0
Function: readInt
File: /incubator-gluten/ep/build-velox/build/velox_ep/./velox/dwio/common/IntDecoder.h
Line: 448
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorENS1_22CompileTimeEmptyStringEEEvRKNS1_18VeloxCheckFailArgsET0_

Plan

== Physical Plan ==
AdaptiveSparkPlan (44)
+- == Current Plan ==
   SortMergeJoin Inner (33)
   :- Sort (14)
   :  +- ShuffleQueryStage (13)
   :     +- ColumnarExchange (12)
   :        +- VeloxAppendBatches (11)
   :           +- ^ ProjectExecTransformer (9)
   :              +- ^ InputIteratorTransformer (8)
   :                 +- RowToVeloxColumnar (6)
   :                    +- * Project (5)
   :                       +- * Filter (4)
   :                          +- VeloxColumnarToRowExec (3)
   :                             +- ^ IcebergScanTransformer (1)
   +- Sort (32)
      +- ShuffleQueryStage (31)
         +- ColumnarExchange (30)
            +- VeloxAppendBatches (29)
               +- ^ ProjectExecTransformer (27)
                  +- ^ InputIteratorTransformer (26)
                     +- RowToVeloxColumnar (24)
                        +- * Project (23)
                           +- VeloxColumnarToRowExec (22)
                              +- ^ FilterExecTransformer (20)
                                 +- ^ InputIteratorTransformer (19)
                                    +- RowToVeloxColumnar (17)
                                       +- * ColumnarToRow (16)
                                          +- BatchScan (15)
+- == Initial Plan ==
   SortMergeJoin Inner (43)
   :- Sort (38)
   :  +- Exchange (37)
   :     +- Project (36)
   :        +- Filter (35)
   :           +- BatchScan (34)
   +- Sort (42)
      +- Exchange (41)
         +- Project (40)
            +- Filter (39)
               +- BatchScan (15)

Reason

  • BatchScan (34) has RuntimeFilters: [dynamicpruningexpression(true)]
  • Filter (35) is a vanilla (row-based) Spark filter

There is an unsupported timestamp type in BatchScan (34), which should normally trigger a fallback to vanilla Spark. However, when the following conditions are met, BatchScan (34) skips the fallback check:
  • skip1
  • skip2
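The buggy control flow can be sketched as follows (a minimal, hypothetical simulation — `Scan` and `validate_scan` are illustrative names, not Gluten's actual classes): when a scan carries a DPP runtime filter, the validator returns early, so the unsupported-timestamp check never runs and the scan is offloaded anyway.

```python
# Hypothetical simulation of the control flow described in this report;
# the names here are illustrative, not Gluten's actual API.
from dataclasses import dataclass, field

@dataclass
class Scan:
    has_unsupported_timestamp: bool
    runtime_filters: list = field(default_factory=list)

def validate_scan(scan: Scan) -> bool:
    """Return True if the scan may stay columnar, False if it must fall back."""
    # Buggy behavior: the presence of a DPP runtime filter skips validation
    # entirely, so an unsupported timestamp column is never checked.
    if scan.runtime_filters:
        return True  # fallback check skipped -> later runtime error in Velox
    return not scan.has_unsupported_timestamp

plain = Scan(has_unsupported_timestamp=True)
with_dpp = Scan(has_unsupported_timestamp=True,
                runtime_filters=["dynamicpruningexpression(true)"])

print(validate_scan(plain))     # False: correctly falls back to vanilla Spark
print(validate_scan(with_dpp))  # True: check skipped (the bug)
```

With the early return removed, the second call would also return False and the scan would fall back to vanilla Spark instead of failing at runtime inside Velox's IntDecoder.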

Spark version

Spark-3.3.x

Spark configurations

spark.gluten.sql.parquet.timestampType.scan.fallback.enabled=true

System information

No response

Relevant logs

No response

zhztheplayer (Member) commented:
I think the reason for that code was a non-trivial one; see the original PR #894. Perhaps we don't need that if-else anymore, now that the DPP logic has been reworked so the expression replacement is done in an earlier phase.
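The "expression replacement" referred to here can be sketched roughly as follows (a hypothetical toy expression tree, not Spark's API; in Spark the placeholder node is called `DynamicPruningExpression`): DPP placeholders that end up unused are rewritten to a literal `true`, so later phases, including fallback validation, see an ordinary expression.

```python
# Hypothetical sketch of DPP expression replacement; illustrative names only.
class Expr:
    pass

class Literal(Expr):
    def __init__(self, value):
        self.value = value

class DynamicPruningExpression(Expr):
    def __init__(self, child):
        self.child = child

class And(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

def replace_dpp(expr: Expr) -> Expr:
    """Recursively replace DPP placeholder nodes with Literal(True)."""
    if isinstance(expr, DynamicPruningExpression):
        return Literal(True)
    if isinstance(expr, And):
        return And(replace_dpp(expr.left), replace_dpp(expr.right))
    return expr

cond = And(Literal(False), DynamicPruningExpression(Literal(None)))
rewritten = replace_dpp(cond)
print(isinstance(rewritten.right, Literal))  # True
```

Once this rewrite happens before planning, the plan shows `dynamicpruningexpression(true)` (as in BatchScan (34) above) and there is no DPP-specific case left for the validator to special-case.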

zhouyuan (Contributor) commented Sep 3, 2024

It looks like a bug when enabling the Iceberg reader.

Yohahaha (Contributor) commented Sep 3, 2024

@liujiayi771 @Zouxxyy

wang-zhun (Contributor, Author) commented:
> I think the reason for that code was a non-trivial one; see the original PR #894. Perhaps we don't need that if-else anymore, now that the DPP logic has been reworked so the expression replacement is done in an earlier phase.

@zhztheplayer Thank you for the information provided.

wang-zhun (Contributor, Author) commented:
> It looks like a bug when enabling the Iceberg reader.

@zhouyuan The timestamp type is not supported. Under normal circumstances, setting spark.gluten.sql.parquet.timestampType.scan.fallback.enabled=true resolves this.

liujiayi771 (Contributor) commented Sep 3, 2024

#6514 is a similar issue. This is an int64 timestamp, right? We can't tell from the data type whether it is an int64 or an int96 timestamp.
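A small illustration of this point (a simplified sketch, not the actual reader code): Parquet's legacy INT96 timestamps and its annotated INT64 timestamps both surface as the same engine-side timestamp type, so the physical encoding cannot be recovered from the data type alone — it only exists in the Parquet file metadata.

```python
# Simplified mapping from Parquet physical/logical encodings to the
# engine-side type; the string names are illustrative.
PARQUET_TO_SPARK = {
    "INT96": "TimestampType",                     # legacy Parquet timestamps
    "INT64 (TIMESTAMP_MICROS)": "TimestampType",  # modern annotated int64
}

spark_types = set(PARQUET_TO_SPARK.values())
print(spark_types)  # {'TimestampType'} -> the physical encoding is lost
```

This is why a validator that only inspects the scan's output schema cannot distinguish the two cases; it would have to look at the Parquet footer instead.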
