
[CORE] The fallback check for Scan should not be skipped when DPP is present #7078

Closed
wang-zhun opened this issue Sep 2, 2024 · 6 comments · Fixed by #7080
Labels: bug (Something isn't working), triage

Comments

wang-zhun (Contributor) commented Sep 2, 2024

Backend

VL (Velox)

Bug description

Exception information

ERROR

Context: Split [Hive: hdfs://xxx/xxx.parquet 4 - 133633304] Task Gluten_Stage_2_TID_2_VTID_2
Additional Context: Operator: TableScan[0] 0
Function: readInt
File: /incubator-gluten/ep/build-velox/build/velox_ep/./velox/dwio/common/IntDecoder.h
Line: 448
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorENS1_22CompileTimeEmptyStringEEEvRKNS1_18VeloxCheckFailArgsET0_

Plan

== Physical Plan ==
AdaptiveSparkPlan (44)
+- == Current Plan ==
   SortMergeJoin Inner (33)
   :- Sort (14)
   :  +- ShuffleQueryStage (13)
   :     +- ColumnarExchange (12)
   :        +- VeloxAppendBatches (11)
   :           +- ^ ProjectExecTransformer (9)
   :              +- ^ InputIteratorTransformer (8)
   :                 +- RowToVeloxColumnar (6)
   :                    +- * Project (5)
   :                       +- * Filter (4)
   :                          +- VeloxColumnarToRowExec (3)
   :                             +- ^ IcebergScanTransformer (1)
   +- Sort (32)
      +- ShuffleQueryStage (31)
         +- ColumnarExchange (30)
            +- VeloxAppendBatches (29)
               +- ^ ProjectExecTransformer (27)
                  +- ^ InputIteratorTransformer (26)
                     +- RowToVeloxColumnar (24)
                        +- * Project (23)
                           +- VeloxColumnarToRowExec (22)
                              +- ^ FilterExecTransformer (20)
                                 +- ^ InputIteratorTransformer (19)
                                    +- RowToVeloxColumnar (17)
                                       +- * ColumnarToRow (16)
                                          +- BatchScan (15)
+- == Initial Plan ==
   SortMergeJoin Inner (43)
   :- Sort (38)
   :  +- Exchange (37)
   :     +- Project (36)
   :        +- Filter (35)
   :           +- BatchScan (34)
   +- Sort (42)
      +- Exchange (41)
         +- Project (40)
            +- Filter (39)
               +- BatchScan (15)

Reason

  • BatchScan (34) has RuntimeFilters: [dynamicpruningexpression(true)]
  • Filter (35) is a vanilla (row-based) Spark filter

There is an unsupported timestamp type in BatchScan (34), which should normally trigger a fallback to vanilla Spark. However, when the following conditions are met, BatchScan (34) skips the fallback check:
  • skip1
  • skip2
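The buggy control flow can be sketched as follows (a minimal, hypothetical simulation — `Scan` and `validate_scan` are illustrative names, not Gluten's actual classes): when a scan carries a DPP runtime filter, the validator returns early, so the unsupported-timestamp check never runs and the scan is offloaded anyway.

```python
# Hypothetical simulation of the control flow described in this report;
# the names here are illustrative, not Gluten's actual API.
from dataclasses import dataclass, field

@dataclass
class Scan:
    has_unsupported_timestamp: bool
    runtime_filters: list = field(default_factory=list)

def validate_scan(scan: Scan) -> bool:
    """Return True if the scan may stay columnar, False if it must fall back."""
    # Buggy behavior: the presence of a DPP runtime filter skips validation
    # entirely, so an unsupported timestamp column is never checked.
    if scan.runtime_filters:
        return True  # fallback check skipped -> later runtime error in Velox
    return not scan.has_unsupported_timestamp

plain = Scan(has_unsupported_timestamp=True)
with_dpp = Scan(has_unsupported_timestamp=True,
                runtime_filters=["dynamicpruningexpression(true)"])

print(validate_scan(plain))     # False: correctly falls back to vanilla Spark
print(validate_scan(with_dpp))  # True: check skipped (the bug)
```

With the early return removed, the second call would also return False and the scan would fall back to vanilla Spark instead of failing at runtime inside Velox's IntDecoder.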

Spark version

Spark-3.3.x

Spark configurations

spark.gluten.sql.parquet.timestampType.scan.fallback.enabled=true

System information

No response

Relevant logs

No response

zhztheplayer (Member) commented:
I think the reason for that code was a non-trivial one; see the original PR #894. Perhaps we don't need that if-else anymore, now that the DPP logic has been reworked so the expression replacement is done in an earlier phase.
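The "expression replacement" referred to here can be sketched roughly as follows (a hypothetical toy expression tree, not Spark's API; in Spark the placeholder node is called `DynamicPruningExpression`): DPP placeholders that end up unused are rewritten to a literal `true`, so later phases, including fallback validation, see an ordinary expression.

```python
# Hypothetical sketch of DPP expression replacement; illustrative names only.
class Expr:
    pass

class Literal(Expr):
    def __init__(self, value):
        self.value = value

class DynamicPruningExpression(Expr):
    def __init__(self, child):
        self.child = child

class And(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

def replace_dpp(expr: Expr) -> Expr:
    """Recursively replace DPP placeholder nodes with Literal(True)."""
    if isinstance(expr, DynamicPruningExpression):
        return Literal(True)
    if isinstance(expr, And):
        return And(replace_dpp(expr.left), replace_dpp(expr.right))
    return expr

cond = And(Literal(False), DynamicPruningExpression(Literal(None)))
rewritten = replace_dpp(cond)
print(isinstance(rewritten.right, Literal))  # True
```

Once this rewrite happens before planning, the plan shows `dynamicpruningexpression(true)` (as in BatchScan (34) above) and there is no DPP-specific case left for the validator to special-case.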

zhouyuan (Contributor) commented Sep 3, 2024

It looks like a bug when enabling the Iceberg reader.

Yohahaha (Contributor) commented Sep 3, 2024

@liujiayi771 @Zouxxyy

wang-zhun (Contributor, Author) commented:
> I think the reason for that code was a non-trivial one; see the original PR #894. Perhaps we don't need that if-else anymore, now that the DPP logic has been reworked so the expression replacement is done in an earlier phase.

@zhztheplayer Thank you for the information provided.

wang-zhun (Contributor, Author) commented:
> It looks like a bug when enabling the Iceberg reader.

@zhouyuan The timestamp type is not supported. Under normal circumstances, setting spark.gluten.sql.parquet.timestampType.scan.fallback.enabled=true resolves this.

liujiayi771 (Contributor) commented Sep 3, 2024

#6514 is a similar issue. This is an int64 timestamp, right? We can't tell from the data type whether it is an int64 or an int96 timestamp.
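A small illustration of this point (a simplified sketch, not the actual reader code): Parquet's legacy INT96 timestamps and its annotated INT64 timestamps both surface as the same engine-side timestamp type, so the physical encoding cannot be recovered from the data type alone — it only exists in the Parquet file metadata.

```python
# Simplified mapping from Parquet physical/logical encodings to the
# engine-side type; the string names are illustrative.
PARQUET_TO_SPARK = {
    "INT96": "TimestampType",                     # legacy Parquet timestamps
    "INT64 (TIMESTAMP_MICROS)": "TimestampType",  # modern annotated int64
}

spark_types = set(PARQUET_TO_SPARK.values())
print(spark_types)  # {'TimestampType'} -> the physical encoding is lost
```

This is why a validator that only inspects the scan's output schema cannot distinguish the two cases; it would have to look at the Parquet footer instead.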
