Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Add schema validation for all operators #6406

Merged
merged 4 commits into from
Jul 11, 2024

Conversation

zhli1142015
Copy link
Contributor

@zhli1142015 zhli1142015 commented Jul 11, 2024

What changes were proposed in this pull request?

This PR fixed the failure below.

val data = "2019-09-09 01:02:03.456789"
val df = Seq(data).toDF("strTs").selectExpr(s"CAST(strTs AS TIMESTAMP_NTZ) AS ts")
df.coalesce(1).write.format("parquet").save("/tmp/test")
 org.apache.gluten.exception.GlutenException: Input schema contains unsupported type when convert row to columnar for StructType(StructField(ts,TimestampNTZType,true)) due to Schema / data type not supported: TimestampNTZType
  at org.apache.gluten.execution.RowToVeloxColumnarExec.$anonfun$doExecuteColumnarInternal$1(RowToVeloxColumnarExec.scala:54)
  at scala.Option.foreach(Option.scala:407)
  at org.apache.gluten.execution.RowToVeloxColumnarExec.doExecuteColumnarInternal(RowToVeloxColumnarExec.scala:51)
  at org.apache.gluten.execution.RowToColumnarExecBase.doExecuteColumnar(RowToColumnarExecBase.scala:62)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
  at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218)
  at org.apache.gluten.execution.ColumnarCoalesceExec.doExecuteColumnar(ColumnarCoalesceExec.scala:46)

How was this patch tested?

UT.

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

@zhli1142015
Copy link
Contributor Author

Aslo create a separate task to track the support for TimestampNTZType
#6407

@zhli1142015
Copy link
Contributor Author

cc @zhztheplayer and @PHILO-HE , thanks

Copy link

Run Gluten Clickhouse CI

1 similar comment
@zhli1142015
Copy link
Contributor Author

Run Gluten Clickhouse CI

Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note this exception happens in RowToColumnar. Looks it will also happen for other Gluten operators following RowToColumnar, other than for Gluten coalesce operator. Right? If so, can we just do the check in doValidate() of GlutenPlan.scala to cover all?

@ulysses-you
Copy link
Contributor

+1 with @PHILO-HE

@zhli1142015
Copy link
Contributor Author

zhli1142015 commented Jul 11, 2024

I thought this is the only missed case, as I saw this check is already added for broadcastExchange, union. But sure, let me move this to GlutenPlan.

Copy link

Run Gluten Clickhouse CI

@zhli1142015 zhli1142015 changed the title [VL] Add validation for coalesce operator [Core] Add schema validation for all operators Jul 11, 2024
@PHILO-HE
Copy link
Contributor

@zhli1142015, then, could you help remove the redundant check for Union, BroadcastExchange, RowToColumnar? With RowToColumnar added prior to GlutenPlan whose schema check passes, we can infer RowToColumnar's same schema check also passes.

Copy link

Run Gluten Clickhouse CI

@zhli1142015
Copy link
Contributor Author

@zhli1142015, then, could you help remove the redundant check for Union, BroadcastExchange, RowToColumnar? With RowToColumnar added prior to GlutenPlan whose schema check passes, we can infer RowToColumnar's same schema check also passes.

Done, thanks.

@zhli1142015 zhli1142015 requested a review from PHILO-HE July 11, 2024 10:57
@zhli1142015 zhli1142015 changed the title [Core] Add schema validation for all operators [VL] Add schema validation for all operators Jul 11, 2024
Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! Thanks!

@zhli1142015 zhli1142015 merged commit 1cc4ef4 into apache:main Jul 11, 2024
42 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_07_11_2024_time.csv log/native_master_07_10_2024_e8e93e73c_time.csv difference percentage
q1 36.56 33.46 -3.099 91.52%
q2 23.66 28.57 4.902 120.71%
q3 38.89 38.44 -0.456 98.83%
q4 32.48 35.14 2.664 108.20%
q5 70.11 71.06 0.950 101.36%
q6 7.94 6.45 -1.495 81.18%
q7 85.48 83.24 -2.233 97.39%
q8 84.74 88.10 3.353 103.96%
q9 120.17 119.97 -0.203 99.83%
q10 46.28 43.77 -2.508 94.58%
q11 20.26 20.45 0.195 100.96%
q12 23.58 26.84 3.256 113.81%
q13 39.57 38.42 -1.152 97.09%
q14 17.94 20.94 3.001 116.73%
q15 32.84 30.00 -2.846 91.33%
q16 13.99 15.85 1.856 113.27%
q17 104.06 105.14 1.081 101.04%
q18 146.03 145.23 -0.798 99.45%
q19 14.91 14.76 -0.156 98.95%
q20 28.48 26.50 -1.976 93.06%
q21 265.32 267.51 2.186 100.82%
q22 12.26 12.07 -0.186 98.48%
total 1265.54 1271.88 6.336 100.50%

yma11 added a commit to yma11/gluten that referenced this pull request Jul 15, 2024
yma11 added a commit to yma11/gluten that referenced this pull request Jul 15, 2024
yma11 added a commit to yma11/gluten that referenced this pull request Jul 15, 2024
yma11 added a commit to yma11/gluten that referenced this pull request Jul 15, 2024
yma11 added a commit to yma11/gluten that referenced this pull request Jul 15, 2024
yma11 added a commit to yma11/gluten that referenced this pull request Jul 16, 2024
yma11 added a commit to yma11/gluten that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants