
[VL] Native write to hdfs error : UnsupportedOperationException #7441

Open · wenfang6 opened this issue Oct 9, 2024 · 9 comments
Labels: bug, triage

wenfang6 commented Oct 9, 2024

Backend

VL (Velox)

Bug description

Running the SQL insert overwrite table xx partition (ds = 'xx') select * from xx fails with the following error:

org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:500)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:321)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.spark.sql.execution.datasources.FakeRow.isNullAt(FakeRow.scala:36)
	at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:154)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:304)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1524)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
	... 9 more

The table format is Parquet. I would like to know whether native write is currently supported for InsertIntoHiveTable.

Spark version

Spark-3.2.x

Spark configurations

spark.gluten.sql.native.writer.enabled=true
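
For reference, a minimal sketch of how this setting could be applied when building a Spark session. Only the writer key above comes from this report; the plugin class name and off-heap values are assumptions that vary by Gluten version and deployment (older releases use io.glutenproject.GlutenPlugin):

import org.apache.spark.sql.SparkSession

// Hedged sketch: only spark.gluten.sql.native.writer.enabled is taken from this
// issue; the plugin class and off-heap sizes are illustrative placeholders.
val spark = SparkSession.builder()
  .appName("gluten-native-write-check")
  .config("spark.plugins", "org.apache.gluten.GlutenPlugin")
  .config("spark.gluten.sql.native.writer.enabled", "true")
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")
  .enableHiveSupport()
  .getOrCreate()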

System information

No response

Relevant logs

No response

wenfang6 added the bug and triage labels on Oct 9, 2024

JkSelf commented Oct 9, 2024

@wenfang6 We insert a fake row to support native write, but it falls back to the vanilla Spark writer here. It seems that isNativeApplicable is not set correctly. Does your build include this patch? And can you provide the SQL to reproduce the issue? Thanks.
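
For context, the stack trace above shows the vanilla HiveOutputWriter calling isNullAt on a FakeRow, which is what raises the UnsupportedOperationException. Below is a minimal, self-contained sketch of this failure mode using toy types (not Gluten's actual classes), just to illustrate why a row-based writer cannot consume a FakeRow:

object FakeRowSketch {
  trait Row { def isNullAt(i: Int): Boolean }

  // Stand-in for FakeRow: wraps columnar data while pretending to be a row,
  // so every row-level accessor is unimplemented and throws.
  final class FakeRowLike(val columns: Array[Array[Any]]) extends Row {
    def isNullAt(i: Int): Boolean = throw new UnsupportedOperationException
  }

  // Vanilla, row-based writer: reads fields one by one, so it fails on FakeRowLike.
  def vanillaWrite(row: Row): Unit =
    if (!row.isNullAt(0)) println("write field 0")

  // Native-aware writer: unwraps the columnar data and never touches row accessors.
  def nativeWrite(row: Row): Unit = row match {
    case f: FakeRowLike => println(s"write a batch of ${f.columns.length} columns natively")
    case other          => vanillaWrite(other)
  }
}

In the trace above, the FakeRow produced for the native path reached the vanilla HiveOutputWriter, which matches the point that isNativeApplicable was not set correctly.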


wenfang6 commented Oct 9, 2024

> @wenfang6 We insert a fake row to support native write, but it falls back to the vanilla Spark writer here. It seems that isNativeApplicable is not set correctly. Does your build include this patch? And can you provide the SQL to reproduce the issue? Thanks.

A simple SQL statement also hits this error, for example:

insert overwrite  table wen_test_par1 partition (ds = '2024-10-09') 
select * from wen_test;

Gluten plan:

== Fallback Summary ==
No fallback nodes

== Physical Plan ==
Execute InsertIntoHiveTable (4)
+- FakeRowAdaptor (3)
   +- ^ NativeScan hive dap_dev.wen_test (1)

We use Spark 3.2.1.


JkSelf commented Oct 10, 2024

@wenfang6 The Gluten native writer in Spark 3.2.1 overrides the vanilla Spark HiveFileFormat class. Therefore, you must ensure that the Gluten jar is loaded before the vanilla Spark jar. You can refer to this document for the configuration. Thanks.
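
One hedged way to check the jar ordering from a running spark-shell is to ask the JVM which jar the overridden class was actually loaded from (the class name is taken from the stack trace in this issue; with Gluten's override in effect it should resolve to the Gluten jar rather than spark-hive_*.jar):

// Prints the jar that provides HiveFileFormat in the current JVM.
val codeSource = Class.forName("org.apache.spark.sql.hive.execution.HiveFileFormat")
  .getProtectionDomain.getCodeSource
println(Option(codeSource).map(_.getLocation.toString).getOrElse("bootstrap/unknown"))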

wenfang6 commented

> @wenfang6 The Gluten native writer in Spark 3.2.1 overrides the vanilla Spark HiveFileFormat class. Therefore, you must ensure that the Gluten jar is loaded before the vanilla Spark jar. You can refer to this document for the configuration. Thanks.

I tried it, but it still doesn't use the native writer. The plan looks like this:

== Fallback Summary ==
No fallback nodes

== Physical Plan ==
CommandResult (1)
   +- Execute InsertIntoHiveTable (5)
      +- VeloxColumnarToRowExec (4)
         +- ^ NativeScan hive dap_dev.wen_test (2)


JkSelf commented Oct 10, 2024

@wenfang6 Is the above issue fixed by following this document? Also, native write does not support complex types. Does your SQL contain any complex types?

zhouyuan changed the title from "Native write to hdfs error : UnsupportedOperationException" to "[VL] Native write to hdfs error : UnsupportedOperationException" on Oct 10, 2024
wenfang6 commented

> @wenfang6 Is the above issue fixed by following this document? Also, native write does not support complex types. Does your SQL contain any complex types?

Yes, the above issue is fixed, but the query still doesn't use the native writer. The SQL doesn't contain any complex types.


JkSelf commented Oct 10, 2024

@wenfang6 Is the config spark.gluten.sql.native.writer.enabled enabled in your environment? The default value is false.

wenfang6 commented

> @wenfang6 Is the config spark.gluten.sql.native.writer.enabled enabled in your environment? The default value is false.

I set the conf spark.gluten.sql.native.hive.writer.enabled=true
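
Note that this key differs from the spark.gluten.sql.native.writer.enabled key mentioned above. A quick hedged check that prints which of the two keys from this thread is actually set in the running session (which key a given Gluten version reads is not confirmed here):

// Both keys appear earlier in this thread; print whatever is set in the session.
Seq(
  "spark.gluten.sql.native.writer.enabled",
  "spark.gluten.sql.native.hive.writer.enabled"
).foreach { key =>
  println(s"$key -> ${spark.conf.getOption(key).getOrElse("<unset>")}")
}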


JkSelf commented Oct 11, 2024

@wenfang6 Can you add some logging info here to determine why this line is not being executed?
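
For completeness, a generic, hedged sketch of the kind of temporary logging suggested here; the exact class and line are behind the links above, and the object and parameter names below are illustrative only:

import org.slf4j.LoggerFactory

// Illustrative helper: call it from the decision point with the values that
// feed the native-write check, then inspect the driver/executor logs.
object NativeWriteDebug {
  private val log = LoggerFactory.getLogger(getClass)
  def trace(isNativeApplicable: Boolean, writerEnabled: Boolean): Unit =
    log.warn(s"native write check: isNativeApplicable=$isNativeApplicable, writerEnabled=$writerEnabled")
}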
