[VL] Fix arrow dataset csv scan IncompatibleClassChangeError #6785

jinchengchenghh · 2024-08-12T08:12:21Z

- SPARK-37326: Roundtrip in reading and writing TIMESTAMP_NTZ values with custom schema
07:13:20.662 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: WriteFiles[QueryId=1370], due to: Unsupported native write: Only ParquetFileFormat and HiveFileFormat are supported..
07:13:20.662 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Execute InsertIntoHadoopFsRelationCommand[QueryId=1370], due to: at least one of its children has empty output; at least one of its children has empty output.
07:13:20.725 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text [QueryId=1371], due to: Unsupported file format for UnknownFormat..
E20240812 07:13:20.742442 58221 Exceptions.h:67] Line: /__w/incubator-gluten/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:1864, Function:terminate, Expression:  Cancelled, Source: RUNTIME, ErrorCode: INVALID_STATE
07:13:20.747 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text , due to: Unsupported file format for UnknownFormat..
07:13:20.808 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Found schema check failure for StructType(StructField(col0,TimestampNTZType,true)), due to: Schema / data type not supported: TimestampNTZType.
07:13:20.812 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Found schema check failure for StructType(StructField(col0,TimestampNTZType,true)), due to: Schema / data type not supported: TimestampNTZType.
07:13:20.885 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text [QueryId=1374], due to: Unsupported file format for UnknownFormat..
E20240812 07:13:20.902714 57821 Exceptions.h:67] Line: /__w/incubator-gluten/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:1864, Function:terminate, Expression:  Cancelled, Source: RUNTIME, ErrorCode: INVALID_STATE
07:13:20.907 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text , due to: Unsupported file format for UnknownFormat..
07:13:20.937 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text [QueryId=1375], due to: Unsupported file format for UnknownFormat..
E20240812 07:13:20.953720 57821 Exceptions.h:67] Line: /__w/incubator-gluten/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:1864, Function:terminate, Expression:  Cancelled, Source: RUNTIME, ErrorCode: INVALID_STATE
07:13:20.957 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text , due to: Unsupported file format for UnknownFormat..
/__w/incubator-gluten/incubator-gluten/ep/_ep/arrow_ep/java/dataset/src/main/cpp/jni_util.cc:79: Failed to update reservation while freeing bytes: Java Exception: java.lang.IncompatibleClassChangeError

/tmp/jnilib-5403273613459830800.tmp(+0x12195e8)[0x7f49e71935e8]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow4util8ArrowLogD1Ev+0xed)[0x7f49e719376d]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow7dataset3jni31ReservationListenableMemoryPool4FreeEPhll+0x45d)[0x7f49e67884ed]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow10PoolBufferD0Ev+0x47)[0x7f49e741faf7]
/tmp/jnilib-5403273613459830800.tmp(_ZNSt6vectorISt10shared_ptrIN5arrow6BufferEESaIS3_EED1Ev+0x6b)[0x7f49e689d11b]
07:13:20.992 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Unsupported file format for TextReadFormat..
/tmp/jnilib-5403273613459830800.tmp(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow9ArrayDataESaIS1_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0xda)[0x7f49e69ab0aa]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow17SimpleRecordBatchD1Ev+0x116)[0x7f49e72c95d6]

github-actions · 2024-08-12T08:12:37Z

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2024-08-12T08:12:53Z

Run Gluten Clickhouse CI

github-actions · 2024-08-13T01:24:34Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-08-13T02:42:38Z

gluten-data/pom.xml

@@ -109,7 +109,7 @@
    <dependency>
      <groupId>org.apache.arrow</groupId>
      <artifactId>arrow-vector</artifactId>
-      <version>${arrow.version}</version>
+      <version>${arrow-gluten.version}</version>
      <exclusions>
        <exclusion>
          <groupId>io.netty</groupId>


Where do the jars come from on CI? As far as I can see, only arrow-dataset-gluten.jar was built and uploaded for CI jobs.

jinchengchenghh · 2024-08-13T02:51:27Z

We upload all the Arrow jars in CI: https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_docker.yml#L66

The dataset jar is large, so its upload is the step you can see in the CI log. @zhztheplayer
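For anyone trying to confirm which jar actually supplies a given Arrow class at runtime, here is a minimal sketch. It is not part of this PR: the class name `JarOrigin` and the helper `describe` are hypothetical, and the assumption that the IncompatibleClassChangeError comes from Arrow classes being loaded from jars of different versions is only inferred from the pom.xml change above. It uses only the standard `Class.getProtectionDomain()` and `CodeSource.getLocation()` calls.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not code from this PR.
public class JarOrigin {

    /** Returns "<class name> <- <jar or directory the class was loaded from>". */
    public static String describe(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String)
        // usually have no code source, so guard against null.
        String origin = (src == null || src.getLocation() == null)
            ? "bootstrap or unknown"
            : src.getLocation().toString();
        return cls.getName() + " <- " + origin;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names on the command line, for example
        // org.apache.arrow.memory.BufferAllocator (needs Arrow on the classpath).
        for (String name : args) {
            System.out.println(describe(Class.forName(name)));
        }
        if (args.length == 0) {
            System.out.println(describe(JarOrigin.class));
        }
    }
}
```

Running it with the same classpath as the failing job and comparing the reported locations for classes from arrow-vector, arrow-memory and arrow-dataset should show whether they all resolve to the Gluten-built jars or to a mix of versions.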

github-actions · 2024-08-13T03:31:04Z

Run Gluten Clickhouse CI

jinchengchenghh · 2024-08-13T03:33:12Z

Triggering CI again to check whether this fixes the intermittent bug.

jinchengchenghh · 2024-08-13T05:40:03Z

Running test query q24b (iteration 0)...
Executing SQL query from resource path /tpcds-queries/q24b.sql...
W20240813 03:47:42.265614 14334 MemoryAllocator.cpp:199] [MEM] Exceeded memory reservation limit when reserve 1330 new pages when allocate 1330 pages
E20240813 03:47:42.266633 14334 Executor.cpp:31] ThreadPoolExecutor: func threw unhandled gluten::GlutenException: Error during calling Java code from native code: java.lang.IncompatibleClassChangeError
Error running query q24b. Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 463.0 failed 1 times, most recent failure: Lost task 0.0 in stage 463.0 (TID 1480) (7c93eb79510f executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR_EXECUTOR] Managed memory leak detected; size = 2097152 bytes, task 0.0 in stage 463.0 (TID 1480)
at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
at org.apache.spark.SparkException$.internalError(SparkException.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:630)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:73)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Aug 12, 2024

jinchengchenghh added 2 commits August 12, 2024 15:45

[VL] Fix arrow dataset csv scan IncompatibleClassChangeError

54201c5

Update pom.xml

9de65fd

zhztheplayer reviewed Aug 13, 2024

Merge branch 'main' into arrow

bee5aed

zhztheplayer approved these changes Aug 13, 2024

jinchengchenghh merged commit 1e5a7c9 into apache:main Aug 13, 2024
43 checks passed

sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024

[VL] Fix arrow dataset csv scan IncompatibleClassChangeError (apache#…

f2823a4

…6785)
