[VL] Fix arrow dataset csv scan IncompatibleClassChangeError #6785

Merged
merged 3 commits into from
Aug 13, 2024

jinchengchenghh
Contributor

- SPARK-37326: Roundtrip in reading and writing TIMESTAMP_NTZ values with custom schema
07:13:20.662 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: WriteFiles[QueryId=1370], due to: Unsupported native write: Only ParquetFileFormat and HiveFileFormat are supported..
07:13:20.662 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Execute InsertIntoHadoopFsRelationCommand[QueryId=1370], due to: at least one of its children has empty output; at least one of its children has empty output.
07:13:20.725 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text [QueryId=1371], due to: Unsupported file format for UnknownFormat..
E20240812 07:13:20.742442 58221 Exceptions.h:67] Line: /__w/incubator-gluten/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:1864, Function:terminate, Expression:  Cancelled, Source: RUNTIME, ErrorCode: INVALID_STATE
07:13:20.747 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text , due to: Unsupported file format for UnknownFormat..
07:13:20.808 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Found schema check failure for StructType(StructField(col0,TimestampNTZType,true)), due to: Schema / data type not supported: TimestampNTZType.
07:13:20.812 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Found schema check failure for StructType(StructField(col0,TimestampNTZType,true)), due to: Schema / data type not supported: TimestampNTZType.
07:13:20.885 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text [QueryId=1374], due to: Unsupported file format for UnknownFormat..
E20240812 07:13:20.902714 57821 Exceptions.h:67] Line: /__w/incubator-gluten/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:1864, Function:terminate, Expression:  Cancelled, Source: RUNTIME, ErrorCode: INVALID_STATE
07:13:20.907 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text , due to: Unsupported file format for UnknownFormat..
07:13:20.937 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text [QueryId=1375], due to: Unsupported file format for UnknownFormat..
E20240812 07:13:20.953720 57821 Exceptions.h:67] Line: /__w/incubator-gluten/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:1864, Function:terminate, Expression:  Cancelled, Source: RUNTIME, ErrorCode: INVALID_STATE
07:13:20.957 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan text , due to: Unsupported file format for UnknownFormat..
/__w/incubator-gluten/incubator-gluten/ep/_ep/arrow_ep/java/dataset/src/main/cpp/jni_util.cc:79: Failed to update reservation while freeing bytes: Java Exception: java.lang.IncompatibleClassChangeError

/tmp/jnilib-5403273613459830800.tmp(+0x12195e8)[0x7f49e71935e8]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow4util8ArrowLogD1Ev+0xed)[0x7f49e719376d]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow7dataset3jni31ReservationListenableMemoryPool4FreeEPhll+0x45d)[0x7f49e67884ed]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow10PoolBufferD0Ev+0x47)[0x7f49e741faf7]
/tmp/jnilib-5403273613459830800.tmp(_ZNSt6vectorISt10shared_ptrIN5arrow6BufferEESaIS3_EED1Ev+0x6b)[0x7f49e689d11b]
07:13:20.992 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Scan csv , due to: Unsupported file format for TextReadFormat..
/tmp/jnilib-5403273613459830800.tmp(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow9ArrayDataESaIS1_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0xda)[0x7f49e69ab0aa]
/tmp/jnilib-5403273613459830800.tmp(_ZN5arrow17SimpleRecordBatchD1Ev+0x116)[0x7f49e72c95d6]

@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Aug 12, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:


Run Gluten Clickhouse CI

@@ -109,7 +109,7 @@
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>
-<version>${arrow.version}</version>
+<version>${arrow-gluten.version}</version>
<exclusions>
<exclusion>
<groupId>io.netty</groupId>
Member

@zhztheplayer zhztheplayer Aug 13, 2024

Where do the jars come from on CI? As far as I can see, only arrow-dataset-gluten.jar was built and uploaded for CI jobs.

@jinchengchenghh
Contributor Author

We upload all the Arrow jars in CI: https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_docker.yml#L66

The dataset jar is large, so its upload progress is visible in CI. @zhztheplayer
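
For anyone checking a similar failure locally: `IncompatibleClassChangeError` is commonly seen when classes compiled against one version of a library are run against another, which is consistent with the `arrow.version` to `arrow-gluten.version` change in this PR. A minimal sketch, using only standard JDK calls, that prints which jar each class was loaded from. The class `JarOrigin` and its methods are hypothetical helpers, not part of Gluten or Arrow, and the default class names are only examples that are assumed to be on the classpath.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Gluten or Arrow.
final class JarOrigin {

    /** Returns the location a class was loaded from, or a marker if there is none. */
    public static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // JDK bootstrap classes usually have no code source.
            return "<bootstrap or unknown>";
        }
        return src.getLocation().toString();
    }

    /** Same as above, looked up by name without running static initializers. */
    public static String originOf(String className) {
        try {
            ClassLoader loader = JarOrigin.class.getClassLoader();
            return originOf(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e + ">";
        }
    }

    public static void main(String[] args) {
        // Example class names (assumed); pass your own as arguments instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.arrow.vector.VectorSchemaRoot",
            "org.apache.arrow.memory.BufferAllocator",
            "org.apache.arrow.dataset.jni.NativeMemoryPool"
        };
        for (String name : names) {
            System.out.println(name + " -> " + originOf(name));
        }
    }
}
```

If the printed locations mix an upstream Arrow jar with a Gluten-built one, that would point to the kind of version mismatch discussed here; it does not prove it on its own.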

@jinchengchenghh
Contributor Author

Triggering the CI again to check whether this fixes the intermittent failure.

@jinchengchenghh
Contributor Author

Running test query q24b (iteration 0)...
Executing SQL query from resource path /tpcds-queries/q24b.sql...
W20240813 03:47:42.265614 14334 MemoryAllocator.cpp:199] [MEM] Exceeded memory reservation limit when reserve 1330 new pages when allocate 1330 pages
E20240813 03:47:42.266633 14334 Executor.cpp:31] ThreadPoolExecutor: func threw unhandled gluten::GlutenException: Error during calling Java code from native code: java.lang.IncompatibleClassChangeError
Error running query q24b. Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 463.0 failed 1 times, most recent failure: Lost task 0.0 in stage 463.0 (TID 1480) (7c93eb79510f executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR_EXECUTOR] Managed memory leak detected; size = 2097152 bytes, task 0.0 in stage 463.0 (TID 1480)
at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
at org.apache.spark.SparkException$.internalError(SparkException.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:630)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:73)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

@jinchengchenghh jinchengchenghh merged commit 1e5a7c9 into apache:main Aug 13, 2024
43 checks passed
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024