[GLUTEN-6571][VL] Add platform and arch subdirectory for base lib package #6942

Merged: 7 commits, Sep 5, 2024

@wForget wForget commented Aug 20, 2024

What changes were proposed in this pull request?

Add platform and arch subdirectory for base lib package.

closes #6571

How was this patch tested?

Manual package and test.

Check gluten bundle jar:

jar -tf gluten-velox-bundle-spark3.5_2.12-centos_7_x86_64-1.3.0-SNAPSHOT.jar | grep -E '\.so$'

# output
org/apache/gluten/linux/amd64/libgluten.so
org/apache/gluten/linux/amd64/libvelox.so
......
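
The same check can be scripted. The following is a minimal Java sketch, not part of this PR: the class name `BundledNativeLibs` and its methods are hypothetical, and it only relies on the standard `java.util.jar.JarFile` API to list entries ending in `.so`.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; it is not part of Gluten. */
final class BundledNativeLibs {
  private BundledNativeLibs() {}

  /** Keeps only the entry names that end in ".so", preserving their order. */
  static List<String> filterSharedObjects(Stream<String> entryNames) {
    return entryNames.filter(name -> name.endsWith(".so")).collect(Collectors.toList());
  }

  /** Lists the ".so" entries packaged in the given jar file. */
  static List<String> list(Path jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath.toFile())) {
      return filterSharedObjects(jar.stream().map(JarEntry::getName));
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: BundledNativeLibs <path-to-bundle.jar>");
      return;
    }
    // Print one packaged native library path per line.
    list(Paths.get(args[0])).forEach(System.out::println);
  }
}
```

Run against the bundle jar above, this should print the same `org/apache/gluten/linux/amd64/...` entries as the `jar -tf` command.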

Successfully load native libs:

24/08/21 12:44:53 INFO JniLibLoader: Successfully loaded library org/apache/gluten/linux/amd64/libgluten.so
24/08/21 12:44:54 INFO JniLibLoader: Successfully loaded library org/apache/gluten/linux/amd64/libvelox.so
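
For readers unfamiliar with the layout, here is a rough Java sketch of how a platform and arch subdirectory could be derived from the JVM's `os.name` and `os.arch` properties. It is an illustration under assumptions, not the code in this PR: the class `NativeLibPaths` is hypothetical, only the `linux/amd64` directory is confirmed by the output above, and the `darwin` and `arm64` names are guesses.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not Gluten's actual implementation. */
final class NativeLibPaths {
  private NativeLibPaths() {}

  /** Maps an os.name value to a directory name. Only "linux" is confirmed by the PR. */
  static String normalizeOs(String osName) {
    String os = osName.toLowerCase(Locale.ROOT);
    if (os.contains("linux")) {
      return "linux";
    }
    if (os.contains("mac") || os.contains("darwin")) {
      return "darwin"; // assumed directory name
    }
    throw new UnsupportedOperationException("Unsupported OS: " + osName);
  }

  /** Maps an os.arch value to a directory name. Only "amd64" is confirmed by the PR. */
  static String normalizeArch(String osArch) {
    String arch = osArch.toLowerCase(Locale.ROOT);
    switch (arch) {
      case "amd64":
      case "x86_64":
        return "amd64";
      case "aarch64":
      case "arm64":
        return "arm64"; // assumed directory name
      default:
        throw new UnsupportedOperationException("Unsupported arch: " + osArch);
    }
  }

  /** Builds a classpath resource path such as org/apache/gluten/linux/amd64/libgluten.so. */
  static String resourcePath(String osName, String osArch, String libFileName) {
    return "org/apache/gluten/" + normalizeOs(osName) + "/" + normalizeArch(osArch)
        + "/" + libFileName;
  }

  public static void main(String[] args) {
    // Resolve the path for the JVM this sketch is running on.
    System.out.println(
        resourcePath(
            System.getProperty("os.name"), System.getProperty("os.arch"), "libgluten.so"));
  }
}
```

Normalizing both `amd64` and `x86_64` to one directory matters because different JVMs and operating systems report the same architecture under different names.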

Successfully execute tpcds q3.

@github-actions github-actions bot added the CORE (works for Gluten Core) and VELOX labels Aug 20, 2024

#6571

Run Gluten Clickhouse CI

Member

@zhztheplayer zhztheplayer left a comment

Looking good, though does this work with third-party libs?

https://github.com/apache/incubator-gluten/blob/11a236dd86f2f25777f2df9db9f1fef4f71e97f7/dev/build-thirdparty.sh#L85C1-L85C62

It's likely the libs are already packaged into specific jars. So I assume we don't have to do anything for that?

cc @PHILO-HE

@wForget
Member Author

wForget commented Aug 21, 2024

Looking good, though does this work with third-party libs?

With vcpkg packaging, third-party libs no longer seem to be needed. Do we still need to handle them this way?

@wForget wForget marked this pull request as ready for review August 21, 2024 04:55
@zhztheplayer
Member

Looking good, though does this work with third-party libs?

With vcpkg packaging, third-party libs no longer seem to be needed. Do we still need to handle them this way?

I have never used that feature myself. @PHILO-HE, do you know anything about it? It would be great if we could remove it, since in the long term we will drop dynamic build support completely.

Member

@zhztheplayer zhztheplayer left a comment

👍

@PHILO-HE
Contributor

PHILO-HE commented Aug 21, 2024

Looking good, though does this work with third-party libs?

With vcpkg packaging, third-party libs no longer seem to be needed. Do we still need to handle them this way?

I have never used that feature myself. @PHILO-HE, do you know anything about it? It would be great if we could remove it, since in the long term we will drop dynamic build support completely.

Yes, the third-party lib package is never used if Gluten is built with vcpkg.
According to feedback from some users, they still use the third-party package, deployed together with Gluten libs produced by dynamic linking. We recommend that all users migrate to the vcpkg build, but unless it is really necessary, let's not change the code related to the third-party package.

@wForget
Member Author

wForget commented Aug 21, 2024

@zhztheplayer @PHILO-HE Could you please help check whether the failed GA run is related to this change?

16:32:19  - test cache file command *** FAILED ***
16:32:19    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4) (gluten-gluten-ci-11653-llqgx-65q8j-rm2bs executor driver): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Open file(hdfs://127.0.0.1:8020/tpch-data/lineitem/part-00000-d08071cb-0dfa-42dc-9198-83cb334ccda3-c000.snappy.parquet) failed. IOError: Code: 107. DB::ErrnoException: Cannot open file /tmp/gluten_hdfs_cache//4654/6d2/6d25666d180cc924b04f3234d772995f/0: , errno: 2, strerror: No such file or directory: Cache info: Buffer path: tpch-data/lineitem/part-00000-d08071cb-0dfa-42dc-9198-83cb334ccda3-c000.snappy.parquet, hash key: 6d25666d180cc924b04f3234d772995f, file_offset_of_buffer_end: 19164575, read_until_position: 19230111, internal buffer end: None, read_type: CACHED, last caller: 215fdc3e-15d9-4926-b9ad-9201b2f74587:57933, file segment info: File segment: [0, 19230110], key: 6d25666d180cc924b04f3234d772995f, state: DOWNLOADED, downloaded size: 19230111, reserved size: 19230111, downloader id: None, current write offset: 19230111, caller id: 215fdc3e-15d9-4926-b9ad-9201b2f74587:57933, kind: Regular, unbound: 0. (FILE_DOESNT_EXIST) (version 24.9.1.1): While executing SubstraitFileSource
16:32:19  0. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Common/Exception.cpp:111: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000bc09edb
16:32:19  1. ../ClickHouse/src/Common/Exception.h:110: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000064a3f2c
16:32:19  2. ../ClickHouse/src/Common/Exception.h:128: DB::Exception::Exception<String const&, String>(int, FormatStringHelperImpl<std::type_identity<String const&>::type, std::type_identity<String>::type>, String const&, String&&) @ 0x00000000064c036b
16:32:19  3. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../local-engine/Storages/SubstraitSource/ParquetFormatFile.cpp:146: local_engine::ParquetFormatFile::collectRequiredRowGroups(DB::ReadBuffer*, int&) const @ 0x000000000c30ee5f
16:32:19  4. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../local-engine/Storages/SubstraitSource/ParquetFormatFile.cpp:70: local_engine::ParquetFormatFile::createInputFormat(DB::Block const&) @ 0x000000000c30d3e6
16:32:19  5. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../local-engine/Storages/SubstraitSource/SubstraitFileSource.cpp:345: local_engine::NormalFileReader::NormalFileReader(std::shared_ptr<local_engine::FormatFile> const&, std::shared_ptr<DB::Context const> const&, DB::Block const&, DB::Block const&) @ 0x000000000c3059f2
16:32:19  6. ../ClickHouse/contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:714: local_engine::SubstraitFileSource::generate() @ 0x000000000c303049
16:32:19  7. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Processors/ISource.cpp:139: DB::ISource::tryGenerate() @ 0x000000001013ac5b
16:32:19  8. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Processors/ISource.cpp:108: DB::ISource::work() @ 0x000000001013a967
16:32:19  9. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Processors/Executors/ExecutionThreadContext.cpp:47: DB::ExecutionThreadContext::executeTask() @ 0x00000000101550c9
16:32:19  10. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Processors/Executors/PipelineExecutor.cpp:279: DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000010148570
16:32:19  11. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Processors/Executors/PipelineExecutor.cpp:153: DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000010147f48
16:32:19  12. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Processors/Executors/PullingPipelineExecutor.cpp:54: DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x000000001015ba77
16:32:19  13. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Processors/Executors/PullingPipelineExecutor.cpp:65: DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x000000001015bc99
16:32:19  14. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../local-engine/Parser/SerializedPlanParser.cpp:1613: local_engine::LocalExecutor::hasNext() @ 0x000000000c030e45
16:32:19  15. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../local-engine/local_engine_jni.cpp:277: Java_org_apache_gluten_vectorized_BatchIterator_nativeHasNext @ 0x0000000006487e77
16:32:19  
16:32:19  0. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../ClickHouse/src/Common/Exception.cpp:111: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000bc09edb
16:32:19  1. ../ClickHouse/src/Common/Exception.h:109: DB::Exception::createRuntime(int, String&) @ 0x00000000064a042c
16:32:19  2. ../local-engine/jni/jni_common.h:98: unsigned char local_engine::safeCallBooleanMethod<>(JNIEnv_*, _jobject*, _jmethodID*) @ 0x00000000064a0f5f
16:32:19  3. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../local-engine/Shuffle/NativeSplitter.cpp:161: local_engine::NativeSplitter::hasNext() @ 0x000000000c12c1e5
16:32:19  4. /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/build/../local-engine/local_engine_jni.cpp:1147: Java_org_apache_gluten_vectorized_BlockSplitIterator_nativeHasNext @ 0x000000000649ab37
16:32:19  
16:32:19  	at org.apache.gluten.vectorized.BlockSplitIterator.nativeHasNext(Native Method)
16:32:19  	at org.apache.gluten.vectorized.BlockSplitIterator.hasNext(BlockSplitIterator.java:55)
16:32:19  	at org.apache.spark.sql.execution.utils.CHExecUtil$$anon$2.hasNext(CHExecUtil.scala:165)
16:32:19  	at org.apache.gluten.vectorized.CloseablePartitionedBlockIterator.hasNext(CloseablePartitionedBlockIterator.scala:33)
16:32:19  	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
16:32:19  	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
16:32:19  	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
16:32:19  	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
16:32:19  	at org.apache.spark.scheduler.Task.run(Task.scala:136)
16:32:19  	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
16:32:19  	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
16:32:19  	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
16:32:19  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
16:32:19  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
16:32:19  	at java.lang.Thread.run(Thread.java:748)

@wForget
Member Author

wForget commented Aug 22, 2024

Could you please help check whether the failed GA run is related to this change?

It doesn't seem to be related. I retriggered GA and the previously failed task passed.

@wForget wForget requested a review from zhztheplayer August 22, 2024 05:33

github-actions bot commented Sep 4, 2024

Run Gluten Clickhouse CI

@wForget
Member Author

wForget commented Sep 4, 2024

@zhztheplayer @PHILO-HE Could you please take another look at this?

Member

@zhztheplayer zhztheplayer left a comment

Thanks! Overall looking good.

BTW, is it possible to add a unit test that checks Gluten's behavior when the OS platform/arch is not supported by the jar? Perhaps we could raise an error like "... not supported by current Gluten build" in that case?
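
One possible shape for such a check, sketched in Java under assumptions: the class `PlatformSupportCheck` and its method are hypothetical and not taken from the PR. The idea is to look up the expected resource on the classpath with `ClassLoader.getResource` and fail with a descriptive error when it is missing.

```java
import java.net.URL;

/** Hypothetical check for illustration only; not the code merged in this PR. */
final class PlatformSupportCheck {
  private PlatformSupportCheck() {}

  /**
   * Throws if the given native library resource is not on the classpath,
   * which would mean the jar was not built for this platform and arch.
   */
  static void requireBundled(ClassLoader loader, String os, String arch, String resourcePath) {
    URL url = loader.getResource(resourcePath);
    if (url == null) {
      throw new UnsupportedOperationException(
          "Platform " + os + "/" + arch + " is not supported by current Gluten build"
              + " (missing resource: " + resourcePath + ")");
    }
  }

  public static void main(String[] args) {
    // A made-up platform/arch pair that no jar on the classpath is expected to provide.
    String missing = "org/apache/gluten/exampleos/examplearch/libgluten.so";
    try {
      requireBundled(
          PlatformSupportCheck.class.getClassLoader(), "exampleos", "examplearch", missing);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A unit test could call `requireBundled` with a platform directory that is known to be absent and assert on the exception message.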

zhztheplayer previously approved these changes Sep 4, 2024
@@ -158,11 +158,10 @@ public static void unloadFromPath(String libPath) {
     }
   }

-  public void mapAndLoad(String unmappedLibName, boolean requireUnload) {
+  public void mapAndLoad(String unmappedLibPath, boolean requireUnload) {
Member

Can we remove the method now?

Member Author

Can we remove the method now?

It seems possible, I will remove it.

@wForget
Member Author

wForget commented Sep 4, 2024

BTW, is it possible to add a unit test that checks Gluten's behavior when the OS platform/arch is not supported by the jar? Perhaps we could raise an error like "... not supported by current Gluten build" in that case?

I will try to add a check in buildbundle-veloxbe.sh that verifies the platform and arch are supported.


github-actions bot commented Sep 4, 2024

Run Gluten Clickhouse CI


github-actions bot commented Sep 5, 2024

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer merged commit d289b54 into apache:main Sep 5, 2024
42 of 43 checks passed
zhztheplayer added a commit to zhztheplayer/gluten that referenced this pull request Sep 5, 2024
zhztheplayer added a commit that referenced this pull request Sep 6, 2024
dcoliversun pushed a commit to dcoliversun/gluten that referenced this pull request Sep 11, 2024
dcoliversun pushed a commit to dcoliversun/gluten that referenced this pull request Sep 11, 2024
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
Labels
BUILD, CORE (works for Gluten Core), VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] Add an arch subdirectory for native libs package
3 participants