HiveFileFormat has incompatible class error when running TPC-H q18 #3351
The log content points to an anonymous inner class error. I checked the source code, and the Hive-related class is indeed different from the original Spark's.
We made some changes to HiveFileFormat so that it overrides the original Spark class. There may be an incompatibility between Spark 3.3.1 and 3.3.2. Could you test with Spark 3.3.1 to see whether this issue still exists on your side?
I had pretty much exhausted testing with combinations of CentOS-7, CentOS-8, Spark 3.3.1, Spark 3.3.2, and the gluten-c7 or gluten-c8 jar before opening this issue, and I have also ruled out speculation about JDK version differences, but there is still no way to get around the problem.
As a side note, with Gluten 0.5.0 I am still using Spark 3.3.2 and it runs fine.
@kelvin-qin We introduced native write support for Parquet, ORC, and Hive in version 1.0. To achieve this, we needed to overwrite Spark's default HiveFileFormat. Therefore, when using this functionality, it is important to ensure that Gluten's HiveFileFormat takes precedence over the vanilla Spark class. To prioritize loading the gluten jars over the vanilla Spark jars in your environment, you can follow the steps outlined here. Could you please verify whether the gluten jars are given higher priority than the vanilla Spark jars in your environment?
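One way to verify which copy of the class wins is to ask the JVM where it was loaded from. This is a diagnostic sketch, not a command from the thread: it assumes spark-shell is on your PATH, and that HiveFileFormat lives at its standard Spark location, org.apache.spark.sql.hive.execution.HiveFileFormat.

```shell
# Print the jar that HiveFileFormat is actually loaded from.
# If the gluten jars take precedence, this should point at gluten.jar,
# not the vanilla spark-hive jar.
echo 'println(Class.forName("org.apache.spark.sql.hive.execution.HiveFileFormat").getProtectionDomain.getCodeSource.getLocation)' | spark-shell
```

If the printed location is a vanilla Spark jar, the classpath ordering still needs fixing.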
It works, thanks so much. @JkSelf
Fine, closing.
How is this solved? @JkSelf @kelvin-qin We also meet this problem. Our configuration:

NATIVE_WRITER_CONF="--conf spark.gluten.sql.native.writer.enabled=true
--conf spark.files=$SPARK_HOME/jars/gluten.jar
--conf spark.driver.extraClassPath=$SPARK_HOME/gluten.jar
--conf spark.executor.extraClassPath=./gluten.jar
--conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true
"
Found the problem. The path in the spark.driver.extraClassPath configuration was wrong.
@lgbo-ustc Hi, can you post the direct solution under this question? Thank you very much. As I recall, it was indeed the relative or absolute path that was causing the problem.
@kelvin-qin Maybe you can refer to this link. The root cause is that we overwrite some Spark classes, so Spark needs to load the gluten jar before the vanilla Spark jars.
Add the following configuration:
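The configuration itself was not captured in this thread. Based on the fix described above (the spark.driver.extraClassPath path was wrong), a corrected sketch of the earlier NATIVE_WRITER_CONF might look like the following. The $SPARK_HOME/jars/gluten.jar location is an assumption taken from the spark.files line; adjust it to wherever your gluten jar actually lives.

```shell
# Sketch only; the jar location is an assumption, not confirmed by the thread.
# spark.files ships gluten.jar into each executor's working directory,
# which is why the executor classpath entry is the relative ./gluten.jar.
NATIVE_WRITER_CONF="--conf spark.gluten.sql.native.writer.enabled=true
--conf spark.files=$SPARK_HOME/jars/gluten.jar
--conf spark.driver.extraClassPath=$SPARK_HOME/jars/gluten.jar
--conf spark.executor.extraClassPath=./gluten.jar
--conf spark.driver.userClassPathFirst=true
--conf spark.executor.userClassPathFirst=true
"
```

The userClassPathFirst settings make Spark consult the user-supplied jars before its own, which is what lets Gluten's overwritten classes win.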
Backend
VL (Velox)
Bug description
[Expected behavior]:
Run all TPC-H queries normally
[Actual behavior]:
Errors are reported when running q18: local class incompatible
Spark version
3.3.2
Spark configurations
No special configurations according to the documentation
System information
CentOS 8
GCC 8.5.0
OpenJDK 1.8.0_345
Spark 3.3.2
Hive 3.1.1
Gluten 1.0.0
Relevant logs