ORC-1743: Upgrade Spark to 4.0.0-preview1 #1909
Conversation
java/bench/spark/pom.xml
<exclude>META-INF/DUMMY.DSA</exclude>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
Spark has since fixed this problem by upgrading its arrow-vector version, so no modification is needed here. See [SPARK-47981][BUILD] Upgrade Arrow to 16.0.0.
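For background on what these excludes do, here is a minimal, illustrative maven-shade-plugin filter (a sketch, not copied from the actual pom.xml): stripping dependency signature files keeps the shaded benchmark jar from being rejected because of stale `*.SF`/`*.DSA`/`*.RSA` entries.

```xml
<!-- Illustrative sketch only; the real java/bench/spark/pom.xml layout may differ. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <!-- Drop signature files that dependencies ship in META-INF,
               so the merged (shaded) jar does not fail signature checks. -->
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```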
@@ -74,7 +74,7 @@
 @BenchmarkMode(Mode.AverageTime)
 @OutputTimeUnit(TimeUnit.MICROSECONDS)
 @AutoService(OrcBenchmark.class)
-@Fork(jvmArgsAppend = "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED")
+@Fork(jvmArgsAppend = {"--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED"})
Caused by: java.lang.IllegalAccessException: symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo, from interface org.apache.spark.sql.catalyst.util.SparkDateTimeUtils (unnamed module @2b71fc7e)
at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
at java.base/java.lang.invoke.MethodHandles$Lookup.checkSymbolicClass(MethodHandles.java:3686)
at java.base/java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:3646)
at java.base/java.lang.invoke.MethodHandles$Lookup.findVirtual(MethodHandles.java:2680)
at org.apache.spark.sql.catalyst.util.SparkDateTimeUtils.$init$(SparkDateTimeUtils.scala:206)
at org.apache.spark.sql.catalyst.util.DateTimeUtils$.<clinit>(DateTimeUtils.scala:41)
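For context, the following is a minimal, self-contained JMH sketch (the class name and benchmark body are illustrative placeholders, not the actual `SparkBenchmark`) showing how `jvmArgsAppend` on `@Fork` passes the extra `--add-opens` flag to the forked benchmark JVM. Opening `java.base/sun.util.calendar` lets Spark's `SparkDateTimeUtils` reach `sun.util.calendar.ZoneInfo` reflectively on JDK 17, which avoids the `IllegalAccessException` above.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

// Hypothetical example class, not the real SparkBenchmark.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1, jvmArgsAppend = {
    // JMH appends these flags to the forked JVM that runs the benchmark,
    // so reflective access into these JDK-internal packages is permitted.
    "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED",
    "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED"})
public class AddOpensForkSketch {

  @Benchmark
  public long placeholderWorkload() {
    // Placeholder body; the real benchmark reads ORC/Parquet data through Spark.
    return System.currentTimeMillis();
  }
}
```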
java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java
Thank you for testing this. 😄
I'd recommend creating a JIRA for the migration to Scala 2.13 of Apache Spark 3.5.1 first. :)
…mark

### What changes were proposed in this pull request?
This PR aims to migrate to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark.

### Why are the changes needed?
#1909 (review)

### How was this patch tested?
local test
```bash
java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -format=parquet -compress zstd -data taxi
```
```
Benchmark                                  (compression) (dataset) (format) Mode Cnt          Score      Error Units
SparkBenchmark.partialRead                          zstd      taxi  parquet avgt   5      17211.731 ± 11836.315 us/op
SparkBenchmark.partialRead:bytesPerRecord           zstd      taxi  parquet avgt   5          0.002                #
SparkBenchmark.partialRead:ops                      zstd      taxi  parquet avgt   5         10.000                #
SparkBenchmark.partialRead:perRecord                zstd      taxi  parquet avgt   5          0.001 ±      0.001 us/op
SparkBenchmark.partialRead:records                  zstd      taxi  parquet avgt   5  113791180.000                #
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1912 from cxzl25/ORC-1704.

Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit dc634cb)
Signed-off-by: Dongjoon Hyun <[email protected]>
Hi, @cxzl25. Sorry for asking this, but could you rebase this PR once more?
Thank you!
…nchmark runs on JDK17 ### What changes were proposed in this pull request? This PR aims to fix `sun.util.calendar` IllegalAccessException when SparkBenchmark runs on JDK17. ### Why are the changes needed? #1909 (comment) ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? No Closes #1919 from cxzl25/ORC-1707. Authored-by: sychen <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…nchmark runs on JDK17 ### What changes were proposed in this pull request? This PR aims to fix `sun.util.calendar` IllegalAccessException when SparkBenchmark runs on JDK17. ### Why are the changes needed? #1909 (comment) ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? No Closes #1919 from cxzl25/ORC-1707. Authored-by: sychen <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 5bb2346) Signed-off-by: Dongjoon Hyun <[email protected]>
If there is no further issue, shall we finalize this PR and merge it?
I did some validation testing locally and I think the PR is ready to be merged.
Thank you!
+1, LGTM.
Merged to main/2.0.
Closes #1909 from cxzl25/support_spark_4.

Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit bcb25fa)
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR aims to upgrade the benchmark module to use Spark 4.0.0-preview1.
Why are the changes needed?
How was this patch tested?
GA
Was this patch authored or co-authored using generative AI tooling?
No
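As a rough illustration of the kind of change this implies, here is a hypothetical pom.xml fragment; the `spark.version` property name and the dependency layout are assumptions for illustration, not the actual diff in this PR.

```xml
<!-- Hypothetical sketch: property name and structure are assumed,
     not taken from the real java/bench/spark/pom.xml. -->
<properties>
  <!-- Bump the Spark version used by the benchmark module. -->
  <spark.version>4.0.0-preview1</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```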