
ORC-1743: Upgrade Spark to 4.0.0-preview1 #1909

Closed
cxzl25 wants to merge 6 commits into main from cxzl25:support_spark_4

Conversation

@cxzl25 (Contributor) commented Apr 24, 2024

### What changes were proposed in this pull request?

This PR aims to upgrade the benchmark module to use Spark 4.0.0-preview1.

### Why are the changes needed?

### How was this patch tested?

GA

### Was this patch authored or co-authored using generative AI tooling?

No

@cxzl25 cxzl25 marked this pull request as draft April 24, 2024 08:42
```xml
<exclude>META-INF/DUMMY.DSA</exclude>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
```
@cxzl25 (Contributor, PR author) commented:
```
[WARNING] eclipse-collections-11.1.0.jar, eclipse-collections-api-11.1.0.jar define 4 overlapping resources:
[WARNING]   - LICENSE-EDL-1.0.txt
[WARNING]   - LICENSE-EPL-1.0.txt
[WARNING]   - META-INF/ECLIPSE_.RSA
[WARNING]   - META-INF/ECLIPSE_.SF
```
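Not part of this PR, but as a quick local sanity check one could scan the shaded benchmark jar for any signature entries that the excludes above are meant to drop. A minimal, hypothetical helper (the class name is made up; the default jar path is the one used by the benchmark command quoted later in this thread):

```java
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ShadedJarSignatureCheck {
  public static void main(String[] args) throws IOException {
    // Hypothetical default path, taken from the benchmark invocation in this thread; adjust as needed.
    String jarPath = args.length > 0 ? args[0]
        : "spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar";
    try (JarFile jar = new JarFile(jarPath)) {
      jar.stream()
          .map(JarEntry::getName)
          .filter(name -> name.startsWith("META-INF/")
              && (name.endsWith(".SF") || name.endsWith(".DSA") || name.endsWith(".RSA")))
          .forEach(name -> System.out.println("Leftover signature entry: " + name));
    }
  }
}
```

If this prints nothing, the shade filter stripped the signed-jar metadata that would otherwise trigger signature-verification errors when running the uber-jar.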

@cxzl25 (Contributor, PR author) commented:

Spark has already fixed this problem by upgrading its arrow-vector dependency, so no change is needed here:

[SPARK-47981][BUILD] Upgrade Arrow to 16.0.0

```diff
@@ -74,7 +74,7 @@
 @BenchmarkMode(Mode.AverageTime)
 @OutputTimeUnit(TimeUnit.MICROSECONDS)
 @AutoService(OrcBenchmark.class)
-@Fork(jvmArgsAppend = "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED")
+@Fork(jvmArgsAppend = {"--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED"})
```
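For context, a minimal JMH sketch (hypothetical class name and placeholder workload, not code from this PR) showing how the widened @Fork flags sit on a benchmark class; the real SparkBenchmark additionally carries @AutoService(OrcBenchmark.class) and drives Spark reads:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(jvmArgsAppend = {
    // Pre-existing flag, carried over from the original annotation.
    "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED",
    // Flag added by this PR to fix the sun.util.calendar IllegalAccessException shown below.
    "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED"})
public class SparkReadBenchmarkSketch {

  @Benchmark
  public void partialRead(Blackhole blackhole) {
    // Placeholder workload; the real benchmark reads ORC/Parquet data through Spark.
    blackhole.consume(System.nanoTime());
  }
}
```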
@cxzl25 (Contributor, PR author) commented:

```
Caused by: java.lang.IllegalAccessException: symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo, from interface org.apache.spark.sql.catalyst.util.SparkDateTimeUtils (unnamed module @2b71fc7e)
	at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
	at java.base/java.lang.invoke.MethodHandles$Lookup.checkSymbolicClass(MethodHandles.java:3686)
	at java.base/java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:3646)
	at java.base/java.lang.invoke.MethodHandles$Lookup.findVirtual(MethodHandles.java:2680)
	at org.apache.spark.sql.catalyst.util.SparkDateTimeUtils.$init$(SparkDateTimeUtils.scala:206)
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.<clinit>(DateTimeUtils.scala:41)
```
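The stack trace shows a MethodHandles lookup in Spark's SparkDateTimeUtils being rejected because sun.util.calendar is not opened to the unnamed module on JDK 17. A hypothetical stand-alone reproducer of the same kind of failure (the getOffsetsByWall signature comes from the JDK's internal ZoneInfo class; treating it as the member Spark resolves is an assumption):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.TimeZone;

public class ZoneInfoAccessRepro {
  public static void main(String[] args) throws Throwable {
    // Fails on JDK 17 with "symbolic reference class is not accessible" unless the JVM is
    // started with --add-opens=java.base/sun.util.calendar=ALL-UNNAMED (the flag added above).
    Class<?> zoneInfoClass = Class.forName("sun.util.calendar.ZoneInfo");
    MethodHandle getOffsetsByWall = MethodHandles.lookup().findVirtual(
        zoneInfoClass, "getOffsetsByWall",
        MethodType.methodType(int.class, long.class, int[].class));

    // With the package opened, the handle can be invoked on a regular TimeZone instance,
    // which is a sun.util.calendar.ZoneInfo for standard zone IDs.
    int[] offsets = new int[2];
    int offsetMillis = (int) getOffsetsByWall.invoke(
        TimeZone.getTimeZone("America/Los_Angeles"), System.currentTimeMillis(), offsets);
    System.out.println("Offset from UTC (ms): " + offsetMillis);
  }
}
```

Running it with and without the flag mirrors the benchmark's behavior before and after this change.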

@dongjoon-hyun (Member) left a comment:

Thank you for testing this. 😄

I'd recommend creating a JIRA for the migration to Scala 2.13 of Apache Spark 3.5.1 first. :)

dongjoon-hyun pushed a commit that referenced this pull request Apr 25, 2024
…mark

### What changes were proposed in this pull request?
This PR aims to migrate to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark.

### Why are the changes needed?
#1909 (review)

### How was this patch tested?
local test

```bash
java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -format=parquet  -compress zstd -data taxi
```

```
Benchmark                                  (compression)  (dataset)  (format)  Mode  Cnt          Score       Error  Units
SparkBenchmark.partialRead                          zstd       taxi   parquet  avgt    5      17211.731 ± 11836.315  us/op
SparkBenchmark.partialRead:bytesPerRecord           zstd       taxi   parquet  avgt    5          0.002                  #
SparkBenchmark.partialRead:ops                      zstd       taxi   parquet  avgt    5         10.000                  #
SparkBenchmark.partialRead:perRecord                zstd       taxi   parquet  avgt    5          0.001 ±     0.001  us/op
SparkBenchmark.partialRead:records                  zstd       taxi   parquet  avgt    5  113791180.000                  #
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1912 from cxzl25/ORC-1704.

Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Apr 25, 2024
…mark

(Cherry-pick of the commit above; same commit message.)

(cherry picked from commit dc634cb)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun (Member) commented:
Hi, @cxzl25. Sorry for asking this, but could you rebase this PR once more?

@cxzl25 cxzl25 force-pushed the support_spark_4 branch from 8afc779 to a988573 Compare May 1, 2024 02:12
@dongjoon-hyun (Member) commented:
Thank you!

dongjoon-hyun pushed a commit that referenced this pull request May 1, 2024
…nchmark runs on JDK17

### What changes were proposed in this pull request?
This PR aims to fix `sun.util.calendar` IllegalAccessException when SparkBenchmark runs on JDK17.

### Why are the changes needed?
#1909 (comment)

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1919 from cxzl25/ORC-1707.

Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request May 1, 2024
…nchmark runs on JDK17

(Cherry-pick of the commit above; same commit message.)

(cherry picked from commit 5bb2346)
Signed-off-by: Dongjoon Hyun <[email protected]>
@cxzl25 cxzl25 force-pushed the support_spark_4 branch from eff9a57 to 590c8f3 Compare May 1, 2024 16:08
@cxzl25 cxzl25 force-pushed the support_spark_4 branch from d1d9f38 to 7b3cdb0 Compare June 3, 2024 13:04
@dongjoon-hyun dongjoon-hyun changed the title from Test Spark 4.0.0-SNAPSHOT to Test Spark 4.0.0-preview1 Jul 11, 2024
@dongjoon-hyun (Member) commented Jul 11, 2024

If there is no further issue, shall we finalize this PR and merge with 4.0.0-preview1, @cxzl25? Please let me know if this is still a draft because of some blocker.

@cxzl25 cxzl25 changed the title from Test Spark 4.0.0-preview1 to ORC-1743: Upgrade Spark to 4.0.0-preview1 Jul 12, 2024
@cxzl25 cxzl25 marked this pull request as ready for review July 12, 2024 03:09
@cxzl25 cxzl25 force-pushed the support_spark_4 branch from 7b3cdb0 to dd74bf5 Compare July 12, 2024 03:17
@cxzl25 (Contributor, PR author) commented Jul 12, 2024

> If there is no further issue, shall we finalize this PR and merge with 4.0.0-preview1

I did some validation testing locally and I think the PR is ready to be merged.

@dongjoon-hyun (Member) commented:
Thank you!

@dongjoon-hyun dongjoon-hyun added this to the 2.0.2 milestone Jul 12, 2024
@dongjoon-hyun (Member) left a comment:

+1, LGTM.
Merged to main/2.0.

dongjoon-hyun pushed a commit that referenced this pull request Jul 12, 2024
### What changes were proposed in this pull request?
This PR aims to upgrade the benchmark module to use Spark 4.0.0-preview1.

### Why are the changes needed?

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1909 from cxzl25/support_spark_4.

Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit bcb25fa)
Signed-off-by: Dongjoon Hyun <[email protected]>