[CORE] Use sc.listFiles instead of addedFiles.keys #6175

Merged: 1 commit merged into apache:main on Jun 24, 2024


ulysses-you (Contributor)

What changes were proposed in this pull request?

In Spark 3.5, the first level of `sc.addedFiles` changed to be keyed by session, so this PR switches to `sc.listFiles` to stay compatible with Spark 3.5 and later.
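
A minimal sketch of why reading `addedFiles.keys` breaks, assuming a simplified model of the Spark 3.5 layout: the outer map is keyed by session id (with `"default"` for the default session) and each inner map goes from file path to the time the file was added. `AddedFilesSketch`, `outerKeys`, `listFiles` and `sample` are hypothetical names; this is plain Scala with no Spark dependency, not Spark's actual implementation.

```scala
import scala.collection.mutable

object AddedFilesSketch {
  // Hypothetical, simplified stand-in for SparkContext.addedFiles on Spark 3.5+:
  // session id -> (file path -> timestamp when the file was added).
  type AddedFiles = mutable.Map[String, mutable.Map[String, Long]]

  // Pre-3.5 style access: the outer keys used to be file paths,
  // but on the nested layout they are session ids such as "default".
  def outerKeys(added: AddedFiles): Seq[String] = added.keys.toSeq

  // listFiles-style access: collect the file paths from every session's inner map.
  def listFiles(added: AddedFiles): Seq[String] =
    added.values.flatMap(_.keys).toSeq.distinct

  // Example data: one default session holding two files (paths are made up).
  val sample: AddedFiles = mutable.Map(
    "default" -> mutable.Map(
      "spark://driver:7078/files/udf.so" -> 1L,
      "spark://driver:7078/files/conf.json" -> 2L
    )
  )

  def main(args: Array[String]): Unit = {
    println(outerKeys(sample))        // the session id, not a file path
    println(listFiles(sample).sorted) // the actual file paths
  }
}
```

Reading the outer keys yields the session id `"default"` rather than a path, which is consistent with the `NoSuchFileException: default` in the stack trace later in this thread. Collecting the inner keys across sessions, roughly what `sc.listFiles` does for callers, yields the file paths instead.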

How was this patch tested?

To be compatible with Spark 3.5 and later.


Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and pull request title using the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


Run Gluten Clickhouse CI

ulysses-you changed the title from "[CORE] Use sc.listFiles instead addedFiles.keys" to "[CORE] Use sc.listFiles instead of addedFiles.keys" on Jun 21, 2024
JkSelf (Contributor) commented on Jun 24, 2024

Thanks @ulysses-you. This PR fixes the following exception on Spark 3.5:

24/06/21 23:40:39 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to default
java.nio.file.NoSuchFileException: default
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
        at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
        at java.nio.file.Files.copy(Files.java:1274)
        at org.apache.spark.util.Utils$.copyRecursive(Utils.scala:681)
        at org.apache.spark.util.Utils$.copyFile(Utils.scala:652)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:725)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:454)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$5(Executor.scala:1136)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$5$adapted(Executor.scala:1133)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:193)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
        at org.apache.spark.executor.Executor.updateDependencies(Executor.scala:1133)
        at org.apache.spark.executor.Executor.<init>(Executor.scala:330)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:181)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

JkSelf (Contributor) left a comment


Thanks for the fix.

JkSelf merged commit 9cceba6 into apache:main on Jun 24, 2024
41 checks passed
ulysses-you deleted the 3.5 branch on June 24, 2024 01:18

GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_6175_time.csv | log/native_master_06_23_2024_1e06169cd_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 34.61 | 36.88 | 2.268 | 106.55% |
| q2 | 23.30 | 23.65 | 0.350 | 101.50% |
| q3 | 39.10 | 42.16 | 3.067 | 107.85% |
| q4 | 33.91 | 33.05 | -0.856 | 97.47% |
| q5 | 72.70 | 69.90 | -2.796 | 96.15% |
| q6 | 7.61 | 7.95 | 0.339 | 104.46% |
| q7 | 83.94 | 80.73 | -3.215 | 96.17% |
| q8 | 85.21 | 82.15 | -3.056 | 96.41% |
| q9 | 122.08 | 119.99 | -2.092 | 98.29% |
| q10 | 45.84 | 46.07 | 0.226 | 100.49% |
| q11 | 21.23 | 20.48 | -0.749 | 96.47% |
| q12 | 25.94 | 26.53 | 0.597 | 102.30% |
| q13 | 39.29 | 38.12 | -1.164 | 97.04% |
| q14 | 20.02 | 18.74 | -1.279 | 93.61% |
| q15 | 33.39 | 32.95 | -0.445 | 98.67% |
| q16 | 14.25 | 14.22 | -0.029 | 99.80% |
| q17 | 103.28 | 104.72 | 1.439 | 101.39% |
| q18 | 146.95 | 148.08 | 1.134 | 100.77% |
| q19 | 13.82 | 14.64 | 0.822 | 105.95% |
| q20 | 29.27 | 30.40 | 1.127 | 103.85% |
| q21 | 266.18 | 261.83 | -4.357 | 98.36% |
| q22 | 14.29 | 12.33 | -1.955 | 86.32% |
| total | 1276.20 | 1265.58 | -10.623 | 99.17% |
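
The report does not define the last two columns. Judging from the numbers (an inference, not something the bot states), they compare the master run time $t_{\text{master}}$ with the PR run time $t_{\text{PR}}$ for each query:

$$
\text{difference} = t_{\text{master}} - t_{\text{PR}}, \qquad \text{percentage} = \frac{t_{\text{master}}}{t_{\text{PR}}} \times 100\%
$$

For the total row, $1265.58 - 1276.20 = -10.62$ and $1265.58 / 1276.20 \approx 0.9917$, matching the reported -10.623 and 99.17%; the small differences in the last decimal place presumably come from the times being printed rounded. Under this reading, a percentage above 100% means the PR build was faster on that query, and one below 100% means it was slower.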

deepashreeraghu pushed a commit to deepashreeraghu/incubator-gluten that referenced this pull request on Jun 26, 2024