
Executors getting created and killed without completing the job using spark-submit with Gluten+Velox JAR #8145

Open
VaibhavFRI opened this issue Dec 4, 2024 · 3 comments
Labels: bug (Something isn't working), triage

Comments

@VaibhavFRI
Backend

VL (Velox)

Bug description

When running a Spark job using the spark-submit command with the provided Gluten+Velox JAR and configuration parameters, the executors are created and subsequently killed without successfully completing the job. The issue occurs when using spark-submit, but the same configuration works as expected when running the job through spark-shell with the same JAR and parameters.
Steps to Reproduce:

1. Set up the Spark environment with the Gluten+Velox JAR at /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar.

2. Run the following spark-submit command:

   spark-submit --class <main-class> --master <master-url> --conf <configParams> --jars /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar

configParams refers to the Spark configurations listed below; a fully expanded sketch of the command follows.
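For reference, the fully expanded command might look like the sketch below; the application class, master URL, and application jar are hypothetical placeholders, not values from this report:

  # com.example.MyApp, spark://host:7077, and my-app.jar are hypothetical placeholders
  spark-submit \
    --class com.example.MyApp \
    --master spark://host:7077 \
    --conf spark.plugins=org.apache.gluten.GlutenPlugin \
    --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
    --jars /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar \
    my-app.jar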

Spark version

Spark-3.5.x

Spark configurations

spark.executor.instances 1
spark.executor.cores 1
spark.task.cpus 1
spark.dynamicAllocation.enabled false
spark.cores.max 1

spark.executor.memory 56g
spark.driver.memory 4g

spark.memory.offHeap.enabled true
spark.memory.offHeap.size 20g
spark.executor.memoryOverhead 1g

spark.driver.extraJavaOptions "--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED"
spark.executor.extraJavaOptions "--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED"

spark.plugins org.apache.gluten.GlutenPlugin
spark.gluten.sql.columnar.forceShuffledHashJoin true
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager

spark.executor.extraClassPath '/pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar'
spark.driver.extraClassPath '/pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar'
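If these settings are kept in a properties file rather than passed one by one, spark-submit can load them directly via --properties-file (a sketch; the file name and application details are hypothetical):

  # gluten-spark.conf holds the key/value pairs listed above (hypothetical name)
  spark-submit --properties-file gluten-spark.conf --class com.example.MyApp my-app.jar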

System information

No response

Relevant logs

Caused by: java.lang.ClassNotFoundException: org.apache.spark.shuffle.sort.ColumnarShuffleManager
  at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  at java.base/java.lang.Class.forName0(Native Method)
  at java.base/java.lang.Class.forName(Class.java:467)
  at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
  at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:94)
  at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2548)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:318)
  at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:210)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:478)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
  at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  at org.apache.hadoop
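The trace shows the executor JVM failing to resolve ColumnarShuffleManager while creating SparkEnv, which points to the Gluten bundle missing from the executor classpath at startup. A quick sanity check (a sketch, reusing the jar path from the report) is to confirm the class is actually packaged in the bundle:

  jar tf /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar | grep ColumnarShuffleManager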
VaibhavFRI added the bug and triage labels on Dec 4, 2024
@wForget (Member) commented Dec 5, 2024

You can try putting the Gluten bundle jar into $SPARK_HOME/jars.
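For example (a sketch, assuming the jar path from the report and that $SPARK_HOME points at the same Spark installation on every node):

  cp /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar "$SPARK_HOME/jars/"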

@wForget (Member) commented Dec 5, 2024

@FelixYBW (Contributor) commented Dec 8, 2024

Did you put the jar on HDFS?

You can dump the class-loading trace with "-verbose:class" when running both spark-shell and spark-submit, then compare where the class is loaded from in spark-shell and why it cannot be found under spark-submit.
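A sketch of that check; the log-line format depends on the JDK version, and the trailing arguments stand in for whatever the original invocation used:

  # Append -verbose:class to the existing driver/executor extraJavaOptions
  # (abbreviated here), then grep the driver and executor logs.
  spark-submit \
    --conf "spark.driver.extraJavaOptions=-verbose:class" \
    --conf "spark.executor.extraJavaOptions=-verbose:class" \
    ...
  grep ColumnarShuffleManager <driver-or-executor-log>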
