
Executors getting created and killed without completing the job using spark-submit with Gluten+Velox JAR #8145

Open
VaibhavFRI opened this issue Dec 4, 2024 · 3 comments
Labels: bug (Something isn't working), triage

Comments

@VaibhavFRI
Backend

VL (Velox)

Bug description

When running a Spark job using the spark-submit command with the provided Gluten+Velox JAR and configuration parameters, the executors are created and subsequently killed without successfully completing the job. The issue occurs when using spark-submit, but the same configuration works as expected when running the job through spark-shell with the same JAR and parameters.
Steps to Reproduce:

1. Set up the Spark environment with the Gluten+Velox JAR at /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar.

2. Run the following spark-submit command:

   spark-submit --class <main-class> --master <master-url> --conf <configParams> --jars /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar

configParams refers to the Spark configurations listed below; a fully expanded sketch of the command follows.
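For reference, the fully expanded command might look like the sketch below; the application class, master URL, and application jar are hypothetical placeholders, not values from this report:

  # com.example.MyApp, spark://host:7077, and my-app.jar are hypothetical placeholders
  spark-submit \
    --class com.example.MyApp \
    --master spark://host:7077 \
    --conf spark.plugins=org.apache.gluten.GlutenPlugin \
    --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
    --jars /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar \
    my-app.jar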

Spark version

Spark-3.5.x

Spark configurations

spark.executor.instances 1
spark.executor.cores 1
spark.task.cpus 1
spark.dynamicAllocation.enabled false
spark.cores.max 1

spark.executor.memory 56g
spark.driver.memory 4g

spark.memory.offHeap.enabled true
spark.memory.offHeap.size 20g
spark.executor.memoryOverhead 1g

spark.driver.extraJavaOptions "--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED"
spark.executor.extraJavaOptions "--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED"

spark.plugins org.apache.gluten.GlutenPlugin
spark.gluten.sql.columnar.forceShuffledHashJoin true
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager

spark.executor.extraClassPath '/pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar'
spark.driver.extraClassPath '/pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar'
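If these settings are kept in a properties file rather than passed one by one, spark-submit can load them directly via --properties-file (a sketch; the file name and application details are hypothetical):

  # gluten-spark.conf holds the key/value pairs listed above (hypothetical name)
  spark-submit --properties-file gluten-spark.conf --class com.example.MyApp my-app.jar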

System information

No response

Relevant logs

Caused by: java.lang.ClassNotFoundException: org.apache.spark.shuffle.sort.ColumnarShuffleManager
  at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  at java.base/java.lang.Class.forName0(Native Method)
  at java.base/java.lang.Class.forName(Class.java:467)
  at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
  at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:94)
  at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2548)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:318)
  at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:210)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:478)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
  at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  at org.apache.hadoop
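The trace shows the executor JVM failing to resolve ColumnarShuffleManager while creating SparkEnv, which points to the Gluten bundle missing from the executor classpath at startup. A quick sanity check (a sketch, reusing the jar path from the report) is to confirm the class is actually packaged in the bundle:

  jar tf /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar | grep ColumnarShuffleManager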
VaibhavFRI added the bug and triage labels on Dec 4, 2024
@wForget (Member) commented Dec 5, 2024

You can try putting the Gluten bundle jar into $SPARK_HOME/jars.
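For example (a sketch, assuming the jar path from the report and that $SPARK_HOME points at the same Spark installation on every node):

  cp /pathto/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar "$SPARK_HOME/jars/"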

@wForget (Member) commented Dec 5, 2024

@FelixYBW (Contributor) commented Dec 8, 2024

Did you put the jar on HDFS?

You can dump the class-loading trace with "-verbose:class" when running both spark-shell and spark-submit, then compare where the class is loaded from in spark-shell and why it cannot be found under spark-submit.
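A sketch of that check; the log-line format depends on the JDK version, and the trailing arguments stand in for whatever the original invocation used:

  # Append -verbose:class to the existing driver/executor extraJavaOptions
  # (abbreviated here), then grep the driver and executor logs.
  spark-submit \
    --conf "spark.driver.extraJavaOptions=-verbose:class" \
    --conf "spark.executor.extraJavaOptions=-verbose:class" \
    ...
  grep ColumnarShuffleManager <driver-or-executor-log>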
