Fix the pyspark driver log level (#269)
* add slf4j-log4j jar to class path

* Update doc
carsonwang authored Sep 8, 2022
1 parent 1fdd6c2 commit 2df6b77
Showing 2 changed files with 9 additions and 1 deletion.
3 changes: 3 additions & 0 deletions doc/spark_on_ray.md
@@ -60,3 +60,6 @@ spark = raydp.init_spark(...,enable_hive=True)
spark.sql("select * from db.xxx").show()
```

+### Logging
++ Driver Log: By default, the Spark driver log level is WARN. After getting a Spark session via `spark = raydp.init_spark(...)`, you can change the log level, e.g. `spark.sparkContext.setLogLevel("INFO")`. You will also see some AppMaster INFO logs on the driver, because Ray redirects actor logs to the driver by default. To disable this, set `log_to_driver=False` in Ray init: `ray.init(log_to_driver=False)`.
++ Executor Log: The Spark executor logs are stored in Ray's logging directory, by default at `/tmp/ray/session_*/logs/java-worker-*.log`.
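
Putting the two knobs together, a minimal sketch of the workflow the doc describes (the `init_spark` resource arguments below are illustrative placeholders, not part of this commit):

```python
import ray
import raydp

# Keep AppMaster and other actor logs out of the driver output.
ray.init(log_to_driver=False)

# Placeholder resource arguments; adjust for your cluster.
spark = raydp.init_spark(app_name="log_level_demo",
                         num_executors=1,
                         executor_cores=1,
                         executor_memory="1G")

# The driver defaults to WARN; opt in to more verbose logging if needed.
spark.sparkContext.setLogLevel("INFO")
```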
7 changes: 6 additions & 1 deletion python/raydp/spark/ray_cluster.py
@@ -21,6 +21,7 @@

import ray
import ray._private.services
+import pyspark
from pyspark.sql.session import SparkSession

from raydp.services import Cluster
@@ -69,13 +70,17 @@ def get_spark_session(self,
            extra_conf["spark.driver.bindAddress"] = str(driver_node_ip)
        RAYDP_CP = os.path.abspath(os.path.join(os.path.abspath(__file__), "../../jars/*"))
        RAY_CP = os.path.abspath(os.path.join(os.path.dirname(ray.__file__), "jars/*"))
+        # A workaround for the spark driver to bind to its slf4j-log4j jar instead of the one from
+        # Ray's jar. Without this, the driver produces INFO logs and the level cannot be changed.
+        spark_home = os.environ.get("SPARK_HOME", os.path.dirname(pyspark.__file__))
+        log4j_path = os.path.abspath(os.path.join(spark_home, "jars/slf4j-log4j*.jar"))
        try:
            extra_jars = [extra_conf["spark.jars"]]
        except KeyError:
            extra_jars = []
        extra_conf["spark.jars"] = ",".join(glob.glob(RAYDP_CP) + extra_jars)
        driver_cp_key = "spark.driver.extraClassPath"
-        driver_cp = ":".join(glob.glob(RAYDP_CP) + glob.glob(RAY_CP))
+        driver_cp = ":".join(glob.glob(log4j_path) + glob.glob(RAYDP_CP) + glob.glob(RAY_CP))
        if driver_cp_key in extra_conf:
            extra_conf[driver_cp_key] = driver_cp + ":" + extra_conf[driver_cp_key]
        else:
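
To see what this workaround resolves on a given machine, here is a standalone sketch of the same classpath logic (the `/path/to/...` entries are placeholders; actual paths depend on your pyspark and Ray installs):

```python
import glob
import os

import pyspark

# Prefer an explicit SPARK_HOME; fall back to the pip-installed pyspark package.
spark_home = os.environ.get("SPARK_HOME", os.path.dirname(pyspark.__file__))

# Spark's slf4j-log4j binding must come first on the driver classpath, ahead of
# Ray's bundled jars, so that spark.sparkContext.setLogLevel() takes effect.
log4j_jars = glob.glob(os.path.join(spark_home, "jars/slf4j-log4j*.jar"))

driver_cp = ":".join(log4j_jars + ["/path/to/raydp/jars/*", "/path/to/ray/jars/*"])
print(driver_cp)
```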
