Spark-PMoF (Persistent Memory over Fabric), the RPMem extension for Spark Shuffle, is a Spark shuffle plugin that enables persistent memory and high-performance fabric technologies such as RDMA for Spark shuffle, improving Spark performance in shuffle-intensive scenarios.
Make sure HPNL is installed before building.
git clone https://github.com/Intel-bigdata/Spark-PMoF.git
cd Spark-PMoF; mvn package
This plugin currently supports Spark 2.3 and works on various network fabrics, including Socket, RDMA and Omni-Path. Before running a Spark workload, add the following contents to spark-defaults.conf, replacing Spark-PMoF-PATH with the path to your Spark-PMoF checkout, then have fun! :-)
spark.driver.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.executor.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.pmof.RdmaShuffleManager
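To avoid hand-editing the placeholder path, the three settings above can be generated from the checkout location. The following is only a minimal sketch: the helper name `pmof_conf`, the example path `/opt/Spark-PMoF`, and the use of `$SPARK_HOME/conf/spark-defaults.conf` as the target file are assumptions for illustration and are not part of the plugin.

```shell
# Hypothetical helper (not shipped with Spark-PMoF): print the three
# spark-defaults.conf entries for a given Spark-PMoF checkout directory.
pmof_conf() {
  # $1 is the path that replaces the Spark-PMoF-PATH placeholder.
  pmof_jar="$1/target/sso-0.1-jar-with-dependencies.jar"
  printf 'spark.driver.extraClassPath %s\n' "$pmof_jar"
  printf 'spark.executor.extraClassPath %s\n' "$pmof_jar"
  printf 'spark.shuffle.manager %s\n' "org.apache.spark.shuffle.pmof.RdmaShuffleManager"
}

# Example usage (paths are assumptions; adjust to your installation):
# pmof_conf /opt/Spark-PMoF >> "$SPARK_HOME/conf/spark-defaults.conf"
```

Printing to standard output rather than writing the file directly lets you review the lines before appending them to your configuration.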
Chendi Xue, [email protected] Jian Zhang, [email protected]