diff --git a/docs/get-started/ClickHouse.md b/docs/get-started/ClickHouse.md
index ab24de7a4fd6..38ce048fe0de 100644
--- a/docs/get-started/ClickHouse.md
+++ b/docs/get-started/ClickHouse.md
@@ -629,19 +629,26 @@ public read-only account:gluten/hN2xX3uQ4m
 ### Celeborn support
-Gluten with clickhouse backend has not yet supportted [Celeborn](https://github.com/apache/celeborn) natively as remote shuffle service using columar shuffle. However, you can still use Celeborn with row shuffle, which means a ColumarBatch will be converted to a row during shuffle.
-Below introduction is used to enable this feature:
+Gluten with clickhouse backend supports [Celeborn](https://github.com/apache/celeborn) as remote shuffle service. Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
+
+Below introduction is used to enable this feature. First refer to this URL(https://github.com/apache/celeborn) to setup a celeborn cluster.
+When compiling the Gluten Java module, it's required to enable `celeborn` profile, as follows:
+
+```
+mvn clean package -Pbackends-clickhouse -Pspark-3.3 -Pceleborn -DskipTests
+```
+
 Then add the Spark Celeborn Client packages to your Spark application's classpath(usually add them into `$SPARK_HOME/jars`).
 - Celeborn: celeborn-client-spark-3-shaded_2.12-[celebornVersion].jar
-Currently to use Celeborn following configurations are required in `spark-defaults.conf`
+Currently to use Gluten following configurations are required in `spark-defaults.conf`
 ```
-spark.shuffle.manager org.apache.spark.shuffle.celeborn.SparkShuffleManager
+spark.shuffle.manager org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
 # celeborn master
 spark.celeborn.master.endpoints clb-master:9097
@@ -670,24 +677,6 @@ spark.celeborn.storage.hdfs.dir hdfs:///celeborn
 spark.dynamicAllocation.enabled false
 ```
-#### Celeborn Columnar Shuffle Support
-Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
-The native Celeborn support can be enabled by the following configuration
-```
-spark.shuffle.manager=org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
-```
-
-quickly start a celeborn cluster
-```shell
-wget https://archive.apache.org/dist/celeborn/celeborn-0.3.2-incubating/apache-celeborn-0.3.2-incubating-bin.tgz && \
-tar -zxvf apache-celeborn-0.3.2-incubating-bin.tgz && \
-mv apache-celeborn-0.3.2-incubating-bin/conf/celeborn-defaults.conf.template apache-celeborn-0.3.2-incubating-bin/conf/celeborn-defaults.conf && \
-mv apache-celeborn-0.3.2-incubating-bin/conf/log4j2.xml.template apache-celeborn-0.3.2-incubating-bin/conf/log4j2.xml && \
-mkdir /opt/hadoop && chmod 777 /opt/hadoop && \
-echo -e "celeborn.worker.flusher.threads 4\nceleborn.worker.storage.dirs /tmp\nceleborn.worker.monitor.disk.enabled false" > apache-celeborn-0.3.2-incubating-bin/conf/celeborn-defaults.conf && \
-bash apache-celeborn-0.3.2-incubating-bin/sbin/start-master.sh && bash apache-celeborn-0.3.2-incubating-bin/sbin/start-worker.sh
-```
-
 ### Columnar shuffle mode
 We have two modes of columnar shuffle
 1. prefer cache
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index d65b94fc1c26..5f9ae2a46b19 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -224,11 +224,11 @@ Currently there are several ways to asscess S3 in Spark. Please refer [Velox S3]
 Gluten with velox backend supports [Celeborn](https://github.com/apache/celeborn) as remote shuffle service. Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
-Below introduction is used to enable this feature
+Below introduction is used to enable this feature. First refer to this URL(https://github.com/apache/celeborn) to setup a celeborn cluster.
-When compiling the Gluten Java module, it's required to enable `rss` profile, as follows:
+When compiling the Gluten Java module, it's required to enable `celeborn` profile, as follows:
 ```
 mvn clean package -Pbackends-velox -Pspark-3.3 -Pceleborn -DskipTests
diff --git a/docs/get-started/build-guide.md b/docs/get-started/build-guide.md
index b2e4b9560301..dc4989bc8642 100644
--- a/docs/get-started/build-guide.md
+++ b/docs/get-started/build-guide.md
@@ -55,17 +55,18 @@ Please set them via `--`, e.g., `--velox_home=/YOUR/PATH`.
 ### Maven build parameters
 The below parameters can be set via `-P` for mvn.
-| Parameters | Description | Default state |
-|---------------------|------------------------------------------------------------------------------|---------------|
-| backends-velox | Build Gluten Velox backend. | disabled |
-| backends-clickhouse | Build Gluten ClickHouse backend. | disabled |
-| rss | Build Gluten with Remote Shuffle Service, only applicable for Velox backend. | disabled |
-| delta | Build Gluten with Delta Lake support. | disabled |
-| iceberg | Build Gluten with Iceberg support. | disabled |
-| spark-3.2 | Build Gluten for Spark 3.2. | enabled |
-| spark-3.3 | Build Gluten for Spark 3.3. | disabled |
-| spark-3.4 | Build Gluten for Spark 3.4. | disabled |
-| spark-3.5 | Build Gluten for Spark 3.5. | disabled |
+| Parameters | Description | Default state |
+|---------------------|---------------------------------------|---------------|
+| backends-velox | Build Gluten Velox backend. | disabled |
+| backends-clickhouse | Build Gluten ClickHouse backend. | disabled |
+| celeborn | Build Gluten with Celeborn. | disabled |
+| uniffle | Build Gluten with Uniffle. | disabled |
+| delta | Build Gluten with Delta Lake support. | disabled |
+| iceberg | Build Gluten with Iceberg support. | disabled |
+| spark-3.2 | Build Gluten for Spark 3.2. | enabled |
+| spark-3.3 | Build Gluten for Spark 3.3. | disabled |
+| spark-3.4 | Build Gluten for Spark 3.4. | disabled |
+| spark-3.5 | Build Gluten for Spark 3.5. | disabled |
 ## Gluten Jar for Deployment
 The gluten jar built out is under `GLUTEN_SRC/package/target/`.