diff --git a/common/src/main/scala/org/apache/comet/CometConf.scala b/common/src/main/scala/org/apache/comet/CometConf.scala
index 5c3ebf6fb..5f67f4107 100644
--- a/common/src/main/scala/org/apache/comet/CometConf.scala
+++ b/common/src/main/scala/org/apache/comet/CometConf.scala
@@ -222,8 +222,8 @@ object CometConf extends ShimCometConf {
     conf("spark.comet.columnar.shuffle.memorySize")
       .doc(
         "The optional maximum size of the memory used for Comet columnar shuffle, in MiB. " +
-          "Note that this config is only used when `spark.comet.columnar.shuffle.enabled` is " +
-          "true. Once allocated memory size reaches this config, the current batch will be " +
+          "Note that this config is only used when `spark.comet.exec.shuffle.mode` is " +
+          "`jvm`. Once allocated memory size reaches this config, the current batch will be " +
           "flushed to disk immediately. If this is not configured, Comet will use " +
           "`spark.comet.shuffle.memory.factor` * `spark.comet.memoryOverhead` as " +
           "shuffle memory size. If final calculated value is larger than Comet memory " +
@@ -259,7 +259,7 @@ object CometConf extends ShimCometConf {
           "prefer dictionary encoding when shuffling the column. If the ratio is higher than " +
           "this config, dictionary encoding will be used on shuffling string column. This config " +
           "is effective if it is higher than 1.0. By default, this config is 10.0. Note that this " +
-          "config is only used when 'spark.comet.columnar.shuffle.enabled' is true.")
+          "config is only used when `spark.comet.exec.shuffle.mode` is `jvm`.")
       .doubleConf
       .createWithDefault(10.0)
 
diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
index 8a0f2440a..0f648324e 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -48,4 +48,4 @@ Comet provides the following configuration settings.
 | spark.comet.scan.enabled | Whether to enable Comet scan. When this is turned on, Spark will use Comet to read Parquet data source. Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. By default, this config is true. | true |
 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. By default is disabled. | false |
 | spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
-| spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when 'spark.comet.columnar.shuffle.enabled' is true. | 10.0 |
+| spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
diff --git a/docs/source/user-guide/installation.md b/docs/source/user-guide/installation.md
index 7d140a7ac..45f988e29 100644
--- a/docs/source/user-guide/installation.md
+++ b/docs/source/user-guide/installation.md
@@ -150,5 +150,5 @@ Some cluster managers may require additional configuration, see