Update config docs

apache · Oct 11, 2024 · 3a62740
1 parent 2dbb6eb
commit 3a62740

Showing 2 changed files with 7 additions and 3 deletions.
diff --git a/datafusion/common/src/config.rs b/datafusion/common/src/config.rs
@@ -384,8 +384,12 @@ config_namespace! {
         /// and `Binary/BinaryLarge` with `BinaryView`.
         pub schema_force_view_types: bool, default = false
 
-        /// (reading) If true, parquet reader will read columns of `Binary/LargeBinary` with `Utf8`,
-        /// and `BinaryView` with `Utf8View`.
+        /// (reading) If true, parquet reader will read columns of
+        /// `Binary/LargeBinary` with `Utf8`, and `BinaryView` with `Utf8View`.
+        ///
+        /// Parquet files generated by some legacy writers do not correctly set
+        /// the UTF8 flag for strings, causing string columns to be loaded as
+        /// BLOB instead.
         pub binary_as_string: bool, default = false
 
         // The following options affect writing to parquet files

diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
@@ -57,7 +57,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.parquet.pushdown_filters                           | false                     | (reading) If true, filter expressions are be applied during the parquet decoding operation to reduce the number of rows decoded. This optimization is sometimes called "late materialization".                                                                                                                                                                                                                                                                                                                                                                           |
 | datafusion.execution.parquet.reorder_filters                            | false                     | (reading) If true, filter expressions evaluated during the parquet decoding operation will be reordered heuristically to minimize the cost of evaluation. If false, the filters are applied in the same order as written in the query                                                                                                                                                                                                                                                                                                                                    |
 | datafusion.execution.parquet.schema_force_view_types                    | false                     | (reading) If true, parquet reader will read columns of `Utf8/Utf8Large` with `Utf8View`, and `Binary/BinaryLarge` with `BinaryView`.                                                                                                                                                                                                                                                                                                                                                                                                                                     |
-| datafusion.execution.parquet.binary_as_string                           | false                     | (reading) If true, parquet reader will read columns of `Binary/LargeBinary` with `Utf8`, and `BinaryView` with `Utf8View`.                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| datafusion.execution.parquet.binary_as_string                           | false                     | (reading) If true, parquet reader will read columns of `Binary/LargeBinary` with `Utf8`, and `BinaryView` with `Utf8View`. Parquet files generated by some legacy writers do not correctly set the UTF8 flag for strings, causing string columns to be loaded as BLOB instead.                                                                                                                                                                                                                                                                                           |
 | datafusion.execution.parquet.data_pagesize_limit                        | 1048576                   | (writing) Sets best effort maximum size of data page in bytes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
 | datafusion.execution.parquet.write_batch_size                           | 1024                      | (writing) Sets write_batch_size in bytes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
 | datafusion.execution.parquet.writer_version                             | 1.0                       | (writing) Sets parquet writer version valid values are "1.0" and "2.0"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
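
Note on `binary_as_string`: the doc comment above says that string columns written without the UTF8 flag are loaded as binary, and that this option reads them as strings instead. The following is only a conceptual sketch of that reinterpretation using the Rust standard library. The helper name `binary_column_as_strings` and the `Vec<Option<Vec<u8>>>` column representation are assumptions made for illustration; this is not how DataFusion or the parquet reader implements the option.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper (not DataFusion's implementation): reinterpret a
/// nullable column of raw byte values as a nullable column of strings.
/// Nulls stay null; the first value that is not valid UTF-8 yields an error.
fn binary_column_as_strings(
    column: Vec<Option<Vec<u8>>>,
) -> Result<Vec<Option<String>>, FromUtf8Error> {
    column
        .into_iter()
        // Option<Vec<u8>> -> Option<Result<String, _>> -> Result<Option<String>, _>
        .map(|value| value.map(String::from_utf8).transpose())
        // Collecting into a Result stops at the first invalid value.
        .collect()
}
```

In this sketch, bytes that are valid UTF-8 come back as strings unchanged, while a value such as `[0xff, 0xfe]` produces an error. Whether the real reader validates, errors, or behaves differently on invalid data is not stated in this commit.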