This release consists of 234 commits from 59 contributors. See credits at the end of this changelog for more information.
Breaking changes:
- Remove ScalarFunctionDefinition #10325 (lewiszlw)
- Introduce user-defined signature #10439 (jayzhan211)
- Remove `AggregateFunctionDefinition::Name` #10441 (lewiszlw)
- Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` #10404 (berkaysynnada)
- feat: allow `array_slice` to take an optional stride parameter #10469 (jonahgao)
- Minor: Extend more style of udaf `expr_fn`, Remove order args for `covar_samp` and `covar_pop` #10492 (jayzhan211)
- Remove `file_type()` from `FileFormat` #10499 (Jefffrey)
- UDAF: Extend more args to `state_fields` and `groups_accumulator_supported` and introduce `ReversedUDAF` #10525 (jayzhan211)
- Remove `Expr::GetIndexedField`, replace `Expr::{field,index,range}` with `FieldAccessor`, `IndexAccessor`, and `SliceAccessor` #10568 (jayzhan211)
- Improve ContextProvider #10577 (lewiszlw)
- Minor: Use slice in `ConcreteTreeNode` #10666 (peter-toth)
- Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references #10543 (peter-toth)
- Introduce Sum UDAF #10651 (jayzhan211)
Implemented enhancements:
- feat: optional args for regexp_* UDFs #10514 (Michael-J-Ward)
- feat: Expose Parquet Schema Adapter #10515 (HawaiianSpork)
- feat: API for collecting statistics/index for metadata of a parquet file + tests #10537 (NGA-TRAN)
- feat: Add eliminate group by constant optimizer rule #10591 (korowa)
- feat: extend `unnest` to support Struct datatype #10429 (duongcongtoai)
- feat: add substrait support for Interval types and literals #10646 (waynexia)
- feat: support unparsing LogicalPlan::Window nodes #10767 (devinjdangelo)
- feat: Update Parquet row filtering to handle type coercion #10716 (jeffreyssmith2nd)
Fixed bugs:
- fix: make `columnize_expr` resistant to display_name collisions #10459 (jonahgao)
- fix: avoid compressed json files repartitioning #10470 (korowa)
- fix: parsing timestamp with date format #10476 (shanretoo)
- fix: `array_slice` panics #10547 (jonahgao)
- fix: pass `quote` parameter to CSV writer #10671 (DDtKey)
- fix: CI compilation failed on substrait #10683 (jonahgao)
- fix: fix string repeat for negative numbers #10760 (tshauck)
- fix: `array_slice` and `array_element` panicked on empty args #10804 (jonahgao)
Documentation updates:
- Prepare 38.0.0 release candidate 1 #10407 (andygrove)
- chore(docs): update subquery documentation with more information #10361 (sanderson)
- minor: Remove docs archive #10416 (andygrove)
- Minor: format comments in `PushDownFilter` rule #10437 (alamb)
- Minor: Add usecase to comments in `LogicalPlan::recompute_schema` #10443 (alamb)
- doc: fix old master branch references to main #10458 (Jefffrey)
- Minor: Improved document string for `LogicalPlanBuilder` #10496 (AbrarNitk)
- Add to_date function to scalar functions doc #10601 (Omega359)
- Docs: Update PR workflow documentation #10532 (alamb)
- Minor: Add examples of using TreeNode with `Expr` #10686 (alamb)
- docs: add documents to substrait type variation consts #10719 (waynexia)
- Minor: (Doc) Enable rt-multi-thread feature for sample code #10770 (hsiang-c)
Other:
- Minor: Add more docs and examples for `Expr::unalias` #10406 (alamb)
- minor: Remove [RUST][datafusion] from release vote email subject line #10411 (andygrove)
- fix dml logical plan output schema #10394 (leoyvens)
- [MINOR]: Move transpose code to under common #10409 (mustafasrepo)
- Fix incorrect Schema over aggregate function, Remove unnecessary `exprlist_to_fields_aggregate` #10408 (jonahgao)
- Enable user defined display_name for ScalarUDF #10417 (yyy1000)
- Fix and improve `CommonSubexprEliminate` rule #10396 (peter-toth)
- Simplify making information_schame tables #10420 (lewiszlw)
- only consider main part of the url when deciding is_collection in listing table #10419 (y-f-u)
- make common expression alias human-readable #10333 (MohamedAbdeen21)
- Minor: Simplify + document `EliminateCrossJoin` better #10427 (alamb)
- During expression equality, check for new ordering information #10434 (mustafasrepo)
- Revert 10333 / changes to aliasing in CommonSubExprEliminate #10436 (MohamedAbdeen21)
- Improve flight sql examples #10432 (lewiszlw)
- Move Covariance (Population) covar_pop to be a User Defined Aggregate Function #10418 (yyy1000)
- Stop copying LogicalPlan and Exprs in `OptimizeProjections` (2% faster planning) #10405 (alamb)
- chore: Improve release process for next time #10447 (andygrove)
- Move bit_and_or_xor unit tests to slt #10457 (NoeB)
- Remove some Expr clones in `EliminateCrossJoin` (3%-5% faster planning) #10430 (alamb)
- refactor: Reduce string allocations in Expr::display_name (use write instead of format!) #10454 (erratic-pattern)
- Add `simplify` method to aggregate function #10354 (milenkovicm)
- Add cast array test to sqllogictest #10474 (viirya)
- Add `Expr::try_as_col`, deprecate `Expr::try_into_col` (speed up optimizer) #10448 (alamb)
- Implement `From<Arc<LogicalPlan>>` for `LogicalPlanBuilder` #10466 (AbrarNitk)
- Minor: Improve documentation for `catalog.has_header` config option #10452 (alamb)
- Minor: Simplify conjunction and disjunction, improve docs #10446 (alamb)
- Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` #10460 (ClSlaid)
- Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) #10431 (alamb)
- Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` #10483 (ozankabak)
- Replace `GetFieldAccess` with indexing function in `SqlToRel` #10375 (jayzhan211)
- Fix values with different data types caused failure #10445 (b41sh)
- Fix SortMergeJoin with join filter filtering all rows out #10495 (viirya)
- chore: use fullpath in macro to avoid declaring in other module #10503 (jayzhan211)
- Minor: remove unused source file `udf.rs` #10497 (jonahgao)
- Support UDAF to align Builtin aggregate function #10493 (jayzhan211)
- Minor: add a test for `current_time` (no args) #10509 (alamb)
- [MINOR]: Move pipeline checker rule to the end #10502 (mustafasrepo)
- Minor: Extract parent/child limit calculation into a function, improve docs #10501 (alamb)
- Fix window expr deserialization #10506 (lewiszlw)
- Update substrait requirement from 0.32.0 to 0.33.3 #10516 (dependabot[bot])
- Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) #10356 (alamb)
- Implement unparse `IS_NULL` to String and enhance the tests #10529 (goldmedal)
- Fix panic in array_agg(distinct) query #10526 (jayzhan211)
- Move min_max unit tests to slt #10539 (xinlifoobar)
- Implement unparse `IsNotFalse` to String #10538 (goldmedal)
- Implement Unparse TryCast Expr --> String Support #10542 (xinlifoobar)
- Implement unparse `Placeholder` to String #10540 (reswqa)
- Implement unparse `OuterReferenceColumn` to String #10544 (goldmedal)
- Stop copying LogicalPlan and Exprs in `PushDownFilter` (4%-6% faster planning) #10444 (alamb)
- Stop most copying LogicalPlan and Exprs in `ScalarSubqueryToJoin` #10489 (alamb)
- Example for simple Expr --> SQL conversion #10528 (edmondop)
- fix `null_count` on `compute_record_batch_statistics` to report null counts across partitions #10468 (samuelcolvin)
- Minor: Add `PullUpCorrelatedExpr::new` and improve documentation #10500 (alamb)
- Stop copying LogicalPlan and Exprs in `PushDownLimit` #10508 (alamb)
- Break up contributing guide into smaller pages #10533 (alamb)
- PhysicalExpr Orderings with Range Information #10504 (berkaysynnada)
- Implement unparse `ScalarVariable` to String #10541 (reswqa)
- Handle dictionary values in ScalarValue serde #10563 (thinkharderdev)
- Improve signature of `get_field` function #10569 (lewiszlw)
- Implement Unparse `GroupingSet` Expr --> String Support sql #10555 (xinlifoobar)
- Minor: Move proxy to datafusion common #10561 (jayzhan211)
- Update prost-build requirement from =0.12.4 to =0.12.6 #10578 (dependabot[bot])
- Add examples of how to convert logical plan to/from sql strings #10558 (xinlifoobar)
- Fix: Sort Merge Join LeftSemi issues when JoinFilter is set #10304 (comphead)
- Minor: Fix `ArrayFunctionRewriter` name reporting #10581 (alamb)
- Improve `UserDefinedLogicalNode::from_template` API to return `Result` #10575 (lewiszlw)
- Migrate testing optimizer rules to use `rewrite` API #10576 (lewiszlw)
- test: add more tests for statistics reading #10592 (NGA-TRAN)
- refactor: reduce allocations in push down filter #10567 (erratic-pattern)
- Fix compilation of datafusion-cli on 32bit targets #10594 (nathaniel-daniel)
- Rename monotonicity as output_ordering in ScalarUDF's #10596 (berkaysynnada)
- Implement Unparser for `UNION ALL` #10603 (phillipleblanc)
- Improve `UserDefinedLogicalNodeCore::from_template` API to return Result #10597 (lewiszlw)
- Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common #10574 (jayzhan211)
- Minor: Consolidate some integration tests into `core_integration` #10588 (alamb)
- Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` #10527 (appletreeisyellow)
- [MINOR]: Update get range implementation for lead lag window functions #10614 (mustafasrepo)
- Minor: Improve documentation in sql_to_plan example #10582 (alamb)
- Docs: add examples for `RuntimeEnv::register_object_store`, improve error messages #10617 (aditanase)
- Add support for Substrait List/EmptyList literals #10615 (Blizzara)
- Add to_unixtime function to scalar functions doc #10620 (Omega359)
- Test for reading read statistics from parquet files without statistics and boolean & struct data type #10608 (NGA-TRAN)
- adding benchmark for extracting arrow statistics from parquet #10610 (Lordworms)
- Implement a dialect-specific rule for unparsing an identifier with or without quotes #10573 (goldmedal)
- add catalog as part of the table path in plan_to_sql #10612 (y-f-u)
- Refactor parquet row group pruning into a struct (use new statistics API, part 1) #10607 (alamb)
- Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` #10593 (xinlifoobar)
- Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL #10625 (phillipleblanc)
- Fix broken build/test from merge #10637 (phillipleblanc)
- Add SessionContext::register_object_store #10621 (alamb)
- Minor: Move median test #10611 (jayzhan211)
- Add support for Substrait Struct literals and type #10622 (Blizzara)
- fix Incorrect statistics read for i8 i16 columns in parquet #10629 (Lordworms)
- Minor: add runtime asserts to `RowGroup` #10641 (alamb)
- Update cli Dockerfile to a newer ubuntu release, newer rust release #10638 (Omega359)
- More properly handle nullability of types/literals in Substrait #10640 (Blizzara)
- fix wrong type validation on unnest expr #10657 (duongcongtoai)
- Fix incorrect statistics read for binary columns in parquet #10645 (xinlifoobar)
- Fix `NULL["field"]` for expr_API #10655 (alamb)
- Update substrait requirement from 0.33.3 to 0.34.0 #10632 (dependabot[bot])
- Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) #10662 (alamb)
- Add `FileScanConfig::new()` API #10623 (alamb)
- Minor: Remove `GetFieldAccessSchema` #10665 (jayzhan211)
- Move Median to `functions-aggregate` and Introduce Numeric signature #10644 (jayzhan211)
- Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion #10268 (jayzhan211)
- Fix compilation "comparison_binary_numeric_coercion not found" #10677 (alamb)
- refactor: simplify converting List DataTypes to `ScalarValue` #10675 (jonahgao)
- Minor: Improve ObjectStoreUrl docs + examples #10619 (alamb)
- Add tests for reading numeric limits in parquet statistics #10642 (alamb)
- Update nix requirement from 0.28.0 to 0.29.0 #10684 (dependabot[bot])
- refactor: Move SchemaAdapter from parquet module to data source #10680 (HawaiianSpork)
- Convert first, last aggregate function to UDAF #10648 (mustafasrepo)
- Minor: CastExpr Ordering Handle #10650 (berkaysynnada)
- Factor out common datafusion types into another proto file #10649 (mustafasrepo)
- Minor: Add tests showing aggregate behavior for NaNs #10634 (alamb)
- Improve `ParquetExec` and related documentation #10647 (alamb)
- minor: inconsistent group by position planning #10679 (korowa)
- Remove duplicate function name in its aliases list #10661 (goldmedal)
- Add protobuf serde support for `LogicalPlan::Unnest` #10681 (akoshchiy)
- Support Substrait's VirtualTables #10531 (Blizzara)
- support serialization and deserialization limit in the aggregation exec #10692 (liukun4515)
- Display date32/64 in YYYY-MM-DD format #10691 (houqp)
- Fix: array list values are leaked on nested `unnest` operators #10689 (duongcongtoai)
- Support LogicalPlan::Distinct in unparser #10690 (yyy1000)
- Remove redundant upper case aliases for `median`, `first_value` and `last_value` #10696 (goldmedal)
- Minor: improve Expr documentation #10685 (alamb)
- chore: align re-exports in functions-aggregate #10705 (waynexia)
- Fix typo in bench.sh #10698 (vimt)
- Fix incorrect statistics read for unsigned integers columns in parquet #10704 (xinlifoobar)
- Separate `Partitioning` protobuf serialization code #10708 (lewiszlw)
- Support consuming Substrait with compound signature function names #10653 (Blizzara)
- Minor: Add examples of using TreeNode with `LogicalPlan` #10687 (alamb)
- Add `ParquetExec::builder()`, deprecate `ParquetExec::new` #10636 (alamb)
- feature: Add a WindowUDFImpl::simplify() API #9906 (guojidan)
- Chore: clean up udwf example && remove redundant import #10718 (guojidan)
- Push down filter as table partition list prefix #10693 (houqp)
- Make swap_hash_join public API #10702 (viirya)
- ci: fix clippy error on main #10723 (jonahgao)
- CI: Fix complaints from newer Clippy versions #10725 (comphead)
- Remove Eager Trait for Joins #10721 (berkaysynnada)
- Minor: fix signature `fn octect_length()` #10726 (marvinlanhenke)
- Update rstest requirement from 0.19.0 to 0.20.0 #10734 (dependabot[bot])
- Update rstest_reuse requirement from 0.6.0 to 0.7.0 #10733 (dependabot[bot])
- Add example for building an external secondary index for parquet files #10549 (alamb)
- Minor: move stddev test to slt #10741 (marvinlanhenke)
- fix(CLI): can not create external tables with format options #10739 (jonahgao)
- Add support for `AggregateExpr`, `WindowExpr` rewrite. #10742 (mustafasrepo)
- Fix SMJ Left Anti Join when the join filter is set #10724 (comphead)
- Introduce FunctionRegistry dependency to optimize and rewrite rule #10714 (jayzhan211)
- Minor: Add SMJ to TPCH benchmark usage #10747 (comphead)
- Minor: Split physical_plan/parquet/mod.rs into smaller modules #10727 (alamb)
- minor: consolidate unparser integration tests #10736 (devinjdangelo)
- Minor: Move aggregate variance to slt #10750 (marvinlanhenke)
- Extract parquet statistics from timestamps with timezones #10766 (xinlifoobar)
- Minor: Add tests for extracting dictionary parquet statistics #10729 (alamb)
- Update rstest requirement from 0.20.0 to 0.21.0 #10774 (dependabot[bot])
- Minor: Refactor memory size estimation for HashTable #10748 (marvinlanhenke)
- Reduce code repetition in `datafusion/functions` mod files #10700 (MohamedAbdeen21)
- Support negatives in split part #10780 (tshauck)
- Extract parquet statistics from `LargeUtf8` columns and Add tests for `UTF8` And `LargeUTF8` #10762 (Weijun-H)
- Cleanup GetIndexedField #10769 (lewiszlw)
- Extract parquet statistics from f16 columns, add `ScalarValue::Float16` #10763 (Lordworms)
- Handle empty rows for `array_sort` #10786 (jayzhan211)
- Fix extract parquet statistics from LargeBinary columns #10775 (xinlifoobar)
- Extract parquet statistics from Time32 and Time64 columns #10771 (Lordworms)
- chore: fix `last_value` coercion #10783 (appletreeisyellow)
- Fix extract parquet statistics from Decimal256 columns #10777 (xinlifoobar)
- Speed up arrow_statistics test #10735 (alamb)
- minor: Refactor some unparser methods to improve readability #10788 (devinjdangelo)
- Convert variance sample to udaf #10713 (yyin-dev)
- Improve docs and fix a typo #10798 (lewiszlw)
- Avoid the usage of intermediate ScalarValue to improve performance of extracting statistics from parquet files #10711 (xinlifoobar)
- SMJ: Add more tests and improve comments #10784 (comphead)
- Handle EmptyRelation during SQL unparsing #10803 (goldmedal)
- Document Committer and PMC process #10778 (alamb)
- Int64 as default type for make_array function empty or null case #10790 (jayzhan211)
- Split `SessionState` into its own module #10794 (alamb)
- Add `StreamProvider` for configuring `StreamTable` #10600 (matthewmturner)
- Bench: Add `PREFER_HASH_JOIN` env variable #10809 (comphead)
- Add `ParquetAccessPlan`, unify RowGroup selection and PagePruning selection #10738 (alamb)
- Fix `ScalarUDFImpl::propagate_constraints` doc #10810 (lewiszlw)
- Extract Parquet statistics from `Interval` column #10801 (marvinlanhenke)
- build(deps): upgrade sqlparser to 0.47.0 #10392 (tisonkun)
- Refactor and simplify the SQL unparser #10811 (goldmedal)
- Minor: Remove code duplication in `memory_limit` derivation for datafusion-cli #10814 (comphead)
- build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` #10765 (waynexia)
- chore: Prepare 39.0.0-rc1 #10828 (andygrove)
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
44 Andrew Lamb
18 Jay Zhan
14 张林伟
11 Andy Grove
11 Xin Li
10 Jonah Gao
8 Jax Liu
7 Mustafa Akur
7 Oleks V
7 dependabot[bot]
5 Arttu
5 Berkay Şahin
5 Marvin Lanhenke
4 Lordworms
4 Ruihang Xia
3 Bruce Ritchie
3 Devin D'Angelo
3 Duong Cong Toai
3 Eduard Karacharov
3 Junhao Liu
3 Liang-Chi Hsieh
3 Mohamed Abdeen
3 Nga Tran
3 Peter Toth
3 Phillip LeBlanc
2 Abrar Khan
2 Adam Curtis
2 Chunchun Ye
2 Jeffrey Vo
2 Michael Maletich
2 QP Hou
2 Trent Hauck
2 Weijie Guo
2 junxiangMu
2 yfu
1 Adrian Tanase
1 Alex Huang
1 Andrey Koshchiy
1 Artem Medvedev
1 ClSlaid
1 Dan Harris
1 Edmondo Porcu
1 Jeffrey Smith II
1 Kun Liu
1 Leonardo Yvens
1 Marko Milenković
1 Matthew Turner
1 Mehmet Ozan Kabak
1 Michael J Ward
1 NoeB
1 Samuel Colvin
1 Scott Anderson
1 VimT
1 Yue Yin
1 baishen
1 hsiang-c
1 nathaniel-daniel
1 shanretoo
1 tison
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.