Apache DataFusion 39.0.0 Changelog

This release consists of 234 commits from 59 contributors. See credits at the end of this changelog for more information.

Breaking changes:

  • Remove ScalarFunctionDefinition #10325 (lewiszlw)
  • Introduce user-defined signature #10439 (jayzhan211)
  • Remove AggregateFunctionDefinition::Name #10441 (lewiszlw)
  • Make CREATE EXTERNAL TABLE format options consistent, remove special syntax for HEADER ROW, DELIMITER and COMPRESSION #10404 (berkaysynnada)
  • feat: allow array_slice to take an optional stride parameter #10469 (jonahgao)
  • Minor: Extend more styles of udaf expr_fn, remove order args for covar_samp and covar_pop #10492 (jayzhan211)
  • Remove file_type() from FileFormat #10499 (Jefffrey)
  • UDAF: Extend more args to state_fields and groups_accumulator_supported and introduce ReversedUDAF #10525 (jayzhan211)
  • Remove Expr::GetIndexedField, replace Expr::{field,index,range} with FieldAccessor, IndexAccessor, and SliceAccessor #10568 (jayzhan211)
  • Improve ContextProvider #10577 (lewiszlw)
  • Minor: Use slice in ConcreteTreeNode #10666 (peter-toth)
  • Add reference visitor TreeNode APIs, change ExecutionPlan::children() and PhysicalExpr::children() return references #10543 (peter-toth)
  • Introduce Sum UDAF #10651 (jayzhan211)

Implemented enhancements:

  • feat: optional args for regexp_* UDFs #10514 (Michael-J-Ward)
  • feat: Expose Parquet Schema Adapter #10515 (HawaiianSpork)
  • feat: API for collecting statistics/index for metadata of a parquet file + tests #10537 (NGA-TRAN)
  • feat: Add eliminate group by constant optimizer rule #10591 (korowa)
  • feat: extend unnest to support Struct datatype #10429 (duongcongtoai)
  • feat: add substrait support for Interval types and literals #10646 (waynexia)
  • feat: support unparsing LogicalPlan::Window nodes #10767 (devinjdangelo)
  • feat: Update Parquet row filtering to handle type coercion #10716 (jeffreyssmith2nd)

Fixed bugs:

  • fix: make columnize_expr resistant to display_name collisions #10459 (jonahgao)
  • fix: avoid compressed json files repartitioning #10470 (korowa)
  • fix: parsing timestamp with date format #10476 (shanretoo)
  • fix: array_slice panics #10547 (jonahgao)
  • fix: pass quote parameter to CSV writer #10671 (DDtKey)
  • fix: CI compilation failed on substrait #10683 (jonahgao)
  • fix: fix string repeat for negative numbers #10760 (tshauck)
  • fix: array_slice and array_element panicked on empty args #10804 (jonahgao)

Documentation updates:

  • Prepare 38.0.0 release candidate 1 #10407 (andygrove)
  • chore(docs): update subquery documentation with more information #10361 (sanderson)
  • minor: Remove docs archive #10416 (andygrove)
  • Minor: format comments in PushDownFilter rule #10437 (alamb)
  • Minor: Add usecase to comments in LogicalPlan::recompute_schema #10443 (alamb)
  • doc: fix old master branch references to main #10458 (Jefffrey)
  • Minor: Improved document string for LogicalPlanBuilder #10496 (AbrarNitk)
  • Add to_date function to scalar functions doc #10601 (Omega359)
  • Docs: Update PR workflow documentation #10532 (alamb)
  • Minor: Add examples of using TreeNode with Expr #10686 (alamb)
  • docs: add documents to substrait type variation consts #10719 (waynexia)
  • Minor: (Doc) Enable rt-multi-thread feature for sample code #10770 (hsiang-c)

Other:

  • Minor: Add more docs and examples for Expr::unalias #10406 (alamb)
  • minor: Remove [RUST][datafusion] from release vote email subject line #10411 (andygrove)
  • fix dml logical plan output schema #10394 (leoyvens)
  • [MINOR]: Move transpose code to under common #10409 (mustafasrepo)
  • Fix incorrect Schema over aggregate function, Remove unnecessary exprlist_to_fields_aggregate #10408 (jonahgao)
  • Enable user defined display_name for ScalarUDF #10417 (yyy1000)
  • Fix and improve CommonSubexprEliminate rule #10396 (peter-toth)
  • Simplify making information_schema tables #10420 (lewiszlw)
  • only consider main part of the url when deciding is_collection in listing table #10419 (y-f-u)
  • make common expression alias human-readable #10333 (MohamedAbdeen21)
  • Minor: Simplify + document EliminateCrossJoin better #10427 (alamb)
  • During expression equality, check for new ordering information #10434 (mustafasrepo)
  • Revert 10333 / changes to aliasing in CommonSubExprEliminate #10436 (MohamedAbdeen21)
  • Improve flight sql examples #10432 (lewiszlw)
  • Move Covariance (Population) covar_pop to be a User Defined Aggregate Function #10418 (yyy1000)
  • Stop copying LogicalPlan and Exprs in OptimizeProjections (2% faster planning) #10405 (alamb)
  • chore: Improve release process for next time #10447 (andygrove)
  • Move bit_and_or_xor unit tests to slt #10457 (NoeB)
  • Remove some Expr clones in EliminateCrossJoin (3%-5% faster planning) #10430 (alamb)
  • refactor: Reduce string allocations in Expr::display_name (use write instead of format!) #10454 (erratic-pattern)
  • Add simplify method to aggregate function #10354 (milenkovicm)
  • Add cast array test to sqllogictest #10474 (viirya)
  • Add Expr::try_as_col, deprecate Expr::try_into_col (speed up optimizer) #10448 (alamb)
  • Implement From<Arc<LogicalPlan>> for LogicalPlanBuilder #10466 (AbrarNitk)
  • Minor: Improve documentation for catalog.has_header config option #10452 (alamb)
  • Minor: Simplify conjunction and disjunction, improve docs #10446 (alamb)
  • Stop copying LogicalPlan and Exprs in ReplaceDistinctWithAggregate #10460 (ClSlaid)
  • Stop copying LogicalPlan and Exprs in EliminateCrossJoin (4% faster planning) #10431 (alamb)
  • Improved ergonomics for CREATE EXTERNAL TABLE OPTIONS: Don't require quotes for simple namespaced keys like foo.bar #10483 (ozankabak)
  • Replace GetFieldAccess with indexing function in SqlToRel #10375 (jayzhan211)
  • Fix failure caused by VALUES with different data types #10445 (b41sh)
  • Fix SortMergeJoin with join filter filtering all rows out #10495 (viirya)
  • chore: use fullpath in macro to avoid declaring in other module #10503 (jayzhan211)
  • Minor: remove unused source file udf.rs #10497 (jonahgao)
  • Support UDAF to align with builtin aggregate functions #10493 (jayzhan211)
  • Minor: add a test for current_time (no args) #10509 (alamb)
  • [MINOR]: Move pipeline checker rule to the end #10502 (mustafasrepo)
  • Minor: Extract parent/child limit calculation into a function, improve docs #10501 (alamb)
  • Fix window expr deserialization #10506 (lewiszlw)
  • Update substrait requirement from 0.32.0 to 0.33.3 #10516 (dependabot[bot])
  • Stop copying LogicalPlan and Exprs in TypeCoercion (10% faster planning) #10356 (alamb)
  • Implement unparse IS_NULL to String and enhance the tests #10529 (goldmedal)
  • Fix panic in array_agg(distinct) query #10526 (jayzhan211)
  • Move min_max unit tests to slt #10539 (xinlifoobar)
  • Implement unparse IsNotFalse to String #10538 (goldmedal)
  • Implement Unparse TryCast Expr --> String Support #10542 (xinlifoobar)
  • Implement unparse Placeholder to String #10540 (reswqa)
  • Implement unparse OuterReferenceColumn to String #10544 (goldmedal)
  • Stop copying LogicalPlan and Exprs in PushDownFilter (4%-6% faster planning) #10444 (alamb)
  • Stop most copying of LogicalPlan and Exprs in ScalarSubqueryToJoin #10489 (alamb)
  • Example for simple Expr --> SQL conversion #10528 (edmondop)
  • fix null_count on compute_record_batch_statistics to report null counts across partitions #10468 (samuelcolvin)
  • Minor: Add PullUpCorrelatedExpr::new and improve documentation #10500 (alamb)
  • Stop copying LogicalPlan and Exprs in PushDownLimit #10508 (alamb)
  • Break up contributing guide into smaller pages #10533 (alamb)
  • PhysicalExpr Orderings with Range Information #10504 (berkaysynnada)
  • Implement unparse ScalarVariable to String #10541 (reswqa)
  • Handle dictionary values in ScalarValue serde #10563 (thinkharderdev)
  • Improve signature of get_field function #10569 (lewiszlw)
  • Implement Unparse GroupingSet Expr --> String Support sql #10555 (xinlifoobar)
  • Minor: Move proxy to datafusion common #10561 (jayzhan211)
  • Update prost-build requirement from =0.12.4 to =0.12.6 #10578 (dependabot[bot])
  • Add examples of how to convert logical plan to/from sql strings #10558 (xinlifoobar)
  • Fix: Sort Merge Join LeftSemi issues when JoinFilter is set #10304 (comphead)
  • Minor: Fix ArrayFunctionRewriter name reporting #10581 (alamb)
  • Improve UserDefinedLogicalNode::from_template API to return Result #10575 (lewiszlw)
  • Migrate testing optimizer rules to use rewrite API #10576 (lewiszlw)
  • test: add more tests for statistics reading #10592 (NGA-TRAN)
  • refactor: reduce allocations in push down filter #10567 (erratic-pattern)
  • Fix compilation of datafusion-cli on 32bit targets #10594 (nathaniel-daniel)
  • Rename monotonicity as output_ordering in ScalarUDF's #10596 (berkaysynnada)
  • Implement Unparser for UNION ALL #10603 (phillipleblanc)
  • Improve UserDefinedLogicalNodeCore::from_template API to return Result #10597 (lewiszlw)
  • Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash to physical-expr-common #10574 (jayzhan211)
  • Minor: Consolidate some integration tests into core_integration #10588 (alamb)
  • Stop copying LogicalPlan and Exprs in SingleDistinctToGroupBy #10527 (appletreeisyellow)
  • [MINOR]: Update get range implementation for lead lag window functions #10614 (mustafasrepo)
  • Minor: Improve documentation in sql_to_plan example #10582 (alamb)
  • Docs: add examples for RuntimeEnv::register_object_store, improve error messages #10617 (aditanase)
  • Add support for Substrait List/EmptyList literals #10615 (Blizzara)
  • Add to_unixtime function to scalar functions doc #10620 (Omega359)
  • Test reading statistics from parquet files without statistics and with boolean & struct data types #10608 (NGA-TRAN)
  • adding benchmark for extracting arrow statistics from parquet #10610 (Lordworms)
  • Implement a dialect-specific rule for unparsing an identifier with or without quotes #10573 (goldmedal)
  • add catalog as part of the table path in plan_to_sql #10612 (y-f-u)
  • Refactor parquet row group pruning into a struct (use new statistics API, part 1) #10607 (alamb)
  • Extract Date32 parquet statistics as Date32Array rather than Int32Array #10593 (xinlifoobar)
  • Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL #10625 (phillipleblanc)
  • Fix broken build/test from merge #10637 (phillipleblanc)
  • Add SessionContext::register_object_store #10621 (alamb)
  • Minor: Move median test #10611 (jayzhan211)
  • Add support for Substrait Struct literals and type #10622 (Blizzara)
  • fix incorrect statistics read for i8, i16 columns in parquet #10629 (Lordworms)
  • Minor: add runtime asserts to RowGroup #10641 (alamb)
  • Update cli Dockerfile to a newer ubuntu release, newer rust release #10638 (Omega359)
  • More properly handle nullability of types/literals in Substrait #10640 (Blizzara)
  • fix wrong type validation on unnest expr #10657 (duongcongtoai)
  • Fix incorrect statistics read for binary columns in parquet #10645 (xinlifoobar)
  • Fix NULL["field"] for expr_API #10655 (alamb)
  • Update substrait requirement from 0.33.3 to 0.34.0 #10632 (dependabot[bot])
  • Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) #10662 (alamb)
  • Add FileScanConfig::new() API #10623 (alamb)
  • Minor: Remove GetFieldAccessSchema #10665 (jayzhan211)
  • Move Median to functions-aggregate and Introduce Numeric signature #10644 (jayzhan211)
  • Fix Coalesce casting logic to follow what Postgres and DuckDB do. Introduce signature that does non-comparison coercion #10268 (jayzhan211)
  • Fix compilation "comparison_binary_numeric_coercion not found" #10677 (alamb)
  • refactor: simplify converting List DataTypes to ScalarValue #10675 (jonahgao)
  • Minor: Improve ObjectStoreUrl docs + examples #10619 (alamb)
  • Add tests for reading numeric limits in parquet statistics #10642 (alamb)
  • Update nix requirement from 0.28.0 to 0.29.0 #10684 (dependabot[bot])
  • refactor: Move SchemaAdapter from parquet module to data source #10680 (HawaiianSpork)
  • Convert first, last aggregate function to UDAF #10648 (mustafasrepo)
  • Minor: CastExpr Ordering Handle #10650 (berkaysynnada)
  • Factor out common datafusion types into another proto file #10649 (mustafasrepo)
  • Minor: Add tests showing aggregate behavior for NaNs #10634 (alamb)
  • Improve ParquetExec and related documentation #10647 (alamb)
  • minor: inconsistent group by position planning #10679 (korowa)
  • Remove duplicate function name in its aliases list #10661 (goldmedal)
  • Add protobuf serde support for LogicalPlan::Unnest #10681 (akoshchiy)
  • Support Substrait's VirtualTables #10531 (Blizzara)
  • support serialization and deserialization limit in the aggregation exec #10692 (liukun4515)
  • Display date32/64 in YYYY-MM-DD format #10691 (houqp)
  • Fix: array list values are leaked on nested unnest operators #10689 (duongcongtoai)
  • Support LogicalPlan::Distinct in unparser #10690 (yyy1000)
  • Remove redundant upper case aliases for median, first_value and last_value #10696 (goldmedal)
  • Minor: improve Expr documentation #10685 (alamb)
  • chore: align re-exports in functions-aggregate #10705 (waynexia)
  • Fix typo in bench.sh #10698 (vimt)
  • Fix incorrect statistics read for unsigned integers columns in parquet #10704 (xinlifoobar)
  • Separate Partitioning protobuf serialization code #10708 (lewiszlw)
  • Support consuming Substrait with compound signature function names #10653 (Blizzara)
  • Minor: Add examples of using TreeNode with LogicalPlan #10687 (alamb)
  • Add ParquetExec::builder(), deprecate ParquetExec::new #10636 (alamb)
  • feature: Add a WindowUDFImpl::simplify() API #9906 (guojidan)
  • Chore: clean up udwf example && remove redundant import #10718 (guojidan)
  • Push down filter as table partition list prefix #10693 (houqp)
  • Make swap_hash_join public API #10702 (viirya)
  • ci: fix clippy error on main #10723 (jonahgao)
  • CI: Fix complaints from newer Clippy versions #10725 (comphead)
  • Remove Eager Trait for Joins #10721 (berkaysynnada)
  • Minor: fix signature fn octet_length() #10726 (marvinlanhenke)
  • Update rstest requirement from 0.19.0 to 0.20.0 #10734 (dependabot[bot])
  • Update rstest_reuse requirement from 0.6.0 to 0.7.0 #10733 (dependabot[bot])
  • Add example for building an external secondary index for parquet files #10549 (alamb)
  • Minor: move stddev test to slt #10741 (marvinlanhenke)
  • fix(CLI): cannot create external tables with format options #10739 (jonahgao)
  • Add support for AggregateExpr, WindowExpr rewrite. #10742 (mustafasrepo)
  • Fix SMJ Left Anti Join when the join filter is set #10724 (comphead)
  • Introduce FunctionRegistry dependency to optimize and rewrite rule #10714 (jayzhan211)
  • Minor: Add SMJ to TPCH benchmark usage #10747 (comphead)
  • Minor: Split physical_plan/parquet/mod.rs into smaller modules #10727 (alamb)
  • minor: consolidate unparser integration tests #10736 (devinjdangelo)
  • Minor: Move aggregate variance to slt #10750 (marvinlanhenke)
  • Extract parquet statistics from timestamps with timezones #10766 (xinlifoobar)
  • Minor: Add tests for extracting dictionary parquet statistics #10729 (alamb)
  • Update rstest requirement from 0.20.0 to 0.21.0 #10774 (dependabot[bot])
  • Minor: Refactor memory size estimation for HashTable #10748 (marvinlanhenke)
  • Reduce code repetition in datafusion/functions mod files #10700 (MohamedAbdeen21)
  • Support negatives in split_part #10780 (tshauck)
  • Extract parquet statistics from LargeUtf8 columns and Add tests for UTF8 And LargeUTF8 #10762 (Weijun-H)
  • Cleanup GetIndexedField #10769 (lewiszlw)
  • Extract parquet statistics from f16 columns, add ScalarValue::Float16 #10763 (Lordworms)
  • Handle empty rows for array_sort #10786 (jayzhan211)
  • Fix extract parquet statistics from LargeBinary columns #10775 (xinlifoobar)
  • Extract parquet statistics from Time32 and Time64 columns #10771 (Lordworms)
  • chore: fix last_value coercion #10783 (appletreeisyellow)
  • Fix extract parquet statistics from Decimal256 columns #10777 (xinlifoobar)
  • Speed up arrow_statistics test #10735 (alamb)
  • minor: Refactor some unparser methods to improve readability #10788 (devinjdangelo)
  • Convert variance sample to udaf #10713 (yyin-dev)
  • Improve docs and fix a typo #10798 (lewiszlw)
  • Avoid the usage of intermediate ScalarValue to improve performance of extracting statistics from parquet files #10711 (xinlifoobar)
  • SMJ: Add more tests and improve comments #10784 (comphead)
  • Handle EmptyRelation during SQL unparsing #10803 (goldmedal)
  • Document Committer and PMC process #10778 (alamb)
  • Int64 as default type for make_array function empty or null case #10790 (jayzhan211)
  • Split SessionState into its own module #10794 (alamb)
  • Add StreamProvider for configuring StreamTable #10600 (matthewmturner)
  • Bench: Add PREFER_HASH_JOIN env variable #10809 (comphead)
  • Add ParquetAccessPlan, unify RowGroup selection and PagePruning selection #10738 (alamb)
  • Fix ScalarUDFImpl::propagate_constraints doc #10810 (lewiszlw)
  • Extract Parquet statistics from Interval column #10801 (marvinlanhenke)
  • build(deps): upgrade sqlparser to 0.47.0 #10392 (tisonkun)
  • Refactor and simplify the SQL unparser #10811 (goldmedal)
  • Minor: Remove code duplication in memory_limit derivation for datafusion-cli #10814 (comphead)
  • build(deps): update Arrow/Parquet to 52.0, object-store to 0.10 #10765 (waynexia)
  • chore: Prepare 39.0.0-rc1 #10828 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    44	Andrew Lamb
    18	Jay Zhan
    14	张林伟
    11	Andy Grove
    11	Xin Li
    10	Jonah Gao
     8	Jax Liu
     7	Mustafa Akur
     7	Oleks V
     7	dependabot[bot]
     5	Arttu
     5	Berkay Şahin
     5	Marvin Lanhenke
     4	Lordworms
     4	Ruihang Xia
     3	Bruce Ritchie
     3	Devin D'Angelo
     3	Duong Cong Toai
     3	Eduard Karacharov
     3	Junhao Liu
     3	Liang-Chi Hsieh
     3	Mohamed Abdeen
     3	Nga Tran
     3	Peter Toth
     3	Phillip LeBlanc
     2	Abrar Khan
     2	Adam Curtis
     2	Chunchun Ye
     2	Jeffrey Vo
     2	Michael Maletich
     2	QP Hou
     2	Trent Hauck
     2	Weijie Guo
     2	junxiangMu
     2	yfu
     1	Adrian Tanase
     1	Alex Huang
     1	Andrey Koshchiy
     1	Artem Medvedev
     1	ClSlaid
     1	Dan Harris
     1	Edmondo Porcu
     1	Jeffrey Smith II
     1	Kun Liu
     1	Leonardo Yvens
     1	Marko Milenković
     1	Matthew Turner
     1	Mehmet Ozan Kabak
     1	Michael J Ward
     1	NoeB
     1	Samuel Colvin
     1	Scott Anderson
     1	VimT
     1	Yue Yin
     1	baishen
     1	hsiang-c
     1	nathaniel-daniel
     1	shanretoo
     1	tison

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.