24.0.0 (2023-05-06)
Breaking changes:
- Extract
LogicalPlan::Create*
DDL related plan structures intoLogicalPlan::Ddl
#6121 (alamb) - Complete Extracting LogicalPlan::Drop* DDL related plan structures into LogicalPlan::Ddl #6144 (alamb)
- Remove JIT based on cranelift, and
jit
feature #6164 (yjshen) - Remove Rayon-based Scheduler #6169 (tustvold)
- Cleanup join memory accounting in CrossJoin and NestedLoopsJoin #6188 (tustvold)
Implemented enhancements:
- feat: date_bin output type same with input type #6053 (jiacai2050)
- feat: add version field to ExecutionContext #6052 (Weijun-H)
- Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) #6124 (mustafasrepo)
- Make decimal multiplication allow precision-loss in DataFusion #6103 (viirya)
- Add support for scientific notation (remove old unimplemented() for scientific notation) #6142 (2010YOUY01)
- Support DROP SCHEMA statements #6096 (jaylmiller)
- feat: make threshold for using scalar update in aggregate configurable #6166 (yjshen)
- Adaptive in-memory sort (~2x faster) (#5879) #6163 (tustvold)
- Add Support for Ordering Equivalence #6160 (mustafasrepo)
- Sum and Avg should be able to take inputs of type Dictionary #6197 (viirya)
- CLI: Add support for AWS named profiles #6161 (mr-brobot)
- feat: concurrent physical planning #6138 (crepererum)
- feat: allow scalars to appear as the LHS arg in arithmetic operations on temporal values #6196 (kyle-mccarthy)
- Select Into support #6219 (berkaysynnada)
Fixed bugs:
- fix: Use arrow bitwise_op as binary_kernels #6093 (RTEnzyme)
- fix: allow group by same expr in Aggregate #6091 (jackwener)
- fix: null handling of
ScalarValue::Struct
#6085 (crepererum) - fix: make simplify_expressions use a single schema for resolution #6077 (wolffcm)
- fix: incorrect column pruning in sql with window operations #6039 (yukkit)
- fix: allow group by same expr in Aggregate #6097 (jackwener)
- fix:
reassign_predicate_columns
w/ in-list expr #6114 (crepererum) - fix:
common_subexpr_eliminate
and aggregates #6129 (crepererum) - fix: type coercion ignore prevent
Interval - Date
#6177 (jackwener) - fix:
common_subexpr_eliminate
w/ aggregates and relations #6199 (crepererum) - fix: slt fail due to bors problem #6242 (jackwener)
Documentation updates:
- chore: Update api docs for
SessionContext
,TaskContext
, etc #6106 (alamb) - docs: consolidate datafusion-cli docs #6218 (alamb)
Merged pull requests:
- fix: Use arrow bitwise_op as binary_kernels #6093 (RTEnzyme)
- fix: allow group by same expr in Aggregate #6091 (jackwener)
- Update DataFusion architecture documentation #6056 (alamb)
- feat: date_bin output type same with input type #6053 (jiacai2050)
- fix: null handling of
ScalarValue::Struct
#6085 (crepererum) - fix: make simplify_expressions use a single schema for resolution #6077 (wolffcm)
- fix: incorrect column pruning in sql with window operations #6039 (yukkit)
- Substrait: Handle multiple select with the same table/columns in INTERSECT #6059 (nseekhao)
- fix: allow group by same expr in Aggregate #6097 (jackwener)
- chore(deps): update regex and regex-syntax requirement from 0.6.28 to 0.7.1 #6095 (alamb)
- test: more sqllogicaltest for GROUPBY. #6100 (jackwener)
- minor: replace unwrap() with Err #6101 (jackwener)
- Remove temporal to kernels_arrow #6069 (berkaysynnada)
- feat: add version field to ExecutionContext #6052 (Weijun-H)
- chore(deps): update substrait requirement from 0.7.1 to 0.8.0 #6105 (dependabot[bot])
- refactor(sqllogictests): port group by test to sqllogic #6088 (aprimadi)
- Use arrow kernels for bitwise operations #6098 (alamb)
- Unordered PARTITION BY column implementation (to prevent pipeline breaking) #6036 (mustafasrepo)
- Add from_string_hash_map() method for SessionConfig #6111 (yahoNanJing)
- Parallelise collect_partitioned #6109 (tustvold)
- Fix column mapping in output_ordering() and output_partitioning() for ProjectionExec and AggregateExec #6113 (mingmwang)
- fix:
reassign_predicate_columns
w/ in-list expr #6114 (crepererum) - Update to arrow 38 #6115 (tustvold)
- Minor: Split
DmlStatement
into its own module #6120 (alamb) - Remove input schema from PhysicalExpr, move the validation logic to physical expression planner #6122 (mingmwang)
- Move Scalar Subquery validation logic to the Analyzer #6084 (mingmwang)
- chore: Update api docs for
SessionContext
,TaskContext
, etc #6106 (alamb) - Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) #6124 (mustafasrepo)
- Minor: Mark default_session_builder deprecated, add since markings #6123 (alamb)
- [MINOR]: Add GroupByOrderMode to the AggregateExec display #6141 (mustafasrepo)
- Extract
LogicalPlan::Create*
DDL related plan structures intoLogicalPlan::Ddl
#6121 (alamb) - Fix broken build due to merge conflict #6143 (alamb)
- [Bug fix] Source projection output ordering #6136 (mustafasrepo)
- Make decimal multiplication allow precision-loss in DataFusion #6103 (viirya)
- MemoryExec INSERT INTO refactor to use ExecutionPlan #6049 (metesynnada)
- Complete Extracting LogicalPlan::Drop* DDL related plan structures into LogicalPlan::Ddl #6144 (alamb)
- fix:
common_subexpr_eliminate
and aggregates #6129 (crepererum) - Add support for scientific notation (remove old unimplemented() for scientific notation) #6142 (2010YOUY01)
- Use ParquetObjectReader #6130 (tustvold)
- Add bench.sh script to automate benchmarking DataFusion against itself #6131 (alamb)
- Port test in select.rs to sqllogic #6158 (aprimadi)
- Support uint64 literals #6146 (jackkleeman)
- Remove JIT based on cranelift, and
jit
feature #6164 (yjshen) - minor: add decimal roundtrip tests for the row format #6165 (yjshen)
- Support DROP SCHEMA statements #6096 (jaylmiller)
- Improve
compare.py
output to usemin
times and better column titles #6134 (alamb) - Fix GroupByOrderMode documentation #6168 (aprimadi)
- Remove Rayon-based Scheduler #6169 (tustvold)
- [Minor] Fix typo in SQL data types doc #6178 (alamb)
- feat: make threshold for using scalar update in aggregate configurable #6166 (yjshen)
- Handle ScalarValue::Dictionary in add_to_row and update_avg_to_row #6175 (viirya)
- chore(datetime_expression): support uppercase granularity for
DATE_TRUNC
func #6185 (appletreeisyellow) - Simplify HashJoin memory accounting #6170 (tustvold)
- Adaptive in-memory sort (~2x faster) (#5879) #6163 (tustvold)
- refactor: optimizer shouldn't assume failed rules are internal error #6184 (wolffcm)
- fix: type coercion ignore prevent
Interval - Date
#6177 (jackwener) - minor: remove default match for function signature #6186 (alamb)
- Add Support for Ordering Equivalence #6160 (mustafasrepo)
- Remove db-benchmark (and add them to arrow-datafusion-python repo instead) #6204 (andygrove)
- fix:
common_subexpr_eliminate
w/ aggregates and relations #6199 (crepererum) - Add parquet filter and sort to bench.sh #6172 (alamb)
- minor: remove unused code in binary.rs #6203 (alamb)
- Sum and Avg should be able to take inputs of type Dictionary #6197 (viirya)
- minor: fix tiny typo in table.rs #6220 (okue)
- Lower some log levels from
debug
totrace
during plan execution #6193 (alamb) - minor: fix typo in comments. #6225 (jackwener)
- CLI: Add support for AWS named profiles #6161 (mr-brobot)
- feat: concurrent physical planning #6138 (crepererum)
- Cleanup join memory accounting in CrossJoin and NestedLoopsJoin #6188 (tustvold)
- minor: Move shift_date tests to be with code #6200 (alamb)
- Minor: rename variable for clarity #6229 (alamb)
- feat: allow scalars to appear as the LHS arg in arithmetic operations on temporal values #6196 (kyle-mccarthy)
- Supply consistent format output for FileScanConfig params #6202 (tz70s)
- minor: refactor code to avoid same code. #6222 (jackwener)
- minor: change error type for window expressions #6231 (comphead)
- Add sqllogic test coverage for interval arithmetic #6201 (alamb)
- Simplify MemoryWriteExec #6154 (tustvold)
- refactor: separate get_result_type from
coerce_type
#6221 (jackwener) - fix: slt fail due to bors problem #6242 (jackwener)
- Port tests in errors.rs to sqllogictest #6239 (parkma99)
- Select Into support #6219 (berkaysynnada)
- refactor: use get_result_type to replace binary_operator_data_type. #6241 (jackwener)
- Port test in union.rs to sqllogic #6224 (parkma99)
- Port tests in expr.rs to sqllogictest #6240 (parkma99)
- Port tests in cast.rs to sqllogictest #6244 (parkma99)
- Port tests in identifiers.rs to sqllogictest #6245 (parkma99)
- Minor: allow creating infinite external tables via SQL (for testing) #6235 (alamb)
- Minor: consolidate test data #6217 (alamb)
- docs: consolidate datafusion-cli docs #6218 (alamb)
- Port tests in wildcard.rs to sqllogictest #6249 (masanobbb)
- Simplify and speed up MemoryExec insert #6236 (alamb)
- minor: fix typo #6253 (okue)
- refactor: separate get_common_type / get_result_type for temporal type #6250 (jackwener)
- Port tests in set_variable.rs to sqllogictest #6255 (parkma99)