forked from apache/datafusion
Dev/xinli/arrow udf poc #2
Closed
* Port `bool_and` and `bool_or` to `AggregateUDFImpl` * Remove trait methods with default implementation * Add `bool_or_udaf` * Register `bool_and` and `bool_or` * Remove from `physical-expr` * Add expressions to logical plan roundtrip test * minor: remove methods with default implementation * Removes redundant tests * Removes hard-coded function names
…r/src/analysis.rs (apache#10992) * propagate error instead of panicking * use macro for creating internal df error
* feat: propagate empty for more join types * feat: update subquery de-correlation test * tests: simplify tests * refactor: better name * style: clippy * refactor: update tests * refactor: rename * refactor: fix spellings * add slt tests
* Add drop_columns to dataframe api apache#11007 * Prettier cleanup * Added additional drop_columns tests and fixed issue with nonexistent columns.
* push down non-unnest only Signed-off-by: jayzhan211 <[email protected]> * add doc Signed-off-by: jayzhan211 <[email protected]> * add doc Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * rewrite unnest push down filter Signed-off-by: jayzhan211 <[email protected]> * remove comment Signed-off-by: jayzhan211 <[email protected]> * avoid double recursive Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]>
* feat: add temporal_coercion check * fix: add return stmt * chore: add slts * fix: remove println * Update datafusion/expr/src/type_coercion/binary.rs --------- Co-authored-by: Andrew Lamb <[email protected]>
* Deprecate OptimizerRule::try_optimize * optimize_children * Apply review suggestions * Fix clippy lint
* Minor changes * Minor changes * Re-introduce group by expression check
* compute gcd with unsigned ints * add test for the i64::MAX cases * move unsigned_abs below zero test to remove unnecessary casts * add slt test for gcd on max values instead of unit tests
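The commit above moves the gcd computation onto unsigned integers. A minimal sketch of that idea follows, assuming a plain Euclidean loop; the names `unsigned_gcd` and `gcd_i64` are hypothetical and this is not the DataFusion implementation. Taking `unsigned_abs()` first avoids the overflow that `abs()` hits on `i64::MIN`, and the final conversion reports the one case where the result does not fit back into `i64`.

```rust
/// Euclidean gcd on unsigned magnitudes (hypothetical helper, not DataFusion code).
fn unsigned_gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = b;
        b = a % b;
        a = t;
    }
    a
}

/// Signed wrapper: works on magnitudes so that `i64::MIN` never overflows.
/// Returns `None` when the gcd (2^63) cannot be represented as an `i64`.
fn gcd_i64(x: i64, y: i64) -> Option<i64> {
    let g = unsigned_gcd(x.unsigned_abs(), y.unsigned_abs());
    i64::try_from(g).ok()
}
```

With this shape, `gcd_i64(i64::MAX, -i64::MAX)` yields `Some(i64::MAX)`, while `gcd_i64(i64::MIN, 0)` yields `None` because the magnitude 2^63 is out of range for `i64`.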
* Add distinct_on to dataframe api apache#11011 * cargo fmt * Update datafusion/core/src/dataframe/mod.rs as per reviewer feedback Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]>
…mestamp to timezone (apache#11056)
* test and implement boolean data page statistics * left out a collect & forgot to change the Check to Both * Update datafusion/core/src/datasource/physical_plan/parquet/statistics.rs --------- Co-authored-by: Andrew Lamb <[email protected]>
* push down non-unnest only Signed-off-by: jayzhan211 <[email protected]> * add doc Signed-off-by: jayzhan211 <[email protected]> * to lowercase Signed-off-by: jayzhan211 <[email protected]> * fix tpch Signed-off-by: jayzhan211 <[email protected]> * Update test * fix test Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
…ical-expr dependency for `datafusion-function` crate (apache#11061) * mv to expr Signed-off-by: jayzhan211 <[email protected]> * upd lock Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]>
…e#11046) * wip Signed-off-by: Kevin Su <[email protected]> * add a test Signed-off-by: Kevin Su <[email protected]> --------- Signed-off-by: Kevin Su <[email protected]>
* feat: Add method to add analyzer rules to SessionContext Signed-off-by: Kevin Su <[email protected]> * Add a test Signed-off-by: Kevin Su <[email protected]> * Add analyze_plan Signed-off-by: Kevin Su <[email protected]> * update test Signed-off-by: Kevin Su <[email protected]> --------- Signed-off-by: Kevin Su <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
…pache#11041) * Fix: Sort Merge Join crashes on TPCH Q21 * Fix LeftAnti SMJ join when the join filter is set * rm dbg * Minor: disable fuzz test to avoid CI spontaneous failures * Minor: Add routine to debug join fuzz tests * SMJ: fix streaming row concurrency issue for LEFT SEMI filtered join
apache#10701) * Add `advanced_parquet_index.rs` example of indexing into parquet files * pre-load page index * fix comment * Apply suggestions from code review Thank you @Weijun-H Co-authored-by: Alex Huang <[email protected]> * Add ASCII ART * Update datafusion-examples/README.md Co-authored-by: Alex Huang <[email protected]> * Update datafusion-examples/examples/advanced_parquet_index.rs Co-authored-by: Alex Huang <[email protected]> * Improve / clarify comments based on review * Add page index caveat --------- Co-authored-by: Alex Huang <[email protected]>
…he#10948) * Add Expr::column_refs to find column references without copying migrate some uses of to_column * Simplify condition
… duplicated custom implementations (apache#11059)
* Fix sink output schema being passed in to `FileSinkExec` where input schema was expected * Propagate CSV options (quote, double quote, and escape) through protos * Add test for double quotes * Test quote escape when double quotes are disabled * regen --------- Co-authored-by: svranesevic <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* Draft parse_sql * Allow string pass * Complete sql to expr support * Add examples * Add unit tests * Fix format * Remove async for trivial operation and add parquet demo * Fix comments * fix comments * fix comments * Fix doc link
* Support dictionary data type in array_to_string * Fix import * Some tests * Update datafusion/functions-array/src/string.rs Co-authored-by: Alex Huang <[email protected]> * Add some tests showing incorrect results * Get logical array * apply rust fmt * Simplify implementation, avoid panics --------- Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* Implement min/max for interval types * Add sqllogictests for min/max intervals * Add tests for interval min/max * update sql logic tests --------- Co-authored-by: Andrew Lamb <[email protected]>
* add avg udaf * remove avg from expr * add test stub * migrate avg udaf * change avg udaf signature remove avg phy expr * fix tests * fix state_fields fn * fix ut in phy-plan aggr * refactor Average to Avg * refactor Average to Avg * fix type coercion tests * fix example and logic tests * fix py expr failing ut * update docs * fix failing tests * formatting examples * remove duplicate code and fix uts * addressing PR comments * add ut for logical avg window * fix physical plan roundtrip_window test case
* feat(11344): track memory used for non-parallel writes * feat(11344): track memory usage during parallel writes * test(11344): create bounded stream for testing * test(11344): test ParquetSink memory reservation * feat(11344): track bytes in file writer * refactor(11344): tweak the ordering to add col bytes to rg_reservation, before selecting shrinking for data bytes flushed * refactor: move each col_reservation and rg_reservation to match the parallelized call stack for col vs rg * test(11344): add memory_limit enforcement test for parquet sink * chore: cleanup to remove unnecessary reservation management steps * fix: fix CI test failure due to file extension rename
* Change no-statement error message to be clearer and add tests for said change * Run fmt to pass CI
apache#11299) * change array agg semantic for empty result Signed-off-by: jayzhan211 <[email protected]> * return null Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * fix order sensitive Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * add more test Signed-off-by: jayzhan211 <[email protected]> * fix null Signed-off-by: jayzhan211 <[email protected]> * fix multi-phase case Signed-off-by: jayzhan211 <[email protected]> * add comment Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * fix clone Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]>
…ments (apache#11391) * Minor: return "not supported" for COUNT DISTINCT with multiple arguments * update condition
* update tests * update tests * add rustdoc * update PartialEq impl * fix * address feedback about improving api
Amends apache#11394 (sorry, I should have reviewed that). While reporting "not implemented" for "multiple statements" seems reasonable, I think the user should get a plan error (which roughly translates to "invalid argument") if they don't provide any statement. I don't see any reasonable way to support "no statement" ever, hence "not implemented" seems like a wrong promise.
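The distinction argued for above can be sketched as follows. This is only an illustration: `SketchError`, `single_statement`, and the message strings are hypothetical stand-ins, not DataFusion's error type or API. An empty input is treated as an invalid argument (a plan error), while multiple statements are reported as not implemented, since that case could plausibly be supported later.

```rust
/// Hypothetical error type for illustration; not DataFusion's error enum.
#[derive(Debug, PartialEq)]
enum SketchError {
    /// The caller supplied invalid input ("invalid argument").
    Plan(String),
    /// The input is valid in principle but not supported yet.
    NotImplemented(String),
}

/// Accepts exactly one statement (hypothetical helper).
fn single_statement(mut statements: Vec<String>) -> Result<String, SketchError> {
    match statements.len() {
        // No statement can never be supported, so this is a plan error.
        0 => Err(SketchError::Plan("no SQL statement was provided".to_string())),
        1 => Ok(statements.remove(0)),
        // Multiple statements might be supported one day.
        _ => Err(SketchError::NotImplemented(
            "multiple statements are not supported".to_string(),
        )),
    }
}
```

The design choice is that "not implemented" carries an implicit promise of future support, which only makes sense for the multiple-statement case.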
* feat: add UDF `to_local_time()` * chore: support column value in array * chore: lint * chore: fix conversion for us, ms, and s * chore: add more tests for daylight savings time * chore: add function description * refactor: update tests and add examples in description * chore: add description and example * chore: doc chore: doc chore: doc chore: doc chore: doc * chore: stop copying * chore: fix typo * chore: mention that the offset varies based on daylight savings time * refactor: parse timezone once and update examples in description * refactor: replace map..concat with flat_map * chore: add hard code timestamp value in test chore: doc chore: doc * chore: handle errors and remove panics * chore: move some test to slt * chore: clone time_value * chore: typo --------- Co-authored-by: Andrew Lamb <[email protected]>
* initial prettier unparse * bug fix * handling minus and divide * cleaning references and comments * moved tests * Update precedence of BETWEEN * rerun CI * Change precedence to match PGSQLs * more pretty unparser tests * Update operator precedence to match latest PGSQL * directly prettify expr_to_sql * handle IS operator * correct IS precedence * update unparser tests * update unparser example * update more unparser examples * add with_pretty builder to unparser
* chore: add document for `to_local_time()` * chore: feedback Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]>
* move overlay to expr planner * typo
* Add customizable equality and hash functions to UDFs * Improve equals and hash_value documentation * Add tests for parameterized UDFs
* tmp * opt * modify test * add another version * implement make_map function * implement make_map function * implement map function * format and modify the doc * add benchmark for map function * add empty end-line * fix cargo check * update lock * update lock * fix clippy * fmt and clippy * support FixedSizeList and LargeList * check type and handle null array in coerce_types * make array value throw todo error * fix clippy * simplify the error tests
…valuated stats (apache#11357) * Improve `CommonSubexprEliminate` rule with surely and conditionally evaluated stats * remove expression tree hashing as no longer needed * address review comments * add negative tests
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
github-actions bot added sql, logical-expr, physical-expr, optimizer, core, substrait, sqllogictest labels Jul 15, 2024
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?