-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move the to_timestamp* functions to datafusion-functions #9388
Conversation
# Conflicts: # datafusion/physical-expr/src/functions.rs # datafusion/proto/proto/datafusion.proto
@@ -79,6 +86,43 @@ fn schema() -> DFSchema { | |||
.unwrap() | |||
} | |||
|
|||
fn test_table_scan() -> LogicalPlan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were migrated from the optimizer/simplify_expressions file
]; | ||
b.iter(|| { | ||
for i in inputs.iter() { | ||
black_box(to_timestamp(vec![i.clone()])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmarks had to change to test the public api vs the previous version that tested the internal api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think testing the public API is actually a more real-world comparison (so this seems like an improvement to me)
mod to_timestamp; | ||
|
||
// create UDFs | ||
make_udf_function!(to_timestamp::ToTimestampFunc, TO_TIMESTAMP, to_timestamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because to_timestamp functions are nary functions the export_functions! macro could not be used. Instead this is the manual equivalent of what it expands out to.
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
# Conflicts: # datafusion/functions/Cargo.toml # datafusion/functions/src/lib.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Omega359 -- this is looking pretty sweet
"to_timestamp", | ||
vec![lit("2020-09-08T12:00:00+00:00")], | ||
)?); | ||
let expr = col("ts").eq(to_timestamp(vec![lit("2020-09-08T12:00:00+00:00")])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
const ERR_NANOSECONDS_NOT_SUPPORTED: &str = "The dates that can be represented as nanoseconds have to be between 1677-09-21T00:12:44.0 and 2262-04-11T23:47:16.854775804"; | ||
|
||
#[derive(Debug)] | ||
pub(super) struct ToTimestampFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all so similar, I wonder if we could make a common class to avoid the duplication something like
struct ToTimestampImpl {
/// target datatype
data_type: DataType,
}
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ScalarUDFImpl is different for each 'version' of that struct. It would be possible to have a single struct but then the impl would have to match on the data_type (really in this case I think it might be timeunit). I'll do a test of that tonight to see if it's cleaner overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is required, just a suggestion I thought of while reviewing this code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I just looked into it and it's actually more than the impl methods because it would affect the make_udf_function macro as well (or a clone) to allow args to be passed into the ::new() call. Possible for a future improvement
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
use datafusion_expr::ColumnarValue; | ||
|
||
use crate::expressions::cast_column; | ||
|
||
/// Error message if nanosecond conversion request beyond supported interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great to extract all this code out of datafusion/physical-expr
@@ -483,7 +483,7 @@ statement error Did you mean 'arrow_typeof'? | |||
SELECT arrowtypeof(v1) from test; | |||
|
|||
# Scalar function | |||
statement error Did you mean 'to_timestamp_seconds'? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 this looks like there may be special case handling for function hinting that doesn't work for user defined functions.
I'll file a ticket to add that feature to user defined functions too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #9392
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #9407
Co-authored-by: Andrew Lamb <[email protected]>
# Conflicts: # datafusion/functions/src/lib.rs
…timizer to core/tests.
test_evaluate( | ||
(lit("foo").not_eq(lit("foo"))).or(col("c").eq(lit(1))), | ||
// lit(false).or(col("c").eq(lit(1))), | ||
col("c").eq(lit(1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was the version that was in expr_simplifier but that no longer passed the test once the test_evaluate_with_start_time function was changed to use ExprSimplifier from ConstEvaluator. I am unsure if this is in fact correct or not @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks better to me (false OR ...) is always (...).
The reason that the test changed is that previously it was only testing the constant propagation, but now it is also testing the simplifier code as well. I think this looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Omega359 -- this looks just about perfect. There are a few tests I want to revert but I will handle doing so and push changes to this PR
test_evaluate( | ||
(lit("foo").not_eq(lit("foo"))).or(col("c").eq(lit(1))), | ||
// lit(false).or(col("c").eq(lit(1))), | ||
col("c").eq(lit(1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks better to me (false OR ...) is always (...).
The reason that the test changed is that previously it was only testing the constant propagation, but now it is also testing the simplifier code as well. I think this looks good
]; | ||
b.iter(|| { | ||
for i in inputs.iter() { | ||
black_box(to_timestamp(vec![i.clone()])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think testing the public API is actually a more real-world comparison (so this seems like an improvement to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am merging up from main to resolve conflicts in this PR now
I tried with this for a bit but ran out of time. @Omega359 I think the issue is that the Is this something you have time to do? If not, I will find time to do it myself. |
I can handle it, thanks for all your help. Likely tomorrow sometime. |
BTW here is a proposed PR (targeting this PR) to avoid the new dependency: Omega359#1 |
Move `cast_column` to `ColumnarValue::cast_to`
# Conflicts: # datafusion/expr/src/expr_fn.rs # datafusion/functions/Cargo.toml # datafusion/physical-expr/src/datetime_expressions.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @Omega359 -- this is epic work
datafusion/functions/Cargo.toml
Outdated
log = "0.4.20" | ||
rand = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think several of these dependencies are only used in the tests (so they could be dev depenencies). I'll make a follow on PR clean it up as this one is already really large.
Update I pushed a commit to move then to dev dependencies
I think the only thing left here is to resolve the merge conflicts (which is due to |
…side to_timestamp.
# Conflicts: # datafusion/functions/Cargo.toml
@@ -0,0 +1,389 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one | |||
// or more contributor license agreements. See the NOTICE file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved functions common to to_timestamp and to_date to here
pub mod expr_fn { | ||
use datafusion_expr::Expr; | ||
|
||
/// ```ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunately but this rustdoc won't work as is because of the SessionContext::new() call is into a non-accessible module. I think we either need to pull examples out of rustdoc or find some other solution that doesn't involve circular dependencies. This is one of the more limiting aspects with rust documentation I've encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we probably need to pull them out of the rust doc directly (maybe we can put them in the examples 🤔 ) and maybe link to them from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @Omega359 -- thanks again. Let's merge and iterate in follow on PRs
Which issue does this PR close?
Closes #9291
Rationale for this change
#9285
What changes are included in this PR?
code, tests
Are these changes tested?
Yes
Are there any user-facing changes?
Functions moved to new crate but are enabled by default.