Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: move acos() to function crate #9297

Merged
merged 1 commit into from
Feb 28, 2024

Conversation

SteveLauC
Copy link
Contributor

Which issue does this PR close?

Part of #9285

Rationale for this change

What changes are included in this PR?

Move math function acos() to the function crate

Are these changes tested?

Let's try this in CI:<

Are there any user-facing changes?

I guess no

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules labels Feb 21, 2024
@alamb alamb marked this pull request as draft February 21, 2024 07:46
@alamb
Copy link
Contributor

alamb commented Feb 21, 2024

Thanks @SteveLauC -- it appears there are some CI failuers

@SteveLauC
Copy link
Contributor Author

it appears there are some CI failuers

Indeed, I am working on the sqllogictest failure

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 21, 2024
Copy link
Contributor Author

@SteveLauC SteveLauC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self review

DataType::Float32 => {
Arc::new(make_function_scalar_inputs_return_type!(
&args[0],
"x",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"x",
self.name(),

I forgot to change this one, BTW, should this be done? I saw that isnan() is using "x"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -15,15 +15,18 @@
// specific language governing permissions and limitations
// under the License.

//! "core" DataFusion functions
//! "math" DataFusion functions
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a copy-paste error, so I changed it

@@ -15,7 +15,7 @@
// specific language governing permissions and limitations
// under the License.

//! Encoding expressions
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, seems to be a copy-paste error

@@ -589,34 +586,6 @@ async fn roundtrip_parquet_exec_with_table_partition_cols() -> Result<()> {
roundtrip_test(Arc::new(ParquetExec::new(scan_config, None, None)))
}

#[test]
fn roundtrip_builtin_scalar_function() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this test because we are going to elimiate buit-in functions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we have completed the migration, can you please simply change to use a different built in function so we don't lose coverage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -47,6 +47,7 @@ chrono = { workspace = true }
datafusion = { path = "../core", version = "36.0.0" }
datafusion-common = { workspace = true }
datafusion-expr = { workspace = true }
datafusion-functions = { workspace = true }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should remove this as it is unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@SteveLauC
Copy link
Contributor Author

Hi @alamb, I think I might need some help on the sqllogictest CI error, with the acos() function implemented in this PR:

  1. select(acos(NULL) would return a column of type ? rather than R
  2. select(acos(0), acos(1)) would return columns of type II rather than RR

Honestly, I have no idea about the above behavios...

@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Feb 21, 2024
@SteveLauC
Copy link
Contributor Author

SteveLauC commented Feb 21, 2024

Hi @alamb, I think I might need some help on the sqllogictest CI error, with the acos() function implemented in this PR:

  1. select(acos(NULL) would return a column of type ? rather than R
  2. select(acos(0), acos(1)) would return columns of type II rather than RR

Honestly, I have no idea about the above behavios...

Update: I just realized that it is a mistake that I made in <AsocFunc as ScalarUDFImpl>::return_type()


It feels that when we do select(acos(NULL)), AsocFunc::return_type() can see the passed DataType::Null, but AsocFunc::invoke() cannot because I didn't see the following panic message:

Unsupported data type DataType::Null for function asoc

It seems that DataFusion will handle those special values for you.

@SteveLauC SteveLauC marked this pull request as ready for review February 21, 2024 10:10
@@ -44,6 +44,7 @@ async-trait = { workspace = true }
chrono = { workspace = true }
datafusion-common = { workspace = true }
datafusion-expr = { workspace = true }
datafusion-functions = { workspace = true }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it only used in testing? Then it should be placed under dev-dependencies.

Copy link
Contributor Author

@SteveLauC SteveLauC Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Will do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @Omega359 had a similar challenge when looking at timestamp functions. We are thinking it might be better to move the tests rather than add a new dependency -- see #9291 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can add this as a dependency as it prevents the crates from being published to crates.io. See #9277

I think we should move these tests out of the optimizer crate

For this PR, however, perhaps we could simply update this test to use a function that has not yet been ported and maybe move the tests in another PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, however, perhaps we could simply update this test to use a function that has not yet been ported and maybe move the tests in another PR

done

@@ -545,7 +545,7 @@ message InListNode {

enum ScalarFunction {
Abs = 0;
Acos = 1;
// 1 was Acos
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider keeping it, perhaps by renaming it to Deprecated_Acos, and then decode it into the new ScalarUDF to maintain better backward compatibility?

We might need to mark the BuiltinScalarFunction::Acos as deprecated as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hould we consider keeping it, perhaps by renaming it to Deprecated_Acos, and then decode it into the new ScalarUDF to maintain better backward compatibility?

I followed the way that how #9216 handed it, gentle ping on @alamb, maybe you have some thoughts on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay in review, I was out last week)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that what @jonahgao would allow better backwards compatibility

However, given that we don't really provide any guarantee for compatibility now anyways (I recently made this explicit in the documentation, https://github.com/apache/arrow-datafusion/blob/8f3d1ef23f93cd4303745eba76c0850b39774d07/datafusion/proto/src/lib.rs#L37-L41) I don't think we should add complexity for these functions

@jonahgao
Copy link
Member

jonahgao commented Feb 22, 2024

select acos() will panic, but I think it should be checked at a higher level, before the implementation of acos.

DataFusion CLI v36.0.0
❯ select acos();
thread 'main' panicked at datafusion/functions/src/math/acos.rs:57:25:
index out of bounds: the len is 0 but the index is 0

In the main branch, it just throws an error.

DataFusion CLI v36.0.0
❯ select acos();
Error during planning: No function matches the given name and argument types 'acos()'. You might need to add explicit type casts.
	Candidate functions:
	acos(Float64/Float32)

@SteveLauC
Copy link
Contributor Author

SteveLauC commented Feb 22, 2024

select acos() will panic, but I think it should be checked at a higher level, before the implementation of acos.

Huge thanks for catching it!

Just realized isnan() also has this issue, cc @alamb:

$ ./target/debug/datafusion-cli
DataFusion CLI v36.0.0
❯ select isnan();
thread 'main' panicked at /home/steve/Documents/workspace/GitHub/arrow-datafusion/datafusion/functions/src/math/nans.rs:67:39:
index out of bounds: the len is 0 but the index is 0

And cc @yyy1000, abs() also suffers from this.


but I think it should be checked at a higher level, before the implementation of acos.

Yeah, we should do it before invoking the function

@yyy1000
Copy link
Contributor

yyy1000 commented Feb 22, 2024

select acos() will panic, but I think it should be checked at a higher level, before the implementation of acos.

Huge thanks for catching it!

Just realized isnan() also has this issue, cc @alamb:

$ ./target/debug/datafusion-cli
DataFusion CLI v36.0.0
❯ select isnan();
thread 'main' panicked at /home/steve/Documents/workspace/GitHub/arrow-datafusion/datafusion/functions/src/math/nans.rs:67:39:
index out of bounds: the len is 0 but the index is 0

And cc @yyy1000, abs() also suffers from this.

but I think it should be checked at a higher level, before the implementation of acos.

Yeah, we should do it before invoking the function

Yeah, thanks for telling me @SteveLauC
I added the logic just now. :)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @SteveLauC and @jonahgao -- I apologize for the delay in reviewing this PR.

I think the new dependency between the optimizer and core crates needs to be fixed prior to merging this PR (I left suggestions on how to do so).

Otherwise, I think this PR is pretty close to ready to go other than some test tweaks

Thank you again so much

{ f32::acos }
))
},
other => return internal_err!("Unsupported data type {other:?} for function {}", self.name()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
other => return internal_err!("Unsupported data type {other:?} for function {}", self.name()),
other => return exec_err!("Unsupported data type {other:?} for function {}", self.name()),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -44,6 +44,7 @@ async-trait = { workspace = true }
chrono = { workspace = true }
datafusion-common = { workspace = true }
datafusion-expr = { workspace = true }
datafusion-functions = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can add this as a dependency as it prevents the crates from being published to crates.io. See #9277

I think we should move these tests out of the optimizer crate

For this PR, however, perhaps we could simply update this test to use a function that has not yet been ported and maybe move the tests in another PR

@@ -545,7 +545,7 @@ message InListNode {

enum ScalarFunction {
Abs = 0;
Acos = 1;
// 1 was Acos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that what @jonahgao would allow better backwards compatibility

However, given that we don't really provide any guarantee for compatibility now anyways (I recently made this explicit in the documentation, https://github.com/apache/arrow-datafusion/blob/8f3d1ef23f93cd4303745eba76c0850b39774d07/datafusion/proto/src/lib.rs#L37-L41) I don't think we should add complexity for these functions

@@ -589,34 +586,6 @@ async fn roundtrip_parquet_exec_with_table_partition_cols() -> Result<()> {
roundtrip_test(Arc::new(ParquetExec::new(scan_config, None, None)))
}

#[test]
fn roundtrip_builtin_scalar_function() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we have completed the migration, can you please simply change to use a different built in function so we don't lose coverage?

@SteveLauC SteveLauC force-pushed the refactor/move_acos_to_function_crate branch from bed7322 to e455453 Compare February 27, 2024 12:45
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @SteveLauC -- this looks great to me 🙏

@@ -889,14 +889,14 @@ mod test {
// test that automatic argument type coercion for scalar functions work
let empty = empty();
let lit_expr = lit(10i64);
let fun: BuiltinScalarFunction = BuiltinScalarFunction::Acos;
let fun: BuiltinScalarFunction = BuiltinScalarFunction::Floor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Feb 27, 2024

I took the liberty of merging up from main and resolving a conflict on this PR

@SteveLauC SteveLauC force-pushed the refactor/move_acos_to_function_crate branch from 29909d9 to b75f1cf Compare February 28, 2024 00:06
@alamb alamb merged commit 935ebca into apache:main Feb 28, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 28, 2024

Thanks again @SteveLauC

@SteveLauC SteveLauC deleted the refactor/move_acos_to_function_crate branch February 28, 2024 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants