-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Improve documentation for AggregateUDFImpl::accumulator and AccumulatorArgs
#9920
Changes from 3 commits
67be22a
315a9a4
203c83f
0f59565
56199fc
f3dc5db
b3add2d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,7 +20,7 @@ | |
use crate::ColumnarValue; | ||
use crate::{Accumulator, Expr, PartitionEvaluator}; | ||
use arrow::datatypes::{DataType, Schema}; | ||
use datafusion_common::Result; | ||
use datafusion_common::{not_impl_err, Result}; | ||
use std::sync::Arc; | ||
|
||
/// Scalar function | ||
|
@@ -38,18 +38,39 @@ pub type ScalarFunctionImplementation = | |
pub type ReturnTypeFunction = | ||
Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>> + Send + Sync>; | ||
|
||
/// Arguments passed to create an accumulator | ||
/// [`AccumulatorArgs`] contains information about how an aggregate | ||
/// function was called, including the types of its arguments and any optional | ||
/// ordering expressions. | ||
pub struct AccumulatorArgs<'a> { | ||
// default arguments | ||
/// the return type of the function | ||
/// The return type of the aggregate function. | ||
pub data_type: &'a DataType, | ||
/// the schema of the input arguments | ||
/// The schema of the input arguments | ||
pub schema: &'a Schema, | ||
/// whether to ignore nulls | ||
/// Whether to ignore nulls. | ||
/// | ||
/// SQL allows the user to specify `IGNORE NULLS`, for example: | ||
/// | ||
/// ```sql | ||
/// SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t; | ||
/// ``` | ||
/// | ||
/// Aggregates that do not support this functionality should return a not | ||
/// implemented error when `ignore_nulls` is true. | ||
pub ignore_nulls: bool, | ||
|
||
// ordering arguments | ||
/// the expressions of `order by`, if no ordering is required, this will be an empty slice | ||
/// The expressions in the `ORDER BY` clause passed to this aggregator. | ||
/// | ||
/// SQL allows the user to specify the ordering of arguments to the | ||
/// aggregate using an `ORDER BY`. For example: | ||
/// | ||
/// ```sql | ||
/// SELECT FIRST_VALUE(column1 ORDER BY column2) FROM t; | ||
/// ``` | ||
/// | ||
/// If no `ORDER BY` is specified, `sort_exprs`` will be empty. Aggregates | ||
/// that do not support this functionality may return a not implemented | ||
/// error when the slice is non empty, as ordering the arguments is an | ||
/// expensive operation and is wasteful if the aggregate doesn't support it. | ||
pub sort_exprs: &'a [Expr], | ||
} | ||
|
||
|
@@ -67,6 +88,24 @@ impl<'a> AccumulatorArgs<'a> { | |
sort_exprs, | ||
} | ||
} | ||
|
||
/// Return a not yet implemented error if IGNORE NULLs is true | ||
pub fn check_ignore_nulls(&self, name: &str) -> Result<()> { | ||
if self.ignore_nulls { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we check I think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is confusing for the user to understand whether they need to check or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is also a problem because if they forget to add To enforce they specified the options for their functions, I think we can add the checking function in Either // We will check if the AccumulatorArgs meet the requirement or not.
fn options() -> AccumulatorArgs {
AccumulatorArgs {
ignore_nulls: true/ false,
has_ordering: true / false
}
} or // The same with this one, but separate each option
fn support_ignore_nulls() -> bool
fn support_ordering() -> bool There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't think we can do this, because the ignore_nulls is true for the following queries SELECT avg(x) FROM ...;
SELECT avg(x) RESPECT NULLS FROM ...; In other words, it is the default even when the user doesn't explicitly specify the handling There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this suggestion -- are you suggesting rename There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That is a good point --- in fact it actually affects built in aggregates too today ❯ select count(*) from (values (1), (null), (2));
+----------+
| COUNT(*) |
+----------+
| 3 |
+----------+
1 row in set. Query took 0.039 seconds.
❯ select count(*) IGNORE NULLS from (values (1), (null), (2));
+----------+
| COUNT(*) |
+----------+
| 3 |
+----------+
1 row in set. Query took 0.001 seconds. I think this is a sepate issue, and not made worse by this PR -- I filed #9924 to track. I suggest we work on improving it as a follow on PR There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
yes, that is what I suggest, so we know exactly whether it is disabled or not. But I think the comment here also helps, rename is not neccessary There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But I think if we implement these for UDFImpl, fn support_ignore_nulls() -> bool
fn support_ordering() -> bool we probably don't need the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a good idea and I think it should be done in #9924 I will update this PR to remove the |
||
not_impl_err!("IGNORE NULLS not implemented for {name}") | ||
} else { | ||
Ok(()) | ||
} | ||
} | ||
|
||
/// Return a not yet implemented error if `ORDER BY` is non empty | ||
pub fn check_order_by(&self, name: &str) -> Result<()> { | ||
if !self.sort_exprs.is_empty() { | ||
not_impl_err!("ORDER BY not implemented for {name}") | ||
} else { | ||
Ok(()) | ||
} | ||
} | ||
} | ||
|
||
/// Factory that returns an accumulator for the given aggregate function. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive by cleanups