improve monotonicity api #10117

Closed
2 changes: 1 addition & 1 deletion datafusion-examples/examples/advanced_udf.rs
@@ -187,7 +187,7 @@ impl ScalarUDFImpl for PowUdf {
     }

     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-        Ok(Some(vec![Some(true)]))
+        Ok(Some(FuncMonotonicity::Increasing))
     }
 }

61 changes: 54 additions & 7 deletions datafusion/expr/src/signature.rs
@@ -346,13 +346,60 @@ impl Signature {
     }
 }

-/// Monotonicity of the `ScalarFunctionExpr` with respect to its arguments.
-/// Each element of this vector corresponds to an argument and indicates whether
-/// the function's behavior is monotonic, or non-monotonic/unknown for that argument, namely:
-/// - `None` signifies unknown monotonicity or non-monotonicity.
-/// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
-/// - Some(false) indicates that the function is monotonically decreasing w.r.t. the argument in question.
-pub type FuncMonotonicity = Vec<Option<bool>>;
+/// Monotonicity of a function with respect to its arguments.
+///
+/// A function is [monotonic] if it preserves the relative order of its inputs.
+///
+/// [monotonic]: https://en.wikipedia.org/wiki/Monotonic_function
+#[derive(Debug, Clone)]
+pub enum FuncMonotonicity {
+    /// not monotonic or unknown monotonicity
+    None,
+    /// Increasing with respect to all of its arguments
+    Increasing,
+    /// Decreasing with respect to all of its arguments
+    Decreasing,
+    /// Each element of this vector corresponds to an argument and indicates whether
+    /// the function's behavior is monotonic, or non-monotonic/unknown for that argument, namely:
+    /// - `None` signifies unknown monotonicity or non-monotonicity.
+    /// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
+    /// - Some(false) indicates that the function is monotonically decreasing w.r.t. the argument in question.
+    Mixed(Vec<Option<bool>>),
+}
+
+impl PartialEq for FuncMonotonicity {
+    fn eq(&self, other: &Self) -> bool {
+        match (self, other) {
+            (FuncMonotonicity::None, FuncMonotonicity::None) => true,
+            (FuncMonotonicity::Increasing, FuncMonotonicity::Increasing) => true,
+            (FuncMonotonicity::Decreasing, FuncMonotonicity::Decreasing) => true,
+            (FuncMonotonicity::Mixed(vec1), FuncMonotonicity::Mixed(vec2)) => {
+                vec1 == vec2
+            }
+            _ => false,
+        }
Contributor

These public APIs don't check the size of the inner vector, so passing an out-of-range index would panic. We could return a Result here instead.

Contributor Author

These APIs aren't being used anywhere. I think we can just remove them.

Contributor Author @tinfoil-knight, May 7, 2024

Update:
I've removed the unused arg_increasing & arg_decreasing public APIs.

For reference, the only place we're currently using FuncMonotonicity is this:

https://github.com/tinfoil-knight/arrow-datafusion/blob/53d9e30a7561c97492e47e3ee1679885b6c510e6/datafusion/physical-expr/src/scalar_function.rs#L250-L263
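
For anyone revisiting the bounds concern raised at the top of this thread, here is a minimal sketch of what a non-panicking per-argument accessor could look like. The enum is redeclared locally so the snippet compiles on its own, and the signature of `arg_increasing` shown here is an assumption for illustration only; it is not the signature of the removed API, which this page does not show.

```rust
/// Local stand-in for the PR's enum, redeclared so the sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum FuncMonotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

impl FuncMonotonicity {
    /// Hypothetical accessor (assumed signature): is the function increasing
    /// with respect to argument `index`? Uses `slice::get` so that an
    /// out-of-range index yields `Err` instead of panicking.
    pub fn arg_increasing(&self, index: usize) -> Result<bool, String> {
        match self {
            FuncMonotonicity::None | FuncMonotonicity::Decreasing => Ok(false),
            FuncMonotonicity::Increasing => Ok(true),
            FuncMonotonicity::Mixed(inner) => inner
                .get(index)
                .map(|m| *m == Some(true))
                .ok_or_else(|| {
                    format!(
                        "argument index {} out of range for {} arguments",
                        index,
                        inner.len()
                    )
                }),
        }
    }
}
```

With this shape, a caller holding `Mixed(vec![None, Some(true)])` gets `Ok(true)` for index 1 and an `Err` for index 2, rather than an index-out-of-bounds panic.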

+    }
+}
+
+impl FuncMonotonicity {
+    pub fn matches(&self, other: &Self) -> bool {
Contributor Author

Note: The matches method isn't currently used anywhere in the codebase. It has been added to make future comparisons easier.

Also, I don't think we need the arity of the function anymore for comparisons between FuncMonotonicity::Mixed and the other variants.

For example, FuncMonotonicity::Increasing is "Increasing with respect to all of its arguments", which, as I understand it, is the same as FuncMonotonicity::Mixed's inner vector being Some(true) for all elements.
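
To make the equivalence described above concrete, here is a standalone sketch that copies the `matches` logic from this diff onto a locally redeclared enum. One assumption: `PartialEq` is derived here rather than hand-written, which should behave the same as the manual impl in the PR. The sketch also surfaces an edge case worth keeping in mind: because `Iterator::all` returns `true` for an empty iterator, an empty `Mixed` vector matches `None`, `Increasing`, and `Decreasing` alike.

```rust
/// Local copy of the enum from this PR, redeclared so the sketch compiles alone.
#[derive(Debug, Clone, PartialEq)]
pub enum FuncMonotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

impl FuncMonotonicity {
    /// Same logic as `matches` in the diff: a uniform variant matches a
    /// `Mixed` vector when every element agrees with it; everything else
    /// falls back to plain equality.
    pub fn matches(&self, other: &Self) -> bool {
        match (self, other) {
            (FuncMonotonicity::None, FuncMonotonicity::Mixed(inner_vec))
            | (FuncMonotonicity::Mixed(inner_vec), FuncMonotonicity::None) => {
                inner_vec.iter().all(|&x| x.is_none())
            }
            (FuncMonotonicity::Increasing, FuncMonotonicity::Mixed(inner_vec))
            | (FuncMonotonicity::Mixed(inner_vec), FuncMonotonicity::Increasing) => {
                inner_vec.iter().all(|&x| x == Some(true))
            }
            (FuncMonotonicity::Decreasing, FuncMonotonicity::Mixed(inner_vec))
            | (FuncMonotonicity::Mixed(inner_vec), FuncMonotonicity::Decreasing) => {
                inner_vec.iter().all(|&x| x == Some(false))
            }
            _ => self == other,
        }
    }
}
```

If the empty-vector behavior is not wanted, one option would be to treat an empty `Mixed` as matching only `None`, but that is a design choice this PR does not make.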

+        match (self, other) {
+            (FuncMonotonicity::None, FuncMonotonicity::Mixed(inner_vec))
+            | (FuncMonotonicity::Mixed(inner_vec), FuncMonotonicity::None) => {
+                inner_vec.iter().all(|&x| x.is_none())
+            }
+            (FuncMonotonicity::Increasing, FuncMonotonicity::Mixed(inner_vec))
+            | (FuncMonotonicity::Mixed(inner_vec), FuncMonotonicity::Increasing) => {
+                inner_vec.iter().all(|&x| x == Some(true))
+            }
+            (FuncMonotonicity::Decreasing, FuncMonotonicity::Mixed(inner_vec))
+            | (FuncMonotonicity::Mixed(inner_vec), FuncMonotonicity::Decreasing) => {
+                inner_vec.iter().all(|&x| x == Some(false))
+            }
+            _ => self == other,
+        }
+    }
+}

 #[cfg(test)]
 mod tests {
2 changes: 1 addition & 1 deletion datafusion/functions/src/datetime/date_bin.rs
@@ -147,7 +147,7 @@ impl ScalarUDFImpl for DateBinFunc {
     }

     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-        Ok(Some(vec![None, Some(true)]))
+        Ok(Some(FuncMonotonicity::Mixed(vec![None, Some(true)])))
     }
 }

2 changes: 1 addition & 1 deletion datafusion/functions/src/datetime/date_trunc.rs
@@ -206,7 +206,7 @@ impl ScalarUDFImpl for DateTruncFunc {
     }

     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-        Ok(Some(vec![None, Some(true)]))
+        Ok(Some(FuncMonotonicity::Mixed(vec![None, Some(true)])))
     }
 }

2 changes: 1 addition & 1 deletion datafusion/functions/src/math/log.rs
@@ -84,7 +84,7 @@ impl ScalarUDFImpl for LogFunc {
     }

     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-        Ok(Some(vec![Some(true), Some(false)]))
+        Ok(Some(FuncMonotonicity::Mixed(vec![Some(true), Some(false)])))
     }

     // Support overloaded log(base, x) and log(x) which defaults to log(10, x)
76 changes: 65 additions & 11 deletions datafusion/functions/src/math/mod.rs
@@ -38,29 +38,83 @@ pub mod trunc;
 // Create UDFs
 make_udf_function!(abs::AbsFunc, ABS, abs);
 make_math_unary_udf!(AcosFunc, ACOS, acos, acos, None);
-make_math_unary_udf!(AcoshFunc, ACOSH, acosh, acosh, Some(vec![Some(true)]));
+make_math_unary_udf!(
+    AcoshFunc,
+    ACOSH,
+    acosh,
+    acosh,
+    Some(FuncMonotonicity::Increasing)
+);
 make_math_unary_udf!(AsinFunc, ASIN, asin, asin, None);
-make_math_unary_udf!(AsinhFunc, ASINH, asinh, asinh, Some(vec![Some(true)]));
-make_math_unary_udf!(AtanFunc, ATAN, atan, atan, Some(vec![Some(true)]));
-make_math_unary_udf!(AtanhFunc, ATANH, atanh, atanh, Some(vec![Some(true)]));
-make_math_binary_udf!(Atan2, ATAN2, atan2, atan2, Some(vec![Some(true)]));
+make_math_unary_udf!(
+    AsinhFunc,
+    ASINH,
+    asinh,
+    asinh,
+    Some(FuncMonotonicity::Increasing)
+);
+make_math_unary_udf!(
+    AtanFunc,
+    ATAN,
+    atan,
+    atan,
+    Some(FuncMonotonicity::Increasing)
+);
+make_math_unary_udf!(
+    AtanhFunc,
+    ATANH,
+    atanh,
+    atanh,
+    Some(FuncMonotonicity::Increasing)
+);
+make_math_binary_udf!(
+    Atan2,
+    ATAN2,
+    atan2,
+    atan2,
+    Some(FuncMonotonicity::Increasing)
+);
 make_math_unary_udf!(CbrtFunc, CBRT, cbrt, cbrt, None);
-make_math_unary_udf!(CeilFunc, CEIL, ceil, ceil, Some(vec![Some(true)]));
+make_math_unary_udf!(
+    CeilFunc,
+    CEIL,
+    ceil,
+    ceil,
+    Some(FuncMonotonicity::Increasing)
+);
 make_math_unary_udf!(CosFunc, COS, cos, cos, None);
 make_math_unary_udf!(CoshFunc, COSH, cosh, cosh, None);
 make_udf_function!(cot::CotFunc, COT, cot);
 make_math_unary_udf!(DegreesFunc, DEGREES, degrees, to_degrees, None);
-make_math_unary_udf!(ExpFunc, EXP, exp, exp, Some(vec![Some(true)]));
+make_math_unary_udf!(ExpFunc, EXP, exp, exp, Some(FuncMonotonicity::Increasing));
 make_udf_function!(factorial::FactorialFunc, FACTORIAL, factorial);
-make_math_unary_udf!(FloorFunc, FLOOR, floor, floor, Some(vec![Some(true)]));
+make_math_unary_udf!(
+    FloorFunc,
+    FLOOR,
+    floor,
+    floor,
+    Some(FuncMonotonicity::Increasing)
+);
 make_udf_function!(log::LogFunc, LOG, log);
 make_udf_function!(gcd::GcdFunc, GCD, gcd);
 make_udf_function!(nans::IsNanFunc, ISNAN, isnan);
 make_udf_function!(iszero::IsZeroFunc, ISZERO, iszero);
 make_udf_function!(lcm::LcmFunc, LCM, lcm);
-make_math_unary_udf!(LnFunc, LN, ln, ln, Some(vec![Some(true)]));
-make_math_unary_udf!(Log2Func, LOG2, log2, log2, Some(vec![Some(true)]));
-make_math_unary_udf!(Log10Func, LOG10, log10, log10, Some(vec![Some(true)]));
+make_math_unary_udf!(LnFunc, LN, ln, ln, Some(FuncMonotonicity::Increasing));
+make_math_unary_udf!(
+    Log2Func,
+    LOG2,
+    log2,
+    log2,
+    Some(FuncMonotonicity::Increasing)
+);
+make_math_unary_udf!(
+    Log10Func,
+    LOG10,
+    log10,
+    log10,
+    Some(FuncMonotonicity::Increasing)
+);
 make_udf_function!(nanvl::NanvlFunc, NANVL, nanvl);
 make_udf_function!(pi::PiFunc, PI, pi);
 make_udf_function!(power::PowerFunc, POWER, power);
2 changes: 1 addition & 1 deletion datafusion/functions/src/math/pi.rs
@@ -71,6 +71,6 @@ impl ScalarUDFImpl for PiFunc {
     }

     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-        Ok(Some(vec![Some(true)]))
+        Ok(Some(FuncMonotonicity::Increasing))
     }
 }
2 changes: 1 addition & 1 deletion datafusion/functions/src/math/round.rs
@@ -81,7 +81,7 @@ impl ScalarUDFImpl for RoundFunc {
     }

     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-        Ok(Some(vec![Some(true)]))
+        Ok(Some(FuncMonotonicity::Increasing))
     }
 }

2 changes: 1 addition & 1 deletion datafusion/functions/src/math/trunc.rs
@@ -87,7 +87,7 @@ impl ScalarUDFImpl for TruncFunc {
     }

     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-        Ok(Some(vec![Some(true)]))
+        Ok(Some(FuncMonotonicity::Increasing))
     }
 }

55 changes: 52 additions & 3 deletions datafusion/physical-expr/src/scalar_function.rs
@@ -251,10 +251,23 @@ pub fn out_ordering(
     func: &FuncMonotonicity,
     arg_orderings: &[SortProperties],
 ) -> SortProperties {
-    func.iter().zip(arg_orderings).fold(
+    arg_orderings.iter().enumerate().fold(
         SortProperties::Singleton,
-        |prev_sort, (item, arg)| {
-            let current_sort = func_order_in_one_dimension(item, arg);
+        |prev_sort, (index, arg)| {
+            let arg_monotonicity: Option<bool> = match func {
+                FuncMonotonicity::None => None,
+                FuncMonotonicity::Increasing => Some(true),
+                FuncMonotonicity::Decreasing => Some(false),
+                FuncMonotonicity::Mixed(inner_vec) => {
Comment on lines +257 to +261
Contributor Author

The earlier impl From<&FuncMonotonicity> for Vec<Option<bool>> was confusing & incorrect so it has been removed now.

I do think that the conversion from FuncMonotonicity to Option<bool> is required in this function.
I tried a few different ways (without conversion) but it was resulting in really messy code.

If the conversion still feels confusing, we can leave a note here mentioning this for clarity:

/// - `None` signifies unknown monotonicity or non-monotonicity.
/// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
/// - `Some(false)` indicates that the function is monotonically decreasing w.r.t. the argument in question.
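
One way to keep the conversion but move it out of `out_ordering` would be a small helper on the enum itself. The following is only a sketch under stated assumptions: the enum is redeclared locally so the snippet stands alone, and `for_arg` is a hypothetical name that does not exist in this PR or in DataFusion.

```rust
/// Local stand-in for the PR's enum, redeclared so the sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum FuncMonotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

impl FuncMonotonicity {
    /// Hypothetical helper: monotonicity with respect to argument `index`,
    /// using the `Option<bool>` convention documented above.
    /// An index past the end of a `Mixed` vector is treated as unknown.
    pub fn for_arg(&self, index: usize) -> Option<bool> {
        match self {
            FuncMonotonicity::None => None,
            FuncMonotonicity::Increasing => Some(true),
            FuncMonotonicity::Decreasing => Some(false),
            // `get` avoids a panic; `copied().flatten()` turns
            // Option<&Option<bool>> into Option<bool>.
            FuncMonotonicity::Mixed(inner) => inner.get(index).copied().flatten(),
        }
    }
}
```

With such a helper, the closure in `out_ordering` could presumably reduce to a single call per argument, though that refactor is not part of this diff.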

Contributor

I agree the conversion feels confusing.

"I do think that the conversion from FuncMonotonicity to Option<bool> is required in this function. I tried a few different ways (without conversion) but it was resulting in really messy code."

I would be interested in trying to help here as I think the major benefit of this change is to encapsulate the monotonicity calculations. I'll give it a look

Contributor

Maybe we could add something like

enum Monotonicity {
  Increasing,
  Decreasing,
  Unknown,
}

And then represent FuncMonotonicity::Mixed with a Vec<Monotonicity>

I would love to get a chance to play around with it, but may not have time for a while
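
A rough sketch of how that suggestion might look, for discussion only. The `Monotonicity` enum follows the shape proposed above; everything else (the derives, the `for_arg` helper, and treating a missing index as `Unknown`) is an assumption made for illustration and is not code from this PR.

```rust
/// Per-argument monotonicity, as proposed in the comment above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Monotonicity {
    Increasing,
    Decreasing,
    Unknown,
}

/// Variant of the PR's enum where `Mixed` holds `Monotonicity` values
/// instead of `Option<bool>` (hypothetical redesign).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FuncMonotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Monotonicity>),
}

impl FuncMonotonicity {
    /// Hypothetical helper: monotonicity with respect to argument `index`.
    /// Out-of-range indices fall back to `Unknown` rather than panicking.
    pub fn for_arg(&self, index: usize) -> Monotonicity {
        match self {
            FuncMonotonicity::None => Monotonicity::Unknown,
            FuncMonotonicity::Increasing => Monotonicity::Increasing,
            FuncMonotonicity::Decreasing => Monotonicity::Decreasing,
            FuncMonotonicity::Mixed(per_arg) => per_arg
                .get(index)
                .copied()
                .unwrap_or(Monotonicity::Unknown),
        }
    }
}
```

Under this shape, `date_bin` would be written as `Mixed(vec![Monotonicity::Unknown, Monotonicity::Increasing])`, which reads more directly than `vec![None, Some(true)]`.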

+                    if inner_vec.len() > index {
+                        inner_vec[index]
+                    } else {
+                        None
+                    }
+                }
+            };
+
+            let current_sort = func_order_in_one_dimension(&arg_monotonicity, arg);

             match (prev_sort, current_sort) {
                 (_, SortProperties::Unordered) => SortProperties::Unordered,
@@ -299,3 +312,39 @@ fn func_order_in_one_dimension(
         }
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use arrow_schema::Schema;
+
+    use datafusion_common::{DFSchema, Result};
+    use datafusion_expr::{FuncMonotonicity, ScalarUDF};
+
+    use crate::utils::tests::TestScalarUDF;
+    use crate::ScalarFunctionExpr;
+
+    use super::create_physical_expr;
+
+    #[test]
+    fn test_function_expr() -> Result<()> {
+        let udf = ScalarUDF::from(TestScalarUDF::new());
+
+        let e = crate::expressions::lit(1.1);
+        let p_expr =
+            create_physical_expr(&udf, &[e], &Schema::empty(), &[], &DFSchema::empty())?;
+        let expr_monotonicity = p_expr
+            .as_any()
+            .downcast_ref::<ScalarFunctionExpr>()
+            .unwrap()
+            .monotonicity();
+
+        assert_eq!(expr_monotonicity, &Some(FuncMonotonicity::Increasing));
+
+        assert!(expr_monotonicity
+            .as_ref()
+            .unwrap()
+            .matches(&FuncMonotonicity::Mixed(vec![Some(true)])));
+
+        Ok(())
+    }
+}
2 changes: 1 addition & 1 deletion datafusion/physical-expr/src/utils/mod.rs
@@ -310,7 +310,7 @@ pub(crate) mod tests {
         }

         fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
-            Ok(Some(vec![Some(true)]))
+            Ok(Some(FuncMonotonicity::Increasing))
         }

         fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {