improve monotonicity api #10117

Closed
2 changes: 1 addition & 1 deletion datafusion-examples/examples/advanced_udf.rs
@@ -187,7 +187,7 @@ impl ScalarUDFImpl for PowUdf {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![Some(true)]))
Ok(Some(FuncMonotonicity::new_increasing()))
}
}

82 changes: 75 additions & 7 deletions datafusion/expr/src/signature.rs
@@ -346,13 +346,81 @@ impl Signature {
}
}

/// Monotonicity of the `ScalarFunctionExpr` with respect to its arguments.
/// Each element of this vector corresponds to an argument and indicates whether
/// the function's behavior is monotonic, or non-monotonic/unknown for that argument, namely:
/// - `None` signifies unknown monotonicity or non-monotonicity.
/// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
/// - Some(false) indicates that the function is monotonically decreasing w.r.t. the argument in question.
pub type FuncMonotonicity = Vec<Option<bool>>;
#[derive(Debug, Clone)]
enum FuncMonotonicityPartial {
/// not monotonic or unknown monotonicity
None,
/// Increasing with respect to all of its arguments
Increasing,
/// Decreasing with respect to all of its arguments
Decreasing,
/// Each element of this vector corresponds to an argument and indicates whether
/// the function's behavior is monotonic, or non-monotonic/unknown for that argument, namely:
/// - `None` signifies unknown monotonicity or non-monotonicity.
/// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
/// - Some(false) indicates that the function is monotonically decreasing w.r.t. the argument in question.
Mixed(Vec<Option<bool>>),
}

/// Monotonicity of a function with respect to its arguments.
///
/// A function is [monotonic] if it preserves the relative order of its inputs.
///
/// [monotonic]: https://en.wikipedia.org/wiki/Monotonic_function
#[derive(Debug, Clone)]
pub struct FuncMonotonicity(FuncMonotonicityPartial);

impl FuncMonotonicity {
pub fn new_none() -> Self {
Self(FuncMonotonicityPartial::None)
}
pub fn new_increasing() -> Self {
Self(FuncMonotonicityPartial::Increasing)
}
pub fn new_decreasing() -> Self {
Self(FuncMonotonicityPartial::Decreasing)
}
pub fn new_mixed(inner: Vec<Option<bool>>) -> Self {
Self(FuncMonotonicityPartial::Mixed(inner))
}

/// returns true if this function is monotonically increasing with respect to argument number arg
pub fn arg_increasing(&self, arg: usize) -> bool {
match &self.0 {
FuncMonotonicityPartial::None => false,
FuncMonotonicityPartial::Increasing => true,
FuncMonotonicityPartial::Decreasing => false,
FuncMonotonicityPartial::Mixed(inner) => inner[arg].unwrap_or(false),
}
}

/// returns true if this function is monotonically decreasing with respect to argument number arg
pub fn arg_decreasing(&self, arg: usize) -> bool {
match &self.0 {
FuncMonotonicityPartial::None => false,
FuncMonotonicityPartial::Increasing => false,
FuncMonotonicityPartial::Decreasing => true,
FuncMonotonicityPartial::Mixed(inner) => inner[arg].unwrap_or(false),
}
Contributor

These public APIs don't check the size of the inner vector, so passing an out-of-range index would panic. We could return a Result here instead.

Contributor Author

These APIs aren't being used anywhere. I think we can just remove them.

Contributor Author

@tinfoil-knight, May 7, 2024

Update:
I've removed the unused arg_increasing & arg_decreasing public APIs.

For reference, the only place we're currently using FuncMonotonicity is this:

https://github.com/tinfoil-knight/arrow-datafusion/blob/53d9e30a7561c97492e47e3ee1679885b6c510e6/datafusion/physical-expr/src/scalar_function.rs#L250-L263

}
}
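
The out-of-range concern raised in the review could also be handled without a `Result`, by using `slice::get` so that an unknown argument index simply reports `false`. The following is a minimal sketch, not the PR's code: the enum `Monotonicity` is a stand-in re-declared here (mirroring `FuncMonotonicityPartial` from the diff) so the block compiles on its own. Note also that `inner[arg].unwrap_or(false)` in the diff's `arg_decreasing` would return `true` for `Some(true)`; the sketch compares against `Some(false)` instead.

```rust
/// Hypothetical stand-in for the PR's `FuncMonotonicityPartial`,
/// re-declared so this sketch is self-contained.
#[derive(Debug, Clone)]
pub enum Monotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

impl Monotonicity {
    /// True if the function is increasing w.r.t. argument number `arg`.
    /// An out-of-range `arg` yields `false` instead of panicking.
    pub fn arg_increasing(&self, arg: usize) -> bool {
        match self {
            Monotonicity::Increasing => true,
            Monotonicity::None | Monotonicity::Decreasing => false,
            // `get` returns `None` for an out-of-range index; `copied` and
            // `flatten` collapse `Option<&Option<bool>>` into `Option<bool>`.
            Monotonicity::Mixed(inner) => {
                inner.get(arg).copied().flatten() == Some(true)
            }
        }
    }

    /// True if the function is decreasing w.r.t. argument number `arg`.
    /// `Some(false)` is the encoding for "decreasing" in the diff above.
    pub fn arg_decreasing(&self, arg: usize) -> bool {
        match self {
            Monotonicity::Decreasing => true,
            Monotonicity::None | Monotonicity::Increasing => false,
            Monotonicity::Mixed(inner) => {
                inner.get(arg).copied().flatten() == Some(false)
            }
        }
    }
}
```

Whether silently returning `false` or returning an error is preferable depends on how callers are expected to treat an invalid index; the thread ultimately removed both accessors.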

impl From<&FuncMonotonicity> for Vec<Option<bool>> {
fn from(val: &FuncMonotonicity) -> Self {
match &val.0 {
FuncMonotonicityPartial::None => vec![None],
FuncMonotonicityPartial::Increasing => vec![Some(true)],
FuncMonotonicityPartial::Decreasing => vec![Some(false)],
FuncMonotonicityPartial::Mixed(inner) => inner.to_vec(),
}
}
}
Contributor

I don't think this conversion is possible. Given only Increasing, you cannot produce the vector without knowing the arity of the function. Assuming an arity of one (which seems to be the case here) can lead to subtle bugs and/or compile-time errors in some cases.

Contributor

I agree with @ozankabak: it is confusing to have a way to convert back to Vec<Option<bool>> -- I think it would be easier to understand if all comparisons were done directly on FuncMonotonicity rather than converting to Vec<Option<bool>>.
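
To illustrate the arity point: if a per-argument vector is ever needed, the caller has to supply the number of arguments. The sketch below is an assumption-laden illustration, not part of the PR; `Monotonicity` is a stand-in for the enum in the diff and `per_argument` is a hypothetical helper name.

```rust
/// Hypothetical stand-in for the PR's `FuncMonotonicityPartial`.
#[derive(Debug, Clone)]
pub enum Monotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

impl Monotonicity {
    /// Expands to one entry per argument. `arity` must come from the caller,
    /// because `Increasing`/`Decreasing`/`None` carry no length of their own.
    /// Returns `None` when a `Mixed` vector's length disagrees with `arity`.
    pub fn per_argument(&self, arity: usize) -> Option<Vec<Option<bool>>> {
        match self {
            Monotonicity::None => Some(vec![None; arity]),
            Monotonicity::Increasing => Some(vec![Some(true); arity]),
            Monotonicity::Decreasing => Some(vec![Some(false); arity]),
            Monotonicity::Mixed(inner) if inner.len() == arity => {
                Some(inner.clone())
            }
            Monotonicity::Mixed(_) => None,
        }
    }
}
```

With this shape, `Increasing` for a two-argument function expands to two `Some(true)` entries rather than the single entry produced by the `From` impl in the diff.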


impl PartialEq for FuncMonotonicity {
fn eq(&self, other: &Self) -> bool {
Into::<Vec<Option<bool>>>::into(self) == Into::<Vec<Option<bool>>>::into(other)
Contributor

Given my comment on From, you will need explicit logic here.

Contributor

Something like this perhaps:

match (self, other) {
  (Self::None, Self::None) => true,
  (Self::Increasing, Self::Increasing) => true,
  (Self::Decreasing, Self::Decreasing) => true,
  (Self::Partial(self_inner), Self::Partial(other_inner)) => self_inner == other_inner,
  _ => false,
}

Contributor

Yes. We also need to handle combinations like (Self::Partial(inner), Self::Increasing), but this is the idea 👍

Contributor

Maybe that type of comparison belongs in something other than equality -- perhaps

impl FuncMonotonicity {
  fn matches(&self, other: &Self) -> bool {
    ...
  }
}

so we can separate the equality check from the two-sided match 🤔

}
}
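
The two suggestions in this thread (explicit equality, plus a separate looser comparison) could be combined roughly as follows. This is a sketch under assumptions: `Monotonicity` stands in for the enum in the diff, and the semantics chosen for `matches` (a uniform variant matches a non-empty `Mixed` vector whose entries all agree with it) are one possible reading of the discussion, not something the reviewers spelled out.

```rust
/// Hypothetical stand-in for the PR's `FuncMonotonicityPartial`.
#[derive(Debug, Clone)]
pub enum Monotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

/// Strict, structural equality: no conversion to a vector, no arity guess.
impl PartialEq for Monotonicity {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Self::None, Self::None) => true,
            (Self::Increasing, Self::Increasing) => true,
            (Self::Decreasing, Self::Decreasing) => true,
            (Self::Mixed(a), Self::Mixed(b)) => a == b,
            _ => false,
        }
    }
}

impl Monotonicity {
    /// Assumption: an empty vector matches no uniform variant.
    fn all_entries_are(inner: &[Option<bool>], expected: Option<bool>) -> bool {
        !inner.is_empty() && inner.iter().all(|entry| *entry == expected)
    }

    /// Looser comparison that also pairs a uniform variant with a `Mixed`
    /// vector describing the same behavior for every argument.
    pub fn matches(&self, other: &Self) -> bool {
        match (self, other) {
            (Self::Mixed(inner), Self::None) | (Self::None, Self::Mixed(inner)) => {
                Self::all_entries_are(inner, None)
            }
            (Self::Mixed(inner), Self::Increasing)
            | (Self::Increasing, Self::Mixed(inner)) => {
                Self::all_entries_are(inner, Some(true))
            }
            (Self::Mixed(inner), Self::Decreasing)
            | (Self::Decreasing, Self::Mixed(inner)) => {
                Self::all_entries_are(inner, Some(false))
            }
            // Same-kind pairs fall back to strict equality.
            _ => self == other,
        }
    }
}
```

Keeping `eq` strict means `assert_eq!` in tests compares exactly what was declared, while `matches` is available where the two encodings should be treated as interchangeable.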

#[cfg(test)]
mod tests {
2 changes: 1 addition & 1 deletion datafusion/functions/src/datetime/date_bin.rs
@@ -147,7 +147,7 @@ impl ScalarUDFImpl for DateBinFunc {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![None, Some(true)]))
Ok(Some(FuncMonotonicity::new_mixed(vec![None, Some(true)])))
}
}

2 changes: 1 addition & 1 deletion datafusion/functions/src/datetime/date_trunc.rs
@@ -206,7 +206,7 @@ impl ScalarUDFImpl for DateTruncFunc {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![None, Some(true)]))
Ok(Some(FuncMonotonicity::new_mixed(vec![None, Some(true)])))
}
}

5 changes: 4 additions & 1 deletion datafusion/functions/src/math/log.rs
@@ -84,7 +84,10 @@ impl ScalarUDFImpl for LogFunc {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![Some(true), Some(false)]))
Ok(Some(FuncMonotonicity::new_mixed(vec![
Some(true),
Some(false),
])))
}

// Support overloaded log(base, x) and log(x) which defaults to log(10, x)
82 changes: 71 additions & 11 deletions datafusion/functions/src/math/mod.rs
@@ -38,29 +38,89 @@ pub mod trunc;
// Create UDFs
make_udf_function!(abs::AbsFunc, ABS, abs);
make_math_unary_udf!(AcosFunc, ACOS, acos, acos, None);
make_math_unary_udf!(AcoshFunc, ACOSH, acosh, acosh, Some(vec![Some(true)]));
make_math_unary_udf!(
AcoshFunc,
ACOSH,
acosh,
acosh,
Some(FuncMonotonicity::new_increasing())
);
make_math_unary_udf!(AsinFunc, ASIN, asin, asin, None);
make_math_unary_udf!(AsinhFunc, ASINH, asinh, asinh, Some(vec![Some(true)]));
make_math_unary_udf!(AtanFunc, ATAN, atan, atan, Some(vec![Some(true)]));
make_math_unary_udf!(AtanhFunc, ATANH, atanh, atanh, Some(vec![Some(true)]));
make_math_binary_udf!(Atan2, ATAN2, atan2, atan2, Some(vec![Some(true)]));
make_math_unary_udf!(
AsinhFunc,
ASINH,
asinh,
asinh,
Some(FuncMonotonicity::new_increasing())
);
make_math_unary_udf!(
AtanFunc,
ATAN,
atan,
atan,
Some(FuncMonotonicity::new_increasing())
);
make_math_unary_udf!(
AtanhFunc,
ATANH,
atanh,
atanh,
Some(FuncMonotonicity::new_increasing())
);
make_math_binary_udf!(
Atan2,
ATAN2,
atan2,
atan2,
Some(FuncMonotonicity::new_increasing())
);
make_math_unary_udf!(CbrtFunc, CBRT, cbrt, cbrt, None);
make_math_unary_udf!(CeilFunc, CEIL, ceil, ceil, Some(vec![Some(true)]));
make_math_unary_udf!(
CeilFunc,
CEIL,
ceil,
ceil,
Some(FuncMonotonicity::new_increasing())
);
make_math_unary_udf!(CosFunc, COS, cos, cos, None);
make_math_unary_udf!(CoshFunc, COSH, cosh, cosh, None);
make_udf_function!(cot::CotFunc, COT, cot);
make_math_unary_udf!(DegreesFunc, DEGREES, degrees, to_degrees, None);
make_math_unary_udf!(ExpFunc, EXP, exp, exp, Some(vec![Some(true)]));
make_math_unary_udf!(
ExpFunc,
EXP,
exp,
exp,
Some(FuncMonotonicity::new_increasing())
);
make_udf_function!(factorial::FactorialFunc, FACTORIAL, factorial);
make_math_unary_udf!(FloorFunc, FLOOR, floor, floor, Some(vec![Some(true)]));
make_math_unary_udf!(
FloorFunc,
FLOOR,
floor,
floor,
Some(FuncMonotonicity::new_increasing())
);
make_udf_function!(log::LogFunc, LOG, log);
make_udf_function!(gcd::GcdFunc, GCD, gcd);
make_udf_function!(nans::IsNanFunc, ISNAN, isnan);
make_udf_function!(iszero::IsZeroFunc, ISZERO, iszero);
make_udf_function!(lcm::LcmFunc, LCM, lcm);
make_math_unary_udf!(LnFunc, LN, ln, ln, Some(vec![Some(true)]));
make_math_unary_udf!(Log2Func, LOG2, log2, log2, Some(vec![Some(true)]));
make_math_unary_udf!(Log10Func, LOG10, log10, log10, Some(vec![Some(true)]));
make_math_unary_udf!(LnFunc, LN, ln, ln, Some(FuncMonotonicity::new_increasing()));
make_math_unary_udf!(
Log2Func,
LOG2,
log2,
log2,
Some(FuncMonotonicity::new_increasing())
);
make_math_unary_udf!(
Log10Func,
LOG10,
log10,
log10,
Some(FuncMonotonicity::new_increasing())
);
make_udf_function!(nanvl::NanvlFunc, NANVL, nanvl);
make_udf_function!(pi::PiFunc, PI, pi);
make_udf_function!(power::PowerFunc, POWER, power);
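
As a quick sanity check on declarations like the ones above, a function marked as increasing should never decrease across ascending inputs. The helper below is a hypothetical illustration (the name `is_non_decreasing_on` is not from the PR); sampling like this can refute a monotonicity claim but cannot prove one.

```rust
/// Returns true if `f` never decreases across `inputs`.
/// `inputs` is assumed to be sorted in ascending order by the caller.
pub fn is_non_decreasing_on(f: impl Fn(f64) -> f64, inputs: &[f64]) -> bool {
    // Compare each adjacent pair of inputs; order must be preserved.
    inputs.windows(2).all(|pair| f(pair[0]) <= f(pair[1]))
}
```

For example, `f64::floor` and `f64::exp` pass this check on any ascending sample, while `f64::cos` fails on a sample that crosses zero, which is consistent with `CosFunc` being declared with `None` above.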
2 changes: 1 addition & 1 deletion datafusion/functions/src/math/pi.rs
@@ -71,6 +71,6 @@ impl ScalarUDFImpl for PiFunc {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![Some(true)]))
Ok(Some(FuncMonotonicity::new_increasing()))
}
}
2 changes: 1 addition & 1 deletion datafusion/functions/src/math/round.rs
@@ -81,7 +81,7 @@ impl ScalarUDFImpl for RoundFunc {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![Some(true)]))
Ok(Some(FuncMonotonicity::new_increasing()))
}
}

2 changes: 1 addition & 1 deletion datafusion/functions/src/math/trunc.rs
@@ -87,7 +87,7 @@ impl ScalarUDFImpl for TruncFunc {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![Some(true)]))
Ok(Some(FuncMonotonicity::new_increasing()))
}
}

3 changes: 2 additions & 1 deletion datafusion/physical-expr/src/functions.rs
@@ -171,7 +171,8 @@ pub fn out_ordering(
func: &FuncMonotonicity,
arg_orderings: &[SortProperties],
) -> SortProperties {
func.iter().zip(arg_orderings).fold(
let monotonicity_vec: Vec<Option<bool>> = func.into();
monotonicity_vec.iter().zip(arg_orderings).fold(
SortProperties::Singleton,
|prev_sort, (item, arg)| {
let current_sort = func_order_in_one_dimension(item, arg);
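
One consequence of converting to a vector before zipping, as this hunk does, is that `Iterator::zip` stops at the shorter input. The sketch below isolates that behavior; `zipped_pairs` is a hypothetical helper and the `&str` orderings are placeholders for `SortProperties`, which is not reproduced here.

```rust
/// Counts how many (monotonicity, ordering) pairs a `zip` would visit.
/// `&str` is a placeholder for the real per-argument ordering type.
pub fn zipped_pairs(monotonicity: &[Option<bool>], arg_orderings: &[&str]) -> usize {
    // `zip` ends as soon as either side is exhausted, so extra
    // orderings beyond the monotonicity vector are never inspected.
    monotonicity.iter().zip(arg_orderings).count()
}
```

With a one-element vector (what the `From` impl in this PR yields for `Increasing`) and two argument orderings, only one pair is visited, so a fold over the zipped pairs would not take the second argument's ordering into account. This is the kind of subtle arity issue raised in the review.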
4 changes: 2 additions & 2 deletions datafusion/physical-expr/src/udf.rs
@@ -65,7 +65,7 @@ mod tests {
use arrow_schema::Schema;

use datafusion_common::{DFSchema, Result};
use datafusion_expr::ScalarUDF;
use datafusion_expr::{FuncMonotonicity, ScalarUDF};

use crate::utils::tests::TestScalarUDF;
use crate::ScalarFunctionExpr;
@@ -87,7 +87,7 @@ mod tests {
.downcast_ref::<ScalarFunctionExpr>()
.unwrap()
.monotonicity(),
&Some(vec![Some(true)])
&Some(FuncMonotonicity::new_increasing())
);

Ok(())
2 changes: 1 addition & 1 deletion datafusion/physical-expr/src/utils/mod.rs
@@ -310,7 +310,7 @@ pub(crate) mod tests {
}

fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
Ok(Some(vec![Some(true)]))
Ok(Some(FuncMonotonicity::new_increasing()))
}

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {