Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve monotonicity api #10117

Closed

Conversation

tinfoil-knight
Copy link
Contributor

Which issue does this PR close?

Closes #9879 .

Rationale for this change

The Vec<Option<bool>> type used to express the monotonicity of scalar functions wrt their arguments was really difficult to understand.
Note: Currently only mathematical and date-time functions are considered for monotonicity.

What changes are included in this PR?

FuncMonotonicity has been changed from the type Vec<Option<bool>> to an enum.

pub enum FuncMonotonicity {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

Are these changes tested?

By existing tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Apr 17, 2024
@ozankabak
Copy link
Contributor

ozankabak commented Apr 17, 2024

The problem I see with this representation is that Monotonicity::Increasing and Monotonicity::Mixed(vec![Some(true), ..., Some(true)]) refers to the same actual configuration. If one writes a check like given_monotonicity == Monotonicity::Increasing (and I'm sure people will), one will miss the alternative representation (which may easily arise when one generates this information programmatically if no special treatment is done).

I think we can find a refactor that increases readability but doesn't introduce such traps and preserves uniqueness. Maybe a struct with an inner Vec<Option<bool>> and methods that define an API structuring how one interacts with it.

@alamb

@tinfoil-knight

This comment was marked as outdated.

@alamb
Copy link
Contributor

alamb commented Apr 19, 2024

The problem I see with this representation is that Monotonicity::Increasing and Monotonicity::Mixed(vec![Some(true), ..., Some(true)]) refers to the same actual configuration. If one writes a check like given_monotonicity == Monotonicity::Increasing (and I'm sure people will), one will miss the alternative representation (which may easily arise when one generates this information programmatically if no special treatment is done).

I think this is an excellent point

I think we can find a refactor that increases readability but doesn't introduce such traps and preserves uniqueness.

Another potential solution that would avoid allocating Vec so much would be to make the enum private and only allow construction through APIs

Maybe something like

// Non pub to ensure can only construct valid representations
pub enum FuncMonotonicityInner {
    None,
    Increasing,
    Decreasing,
    Mixed(Vec<Option<bool>>),
}

/// Describes function monotonicity
pub struct FuncMonotonicity(FuncMonotonicityInner);

impl FuncMontonicity {
  pub fn new_none() -> Self {..}
  pub fn new_increasing() -> Self { .. }
  pub fn new_decreasing() -> Self { .. }
  pub fn new_mixed(inner: Vec<bool>) -> { .. }

  /// returns true if this function is monotonically increasing with respect to argument number arg
  pub fn arg_increasing(&self, arg: usize) -> bool { .. }
  /// returns true if this function is monotonically decreasing with respect to argument number arg
  pub fn arg_decreasing(&self, arg: usize) -> bool { .. }
}

In my opinion the real value of a struct over a typedef is that it reduces the cognative load of remembering "what does monotonicity[0] == Some(false)" mean -- instead it can be monotonicty.arg_decreasing(0) which also has documentation explaining it in case you forget.

@ozankabak
Copy link
Contributor

I like your new proposal. In this one, new_mixed would check if inner consists entirely of true/false values, and return FuncMonotonicityInner::Increasing/FuncMonotonicityInner::Decreasing in those cases.

Instead of FuncMonotonicityInner, I recommend FuncMonotonicityPartial to be more in line with math verbiage.

@alamb
Copy link
Contributor

alamb commented Apr 21, 2024

@tinfoil-knight -- are you interested in completing this PR with the proposed API updates?

@tinfoil-knight
Copy link
Contributor Author

@alamb Yeah. I just have some college exams this week so might get delayed a bit but I'll pick this up as soon as possible.

@alamb
Copy link
Contributor

alamb commented Apr 21, 2024

@alamb Yeah. I just have some college exams this week so might get delayed a bit but I'll pick this up as soon as possible.

No worries -- good luck with the exams!

@alamb alamb marked this pull request as draft April 21, 2024 11:24
@alamb
Copy link
Contributor

alamb commented Apr 21, 2024

Converting to draft as it is not waiting on feedback

@tinfoil-knight tinfoil-knight marked this pull request as ready for review May 5, 2024 20:31
/// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
/// - Some(false) indicates that the function is monotonically decreasing w.r.t. the argument in question.
pub type FuncMonotonicity = Vec<Option<bool>>;
#[derive(Debug, Clone, PartialEq)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tinfoil-knight. I can just say this PartialEq derivation could be dangerous here. One can expect to see the equality of Increasing and vec![Some(true)].

Copy link
Contributor Author

@tinfoil-knight tinfoil-knight May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Makes sense. I've fixed this in d47b896 (#10117)

FuncMonotonicityPartial::Increasing => false,
FuncMonotonicityPartial::Decreasing => true,
FuncMonotonicityPartial::Mixed(inner) => inner[arg].unwrap_or(false),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These public API's don't check the size of the inner vector. Giving a larger index would panic the code. We can wrap it with a result here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These APIs aren't being used anywhere. I think we can just remove them.

Copy link
Contributor Author

@tinfoil-knight tinfoil-knight May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update:
I've removed the unused arg_increasing & arg_decreasing public APIs.

For reference, the only place we're currently using FuncMonotonicity is this:

https://github.com/tinfoil-knight/arrow-datafusion/blob/53d9e30a7561c97492e47e3ee1679885b6c510e6/datafusion/physical-expr/src/scalar_function.rs#L250-L263

@tinfoil-knight tinfoil-knight force-pushed the 9879-monotonicity-api branch from 53d9e30 to 8379e7c Compare May 7, 2024 09:14
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left my comments for things to fix/improve. Thanks @tinfoil-knight

Comment on lines 388 to 397
impl From<&FuncMonotonicity> for Vec<Option<bool>> {
fn from(val: &FuncMonotonicity) -> Self {
match &val.0 {
FuncMonotonicityPartial::None => vec![None],
FuncMonotonicityPartial::Increasing => vec![Some(true)],
FuncMonotonicityPartial::Decreasing => vec![Some(false)],
FuncMonotonicityPartial::Mixed(inner) => inner.to_vec(),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this conversion is possible to do. Given only Increasing, you can not produce the vector without knowing the arity of the function. Assuming an arity of one (which seems to be the case here) can lead to subtle bugs and/or compile-time errors in some cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ozankabak it is confusing to have a way to convert back to Vec<Option<bool>> -- I think it would be easier to understand if all comparisons are done directly on FuncMonotonicity rather than converting to Vec<Option>


impl PartialEq for FuncMonotonicity {
fn eq(&self, other: &Self) -> bool {
Into::<Vec<Option<bool>>>::into(self) == Into::<Vec<Option<bool>>>::into(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given my comment on From, you will need explicit logic here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this perhaps:

match (self, other)  {
  (Self::None, None) => true,
  (Self::Increasing, Self::Increasing) => true,
  (Self::Decreasing, Self::Decreasing) => true,
  (Self::Partial(self_inner), Self::Partial(other_inner)) => self_inner == other_inner,
  _ => false
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We also need to do configs like (Self::Partial(inner), Self::Increasing) but this is the idea 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that type of comparsion belongs in something other than equal -- perhaps

impl FuncMonotonicity {
  fn matches(&self, &other) -> bool { 
    ...
  }
}

So we can separate the equality check from the both sides match 🤔

Comment on lines 254 to 256
let monotonicity_vec: Vec<Option<bool>> = func.into();

monotonicity_vec.iter().zip(arg_orderings).fold(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic will unfortunately get a little complicated given the issue with From.

Comment on lines 349 to 386
#[derive(Debug, Clone)]
enum FuncMonotonicityPartial {
/// not monotonic or unknown monotonicity
None,
/// Increasing with respect to all of its arguments
Increasing,
/// Decreasing with respect to all of its arguments
Decreasing,
/// Each element of this vector corresponds to an argument and indicates whether
/// the function's behavior is monotonic, or non-monotonic/unknown for that argument, namely:
/// - `None` signifies unknown monotonicity or non-monotonicity.
/// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
/// - Some(false) indicates that the function is monotonically decreasing w.r.t. the argument in question.
Mixed(Vec<Option<bool>>),
}

/// Monotonicity of a function with respect to its arguments.
///
/// A function is [monotonic] if it preserves the relative order of its inputs.
///
/// [monotonic]: https://en.wikipedia.org/wiki/Monotonic_function
#[derive(Debug, Clone)]
pub struct FuncMonotonicity(FuncMonotonicityPartial);

impl FuncMonotonicity {
pub fn new_none() -> Self {
Self(FuncMonotonicityPartial::None)
}
pub fn new_increasing() -> Self {
Self(FuncMonotonicityPartial::Increasing)
}
pub fn new_decreasing() -> Self {
Self(FuncMonotonicityPartial::Decreasing)
}
pub fn new_mixed(inner: Vec<Option<bool>>) -> Self {
Self(FuncMonotonicityPartial::Mixed(inner))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use just one data type? Consider:

/// Monotonicity of a function with respect to its arguments.
///
/// A function is [monotonic] if it preserves the relative order of its inputs.
///
/// [monotonic]: https://en.wikipedia.org/wiki/Monotonic_function
#[derive(Debug, Clone)]
pub enum FuncMonotonicity {
    /// Not monotonic or unknown monotonicity.
    None,
    /// Increasing with respect to all of its arguments.
    Increasing,
    /// Decreasing with respect to all of its arguments.
    Decreasing,
    /// Each element of the vector corresponds to an argument and indicates whether
    /// the function's behavior is monotonic or non-monotonic/unknown for that argument, namely:
    /// - `None` signifies unknown monotonicity or non-monotonicity.
    /// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
    /// - `Some(false)` indicates that the function is monotonically decreasing w.r.t. the argument in question.
    Partial(Vec<Option<bool>>),
}

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tinfoil-knight and @ozankabak -- the PR is making progress (and it is a good example of iterative API design). Thank you for sticking with it

Comment on lines 388 to 397
impl From<&FuncMonotonicity> for Vec<Option<bool>> {
fn from(val: &FuncMonotonicity) -> Self {
match &val.0 {
FuncMonotonicityPartial::None => vec![None],
FuncMonotonicityPartial::Increasing => vec![Some(true)],
FuncMonotonicityPartial::Decreasing => vec![Some(false)],
FuncMonotonicityPartial::Mixed(inner) => inner.to_vec(),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ozankabak it is confusing to have a way to convert back to Vec<Option<bool>> -- I think it would be easier to understand if all comparisons are done directly on FuncMonotonicity rather than converting to Vec<Option>


impl PartialEq for FuncMonotonicity {
fn eq(&self, other: &Self) -> bool {
Into::<Vec<Option<bool>>>::into(self) == Into::<Vec<Option<bool>>>::into(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this perhaps:

match (self, other)  {
  (Self::None, None) => true,
  (Self::Increasing, Self::Increasing) => true,
  (Self::Decreasing, Self::Decreasing) => true,
  (Self::Partial(self_inner), Self::Partial(other_inner)) => self_inner == other_inner,
  _ => false
}

@tinfoil-knight tinfoil-knight marked this pull request as draft May 9, 2024 12:13
Copy link
Contributor Author

@tinfoil-knight tinfoil-knight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments to clarify things.

Comment on lines +257 to +261
let arg_monotonicity: Option<bool> = match func {
FuncMonotonicity::None => None,
FuncMonotonicity::Increasing => Some(true),
FuncMonotonicity::Decreasing => Some(false),
FuncMonotonicity::Mixed(inner_vec) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The earlier impl From<&FuncMonotonicity> for Vec<Option<bool>> was confusing & incorrect so it has been removed now.

I do think that the conversion from FuncMonotonicity to Option<bool> is required in this function.
I tried a few different ways (without conversion) but it was resulting in really messy code.

If the conversion still feels confusing, we can leave a note here mentioning this for clarity:

/// - `None` signifies unknown monotonicity or non-monotonicity.
/// - `Some(true)` indicates that the function is monotonically increasing w.r.t. the argument in question.
/// - `Some(false)` indicates that the function is monotonically decreasing w.r.t. the argument in question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the conversion feels confusing --

I do think that the conversion from FuncMonotonicity to Option is required in this function.
I tried a few different ways (without conversion) but it was resulting in really messy code.

I would be interested in trying to help here as I think the major benefit of this change is to encapsulate the monotonicity calculations. I'll give it a look

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add something like

enum Monotonicity {
  Increasing,
  Decreasing,
  Unknown,
}

And then represent FuncMonotonicity::Mixed with a Vec<Monotonicity>

I would love to get a chance to pla around with it, but may not have time for a while

}

impl FuncMonotonicity {
pub fn matches(&self, other: &Self) -> bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: The matches method isn't currently being used anywhere in the codebase. It has been added to make comparisons easier for any future purposes.


Also, I don't think we need the arity of the function anymore for comparison b/w FuncMonotonicity::Mixed and other types.

For eg: FuncMonotonicity::Increasing is "Increasing with respect to all of its arguments" which is the same as the FuncMonotonicity::Mixed's inner vector being Some(true) for all elements from what I understand.

@tinfoil-knight tinfoil-knight marked this pull request as ready for review May 9, 2024 20:33
@tinfoil-knight tinfoil-knight requested a review from alamb May 9, 2024 20:34
@alamb
Copy link
Contributor

alamb commented May 10, 2024

Thanks @tinfoil-knight -- I will check this out later today

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +257 to +261
let arg_monotonicity: Option<bool> = match func {
FuncMonotonicity::None => None,
FuncMonotonicity::Increasing => Some(true),
FuncMonotonicity::Decreasing => Some(false),
FuncMonotonicity::Mixed(inner_vec) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the conversion feels confusing --

I do think that the conversion from FuncMonotonicity to Option is required in this function.
I tried a few different ways (without conversion) but it was resulting in really messy code.

I would be interested in trying to help here as I think the major benefit of this change is to encapsulate the monotonicity calculations. I'll give it a look

Comment on lines +257 to +261
let arg_monotonicity: Option<bool> = match func {
FuncMonotonicity::None => None,
FuncMonotonicity::Increasing => Some(true),
FuncMonotonicity::Decreasing => Some(false),
FuncMonotonicity::Mixed(inner_vec) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add something like

enum Monotonicity {
  Increasing,
  Decreasing,
  Unknown,
}

And then represent FuncMonotonicity::Mixed with a Vec<Monotonicity>

I would love to get a chance to pla around with it, but may not have time for a while

@alamb
Copy link
Contributor

alamb commented May 21, 2024

I believe this PR has been superceded by #10504 which removed the montonicity apu in favor of a more expressive bounds analysis, so closing this pR

Thanks for the work on this @tinfoil-knight , and sorry we went another way.

BTW @berkaysynnada in the future if you are working on a PR that might make another become obselete it might be nice to drop a note on the other PR to given the author(s) a heads up.

@alamb alamb closed this May 21, 2024
@berkaysynnada
Copy link
Contributor

I believe this PR has been superceded by #10504 which removed the montonicity apu in favor of a more expressive bounds analysis, so closing this pR

Thanks for the work on this @tinfoil-knight , and sorry we went another way.

BTW @berkaysynnada in the future if you are working on a PR that might make another become obselete it might be nice to drop a note on the other PR to given the author(s) a heads up.

Sorry for the inconvenience, @tinfoil-knight. My main motivation for #10504 was not specifically related to the monotonicity API, but I had to make changes in those areas to achieve a better overall design. I apologize for not informing you earlier.

@tinfoil-knight tinfoil-knight deleted the 9879-monotonicity-api branch May 21, 2024 10:54
@tinfoil-knight
Copy link
Contributor Author

No worries @berkaysynnada and @alamb. It's difficult to keep track of everyone's work in such a large project.

It was fun to work on this and I learnt a few things from everyone's review.

@alamb
Copy link
Contributor

alamb commented May 22, 2024

Thanks everyone for the communication. I just want everyone working on this project to have a good experience -- I am glad to hear that it was. Thanks again @tinfoil-knight and @berkaysynnada

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request: Improve Monotoniciy API
4 participants