Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Series.all?/1 and Series.any?/1 #754

Merged
merged 3 commits into from
Dec 3, 2023

Conversation

costaraphael
Copy link
Contributor

This PR introduces two new aggregations: Series.all?/1 and Series.any?/1.

The main thing to keep in mind with these functions is how to deal with nil. I could see three approaches:

  • Ignore nil values (all?([nil, nil, nil]) == all?([]))
  • Use Kleene Logic, treating nil as unknown, and return nil whenever it makes the result also unknown (any?([nil, true]) == true and all?([nil, false]) == false, but any?([nil, false]) == nil and all?([nil, true]) == nil)
    • this is how the existing Series.and/2 and Series.or/2 functions behave
  • Treat nil as false just like in the Enum functions (all?([nil, true]) == false)

I went with the first option here as it seems to be the default behavior in Polars, but I can see the appeal for either of them. The one thing to consider about the third option above is that it doesn't have native support in Polars, so it would need to be implemented on the Explorer side (should be straightforward with a call to fill_missing(false) before aggregating).

Another route I can see is supporting all behaviors, and letting the caller decide through an option:

all?(s, nils: :as_false | :ignore | :kleene)

Let me know what your thoughts are!

@billylanchantin
Copy link
Member

@costaraphael Thanks this is looking good!

RE the approaches: I think consistency with or and and is the right call. Having them behave differently would be very surprising, especially if you ever tried to combine their outputs.

Also, I think we want to avoid as much as possible doing work on the Elixir side. It's significantly more costly.

@costaraphael
Copy link
Contributor Author

Also, I think we want to avoid as much as possible doing work on the Elixir side. It's significantly more costly.

100%! When I said "on the Explorer side", I meant in the Rust code. Something like:

#[rustler::nif(schedule = "DirtyCpu")]
pub fn s_all(s: ExSeries) -> Result<bool, ExplorerError> {
    let s = s.clone_inner();

    Ok(s.bool()?.fill_null_with_values(false)?.all())
}

@costaraphael
Copy link
Contributor Author

I think consistency with or and and is the right call. Having them behave differently would be very surprising, especially if you ever tried to combine their outputs.

Yeah, I can definitely see that. This is also how PostgreSQL deals with NULLs.

Do you think it makes sense to support all the modes through an option, and default to :kleene?

@billylanchantin
Copy link
Member

billylanchantin commented Dec 2, 2023

Do you think it makes sense to support all the modes through an option, and default to :kleene?

Ah I'm just looking at the Polars implementations now:

They both provide an ignore_nulls option where true is your :ignore option and false is your :kleene option. I'm now thinking we just mirror their approach. If you want to do it with nils: :ignore | :kleene I'm not against it. (IMHO named options > booleans.) But you'll definitely need at least a sentence to explain what :kleene is. For reference Polars says this for any:

If ignore_nulls is False, Kleene logic is used to deal with nulls: if the column contains any null values and no true values, the output is null.

That also implies not supporting your as_false option. The usual approach is to handle nil with an explicit imputation. So I think that if someone wants to treat nil as false, they should do the fill_missing call themselves.

@costaraphael
Copy link
Contributor Author

They both provide an ignore_nulls option where true is your :ignore option and false is your :kleene option.

That's true for the expressions, but the two different approaches are behind different functions in the eager case, which is why I figured it wouldn't be that much more trouble to support a third option.

But I see your point, a client can always do all?(fill_missing(s, false)) to achieve the same effect, and it allows to reduce the API surface area.

My only concern is that the nil treatment here might catch some Elixir devs off guard by deviating from the Enum behavior, but making sure this difference is emphasized in the docs could be enough.

I'll push a commit later with the changes!

lib/explorer/series.ex Outdated Show resolved Hide resolved
lib/explorer/series.ex Outdated Show resolved Hide resolved
Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me and it is a reasonable default. We can add the proposed options in the future if requested. :)

costaraphael and others added 2 commits December 3, 2023 09:35
@josevalim josevalim merged commit f4303a1 into elixir-explorer:main Dec 3, 2023
4 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants