Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added isnone() function #801

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Added isnone() function #801

wants to merge 11 commits into from

Conversation

ilongin
Copy link
Contributor

@ilongin ilongin commented Jan 7, 2025

isnone(...) function returns True if column or literal value is None / NULL, otherwise False

Note that this doesn't currently work for Clickhouse where we don't have nullable columns. We should decide if we want to introduce nullable columns and if yes, this will work there as well.

UPDATE:
In addition, this PR extends case() function and related codebase to accept nested functions in:

  1. Case condition - value (both places)
  2. Case else_ value

This will allow usage of functions inside other functions that are based on case like ifelse or isnone as well. We also will have ability to do nested cases etc.

@ilongin ilongin linked an issue Jan 7, 2025 that may be closed by this pull request
@ilongin ilongin marked this pull request as draft January 7, 2025 14:31
Copy link

codecov bot commented Jan 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.46%. Comparing base (2364ac2) to head (5644531).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #801      +/-   ##
==========================================
+ Coverage   87.44%   87.46%   +0.01%     
==========================================
  Files         128      128              
  Lines       11369    11383      +14     
  Branches     1535     1538       +3     
==========================================
+ Hits         9942     9956      +14     
  Misses       1048     1048              
  Partials      379      379              
Flag Coverage Δ
datachain 87.40% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


Example:
```py
dc.mutate(test=isnone("value"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make a more reasonable example - filter, or some if condition ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that I need to support nested case which is used by ifelse and isnone for this. I'm adding needed changes into this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added better example

Args:
col (str | Column | literal): Column or literal to check if None.
If a string is provided, it is assumed to be the name of the column.
If a literal is provided, it is assumed to be a string literal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to specify literals separately? why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it. Technically someone can provide just string literal but that doesn't make much sense, except for unit tests. Columns are the one to be used here


def isnone(col: Union[str, Column]) -> Func:
"""
Returns True if column value or literal is None, otherwise False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or literal - what does it mean? could you give an example please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pass an expression btw ? e.f. isnone(ifelse(...))? if not, why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to pass expression. As I wrote in above response, I need to do some improvements to general case function to be able to have nested ones for mentioned examples to work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ability to use nested functions inside case or ifelse (which is based on case)

Copy link

cloudflare-workers-and-pages bot commented Jan 9, 2025

Deploying datachain-documentation with  Cloudflare Pages  Cloudflare Pages

Latest commit: c25e359
Status: ✅  Deploy successful!
Preview URL: https://14e74cac.datachain-documentation.pages.dev
Branch Preview URL: https://ilongin-800-isnone-function.datachain-documentation.pages.dev

View logs

@ilongin ilongin marked this pull request as ready for review January 10, 2025 11:15
def case(*args: tuple[BinaryExpression, CaseT], else_=None) -> Func:
def case(
*args: tuple[Union[ColumnElement, Func], CaseT], else_: Optional[CaseT] = None
) -> Func:
"""
Returns the case function that produces case expression which has a list of
conditions and corresponding results. Results can only be python primitives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it still true? can result now be a Func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all methods here return Func which is essentially a wrapper around sqlalchemy functions like case which produces case construct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this part Results can only be python primitives ..., sorry

if_val: (str | int | float | complex | bool, Func): value for true
condition outcome
else_val: (str | int | float | complex | bool, Func): value for false condition
outcome

Returns:
Func: A Func object that represents the ifelse function.

Example:
```py
dc.mutate(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add another example with a column expression?

also an examples with a results as an expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need that many examples? All other functions have exactly one example even though there can be more as well like the version with column expression as you stated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to my mind yes, examples is the most valuable part usually since you can get an idea of what is actually possible. We need more simple examples everywhere.

"""
Returns True if column value is None, otherwise False
Args:
col (str | Column): Column to check if it's None or not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure out but not sure what is wrong in this part of docs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.e. in some cases I saw it starts with lower case, sometimes I saw - before the descriptions, sometimes there was no period at the end, sometimes it's a new line, etc

and all of this within this single PR

and it also applies to the description, e.g. we don't have period here - is it always the case?

# if string, it is assumed to be the name of the column
col = C(col)

return case((col == None, True), else_=False) # type: ignore [arg-type] # noqa: E711
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need # type: ignore [arg-type] # noqa: E711?

will users who will use case also have this issue with types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arg-type is not needed indeed... probably was needed in some previous iterations and stopped being needed after some refactoring so it's a leftover.
Regarding E711 ... ruff is complaining about comparing with None with == and says it should be is None, but since this is SQL construct it should actually be like this (I've tried with is None but doesn't work)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an analog of our isnone for this case? I wonder how should people use it raises an obvious warning ...



@pytest.mark.parametrize("col", ["val", C("val")])
@skip_if_not_sqlite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean that we can't even run the operation itself on CH - it won't work at all? or will it run, just returning always False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will double check but from what I remember it always return False so wrong result, which is even worse than failing IMO. Dmitry created task in Studio to allow nullable columns in CH so this should not be a problem in the future.
I'm not 100% sure if I should merge this PR now though, as I don't like to leave it in review for so long due conflicts etc, but on the other hand it doesn't really work for CH...so I guess it will idle here until we allow nullable in CH after all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to merge ... it always returns False for now - and I would test for it (so that function can run). We just don't support nullables there at all ...

@@ -423,7 +430,7 @@ def get_db_col_type(signals_schema: "SignalSchema", col: ColT) -> "DataType":
return sql_to_python(col)

return signals_schema.get_column_type(
col.name if isinstance(col, ColumnElement) else col
col.name if isinstance(col, ColumnElement) else col # type: ignore[arg-type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have this ignore here?

else_ (str | int | float | complex | bool): else value in case expression
args (tuple((ColumnElement, Func), (str | int | float | complex | bool, Func))):
Tuple of condition and values pair.
else_ (str | int | float | complex | bool, Func): else value in case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it optional? can we say optional else value. What happens by default?

(no in case expression since it's clear from the context)

and values for true and false outcome. Results can only be python primitives
like string, numbes or booleans. Result type is inferred from the values.
and values for true and false outcome. Results can be one of python primitives
like string, numbes or booleans, but can also be nested functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numbes -> numbers

from datachain.sql.functions import conditional

from .func import ColT, Func

CaseT = Union[int, float, complex, bool, str]
CaseT = Union[int, float, complex, bool, str, Func]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there tests for Func values btw? can we add them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add isnone() function
2 participants