-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added isnone()
function
#801
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #801 +/- ##
==========================================
+ Coverage 87.44% 87.46% +0.01%
==========================================
Files 128 128
Lines 11369 11383 +14
Branches 1535 1538 +3
==========================================
+ Hits 9942 9956 +14
Misses 1048 1048
Partials 379 379
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
src/datachain/func/conditional.py
Outdated
|
||
Example: | ||
```py | ||
dc.mutate(test=isnone("value")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make a more reasonable example - filter, or some if condition ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that I need to support nested case
which is used by ifelse
and isnone
for this. I'm adding needed changes into this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added better example
src/datachain/func/conditional.py
Outdated
Args: | ||
col (str | Column | literal): Column or literal to check if None. | ||
If a string is provided, it is assumed to be the name of the column. | ||
If a literal is provided, it is assumed to be a string literal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to specify literals separately? why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it. Technically someone can provide just string literal but that doesn't make much sense, except for unit tests. Columns are the one to be used here
src/datachain/func/conditional.py
Outdated
|
||
def isnone(col: Union[str, Column]) -> Func: | ||
""" | ||
Returns True if column value or literal is None, otherwise False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or literal
- what does it mean? could you give an example please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we pass an expression btw ? e.f. isnone(ifelse(...))? if not, why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to pass expression. As I wrote in above response, I need to do some improvements to general case
function to be able to have nested ones for mentioned examples to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added ability to use nested functions inside case
or ifelse
(which is based on case
)
Deploying datachain-documentation with Cloudflare Pages
|
def case(*args: tuple[BinaryExpression, CaseT], else_=None) -> Func: | ||
def case( | ||
*args: tuple[Union[ColumnElement, Func], CaseT], else_: Optional[CaseT] = None | ||
) -> Func: | ||
""" | ||
Returns the case function that produces case expression which has a list of | ||
conditions and corresponding results. Results can only be python primitives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it still true? can result now be a Func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all methods here return Func
which is essentially a wrapper around sqlalchemy functions like case which produces case
construct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean this part Results can only be python primitives ...
, sorry
if_val: (str | int | float | complex | bool, Func): value for true | ||
condition outcome | ||
else_val: (str | int | float | complex | bool, Func): value for false condition | ||
outcome | ||
|
||
Returns: | ||
Func: A Func object that represents the ifelse function. | ||
|
||
Example: | ||
```py | ||
dc.mutate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add another example with a column expression?
also an examples with a results as an expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need that many examples? All other functions have exactly one example even though there can be more as well like the version with column expression as you stated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to my mind yes, examples is the most valuable part usually since you can get an idea of what is actually possible. We need more simple examples everywhere.
""" | ||
Returns True if column value is None, otherwise False | ||
Args: | ||
col (str | Column): Column to check if it's None or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to figure out but not sure what is wrong in this part of docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.e. in some cases I saw it starts with lower case, sometimes I saw -
before the descriptions, sometimes there was no period at the end, sometimes it's a new line, etc
and all of this within this single PR
and it also applies to the description, e.g. we don't have period here - is it always the case?
src/datachain/func/conditional.py
Outdated
# if string, it is assumed to be the name of the column | ||
col = C(col) | ||
|
||
return case((col == None, True), else_=False) # type: ignore [arg-type] # noqa: E711 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need # type: ignore [arg-type] # noqa: E711
?
will users who will use case
also have this issue with types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arg-type
is not needed indeed... probably was needed in some previous iterations and stopped being needed after some refactoring so it's a leftover.
Regarding E711
... ruff
is complaining about comparing with None
with ==
and says it should be is None
, but since this is SQL construct it should actually be like this (I've tried with is None
but doesn't work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there an analog of our isnone
for this case? I wonder how should people use it raises an obvious warning ...
tests/unit/test_func.py
Outdated
|
||
|
||
@pytest.mark.parametrize("col", ["val", C("val")]) | ||
@skip_if_not_sqlite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean that we can't even run the operation itself on CH - it won't work at all? or will it run, just returning always False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will double check but from what I remember it always return False
so wrong result, which is even worse than failing IMO. Dmitry created task in Studio to allow nullable columns in CH so this should not be a problem in the future.
I'm not 100% sure if I should merge this PR now though, as I don't like to leave it in review for so long due conflicts etc, but on the other hand it doesn't really work for CH...so I guess it will idle here until we allow nullable in CH after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to merge ... it always returns False for now - and I would test for it (so that function can run). We just don't support nullables there at all ...
@@ -423,7 +430,7 @@ def get_db_col_type(signals_schema: "SignalSchema", col: ColT) -> "DataType": | |||
return sql_to_python(col) | |||
|
|||
return signals_schema.get_column_type( | |||
col.name if isinstance(col, ColumnElement) else col | |||
col.name if isinstance(col, ColumnElement) else col # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have this ignore here?
else_ (str | int | float | complex | bool): else value in case expression | ||
args (tuple((ColumnElement, Func), (str | int | float | complex | bool, Func))): | ||
Tuple of condition and values pair. | ||
else_ (str | int | float | complex | bool, Func): else value in case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it optional? can we say optional else value
. What happens by default?
(no in case expression
since it's clear from the context)
and values for true and false outcome. Results can only be python primitives | ||
like string, numbes or booleans. Result type is inferred from the values. | ||
and values for true and false outcome. Results can be one of python primitives | ||
like string, numbes or booleans, but can also be nested functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numbes -> numbers
from datachain.sql.functions import conditional | ||
|
||
from .func import ColT, Func | ||
|
||
CaseT = Union[int, float, complex, bool, str] | ||
CaseT = Union[int, float, complex, bool, str, Func] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there tests for Func values btw? can we add them?
isnone(...)
function returnsTrue
if column or literal value isNone
/NULL
, otherwiseFalse
Note that this doesn't currently work for Clickhouse where we don't have nullable columns. We should decide if we want to introduce nullable columns and if yes, this will work there as well.
UPDATE:
In addition, this PR extends
case()
function and related codebase to accept nested functions in:else_
valueThis will allow usage of functions inside other functions that are based on case like
ifelse
orisnone
as well. We also will have ability to do nested cases etc.