feat: Adding ewm_mean #1298
Conversation
nice! thanks for doing this. Initial comment: from what I remember, pandas and Polars might have handled nulls differently here.
Added one test. I haven't added anything for Arrow because I'm not sure if we want to add it or not.
thanks for updating! i think this is close
```python
from tests.utils import ConstructorEager
from tests.utils import assert_equal_data


data = {"a": [1, 1, 2], "b": [1, 2, 3]}
```
can we include a test with nulls too please?
I added one at the bottom.
tests/expr_and_series/ewm_test.py (outdated)
```python
    adjust: bool,  # noqa: FBT001
) -> None:
    if "pyarrow_" in str(constructor) or "dask" in str(constructor):  # remove
        pytest.skip()
```
can we use request.applymarker(pytest.mark.xfail) please? Then the test actually runs and we check that it fails, as opposed to being skipped (also, if I remember correctly, pytest.skip had some undesirable behaviour)
Thanks a lot for the explanation.
narwhals/series.py (outdated)
```
pandas and Polars handle nulls differently. So, calculating ewm over
a sequence with null values leads to distinct results:
```
I think it's that Polars preserves null values, whereas pandas forward-fills. Can we preserve null values for pandas too?
The parameters for both Polars and pandas are the same; I don't see how to do what you are asking, sorry. Or do you mean that Narwhals should make both behave the same way? In that case, for example, Expr.fill_null returns different values for each library.
I don't think you need to fill the nulls, just preserve them - so, if a value was null to start with, it should be null in the result too. We do something like that in timestamp.
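For pandas, one way to preserve nulls (a sketch, not necessarily the PR's actual implementation) is to compute the ewm and then re-apply the input's null mask:

```python
import pandas as pd

s = pd.Series([2.0, 4.0, None, 3.0])
# pandas carries the last ewm value forward at null positions...
ewm = s.ewm(com=1, ignore_na=True).mean()
# ...so re-apply the original null mask to match Polars' null-preserving behaviour
result = ewm.mask(s.isna())
```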
Doing it like you do in timestamp solves it for None, but with a series like [1.0, float("nan"), 4.0] we still have this:

Pandas:
```
0    1.0
1    NaN
2    3.4
dtype: float64
```

Polars:
```
shape: (3,)
Series: '' [f64]
[
    1.0
    NaN
    NaN
]
```
sure, that's because Polars treats 'nan' differently from null - but if we use the null value for both, does the result match? e.g. [1., None, 4.]?
for Polars, 'nan' is only the result of illegal mathematical operations (like 0/0), so it's far rarer to encounter it there

regarding the older-versions CI - I'd suggest making a separate virtual environment and installing the versions which show up in the "show deps" step of the CI job
Older versions of Polars give similar results to pandas when there is a null (None). So with an input of {"a": [2.0, 4.0, None, 3.0]}:

```
Expected: {'a': [2.0, 3.3333333333333335, nan, 3.142857142857143]}
Got:      {'a': [2.0, 3.3333333333333335, 3.3333333333333335, 3.142857142857143]}
```

For the moment I'm "xfailing" that test with older versions. I'm not sure if that's correct.
thanks - is it possible to use pl.when to preserve the null values for old versions of Polars? Either that, or raise NotImplementedError for old versions of Polars for now, and let's create an issue to track preserving null values in old Polars versions
Thanks for the review. I added the "raise" for now. I'll do the follow-up PR.
thanks!
tests/expr_and_series/ewm_test.py (outdated)
```python
    df = nw.from_native(constructor({"a": [2.0, 4.0, None, 3.0]}))
    result = df.select(nw.col("a").ewm_mean(com=1, ignore_nulls=ignore_nulls))

    if ignore_nulls:
```
i'd suggest including the list as something you parametrise over, rather than putting logic (if/then) in the test. In general, we should use if/then in tests only when necessary; it's something I try to avoid if possible (and sometimes it's not possible, unfortunately)
Changed that.
```python
        com: float | None = None,
        span: float | None = None,
        half_life: float | None = None,
        alpha: float | None = None,
        adjust: bool = True,
        min_periods: int = 1,
        ignore_nulls: bool = False,
```
there are a lot of parameters here - do we have a test which hits each of them?
I'll add tests for the parameters then... (I can't hit all the parameters in only one test - at least not the first 4, I think). Is that what you meant?
tests/expr_and_series/ewm_test.py (outdated)
```python
    if adjust:
        expected = {
            "a": [1.0, 1.0, 1.5714285714285714],
            "b": [1.0, 1.6666666666666667, 2.4285714285714284],
        }
    else:
        expected = {
            "a": [1.0, 1.0, 1.5],
            "b": [1.0, 1.5, 2.25],
        }
```
same
I added tests for the parameters.
Added the
awesome, thanks @DeaMariaLeon! I just made some minor edits based on #1401.

I think "calculating ewm over a sequence with null values leads to distinct results" isn't quite exact, because the result is the same if we consider that pandas' null value is 'nan' and Polars' null value is None.

The difference is just that Polars (and PyArrow, and I think all other libraries) treat 'nan' as just another floating-point number (https://en.wikipedia.org/wiki/IEEE_754), and it's generally rare to encounter 'nan' in those libraries.

If we initialise a Series with [None, 3.5, float('nan')], then pandas treats it as [null, 3.5, null], whereas for other libraries it's [null, 3.5, nan] - but it's quite rare to initialise a Series from a list like this with both None and 'nan'; you'd make a Series from some data source (e.g. a file), and then each library would encode missing values according to its own definition of missing values.

Sorry if this explanation is too long or pedantic 😄
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below.
I didn't add anything for Arrow because I'm waiting to see the feedback for #1290