feat: Adding ewm_mean #1298
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,7 @@ | |
- cum_sum | ||
- diff | ||
- drop_nulls | ||
- ewm_mean | ||
- fill_null | ||
- filter | ||
- gather_every | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ | |
- diff | ||
- drop_nulls | ||
- dtype | ||
- ewm_mean | ||
- fill_null | ||
- filter | ||
- gather_every | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
document$.subscribe(({ body }) => { | ||
renderMathInElement(body, { | ||
delimiters: [ | ||
{ left: "$$", right: "$$", display: true }, | ||
{ left: "$", right: "$", display: false }, | ||
{ left: "\\(", right: "\\)", display: false }, | ||
{ left: "\\[", right: "\\]", display: true } | ||
], | ||
}) | ||
}) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -383,6 +383,127 @@ def name(self) -> str: | |
""" | ||
return self._compliant_series.name # type: ignore[no-any-return] | ||
|
||
def ewm_mean( | ||
self: Self, | ||
*, | ||
com: float | None = None, | ||
span: float | None = None, | ||
half_life: float | None = None, | ||
alpha: float | None = None, | ||
adjust: bool = True, | ||
min_periods: int = 1, | ||
ignore_nulls: bool = False, | ||
) -> Self: | ||
r""" | ||
Compute exponentially-weighted moving average. | ||
|
||
Arguments: | ||
com: Specify decay in terms of center of mass, $\gamma$, with <br> $\alpha = \frac{1}{1+\gamma}\forall\gamma\geq0$ | ||
span: Specify decay in terms of span, $\theta$, with <br> $\alpha = \frac{2}{\theta + 1} \forall \theta \geq 1$ | ||
half_life: Specify decay in terms of half-life, $\tau$, with <br> $\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \tau } \right\} \forall \tau > 0$ | ||
alpha: Specify smoothing factor alpha directly, $0 < \alpha \leq 1$. | ||
adjust: Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings | ||
|
||
- When `adjust=True` (the default) the EW function is calculated | ||
using weights $w_i = (1 - \alpha)^i$ | ||
- When `adjust=False` the EW function is calculated recursively by | ||
$$ | ||
y_0=x_0 | ||
$$ | ||
$$ | ||
y_t = (1 - \alpha)y_{t - 1} + \alpha x_t | ||
$$ | ||
min_periods: Minimum number of observations in window required to have a value (otherwise result is null). | ||
ignore_nulls: Ignore missing values when calculating weights. | ||
|
||
- When `ignore_nulls=False` (default), weights are based on absolute | ||
positions. | ||
For example, the weights of $x_0$ and $x_2$ used in | ||
calculating the final weighted average of $[x_0, None, x_2]$ are | ||
$(1-\alpha)^2$ and $1$ if `adjust=True`, and | ||
$(1-\alpha)^2$ and $\alpha$ if `adjust=False`. | ||
|
||
- When `ignore_nulls=True`, weights are based | ||
on relative positions. For example, the weights of | ||
$x_0$ and $x_2$ used in calculating the final weighted | ||
average of $[x_0, None, x_2]$ are | ||
$1-\alpha$ and $1$ if `adjust=True`, | ||
and $1-\alpha$ and $\alpha$ if `adjust=False`. | ||
|
||
Returns: | ||
Series | ||
|
||
Examples: | ||
>>> import pandas as pd | ||
>>> import polars as pl | ||
>>> import narwhals as nw | ||
>>> data = [1, 2, 3] | ||
>>> s_pd = pd.Series(name="a", data=data) | ||
>>> s_pl = pl.Series(name="a", values=data) | ||
|
||
We define a library agnostic function: | ||
|
||
>>> @nw.narwhalify | ||
... def func(s): | ||
... return s.ewm_mean(com=1, ignore_nulls=False) | ||
|
||
We can then pass either pandas or Polars to `func`: | ||
|
||
>>> func(s_pd) | ||
0 1.000000 | ||
1 1.666667 | ||
2 2.428571 | ||
Name: a, dtype: float64 | ||
|
||
>>> func(s_pl) # doctest: +NORMALIZE_WHITESPACE | ||
shape: (3,) | ||
Series: 'a' [f64] | ||
[ | ||
1.0 | ||
1.666667 | ||
2.428571 | ||
] | ||
|
||
pandas and Polars handle nulls differently. So, calculating ewm over | ||
a sequence with null values leads to distinct results: | ||
I think it's that Polars preserves null values, whereas pandas forward-fills. Can we preserve null values for pandas too?

The parameters for both polars and pandas are the same, I don't see how to do what you are asking, sorry.

I don't think you need to fill the nulls, but just preserve them - so, if a value was null to start with, it should be null in the result too. we do something like that in

Doing like you do in Pandas:
Polars:

sure, cause Polars treat 'nan' differently from null - but if we use the null value for both, does the result match? e.g.

for Polars 'nan' is only the result of illegal mathematical operations (like 0/0) so it's far rarer to encounter it there. regarding older versions ci - i'd suggest making a separate virtual environment and installing the versions which show up in the

Older versions of polars give similar results to pandas when there is a null ( So with an input of:
For the moment I'm "xfailing" that test with older versions. I'm not sure if that's correct.

thanks - is it possible to use either that, or raise

Thanks for the review. I added the "raise" for now. I'll do the follow up PR.

thanks!
||
|
||
>>> data = [2.0, 4.0, None, 3.0, float("nan"), 3.0] | ||
>>> s_pd2 = pd.Series(name="a", data=data) | ||
>>> s_pl2 = pl.Series(name="a", values=data) | ||
|
||
>>> func(s_pd2) | ||
0 2.000000 | ||
1 3.333333 | ||
2 3.333333 | ||
3 3.090909 | ||
4 3.090909 | ||
5 3.023256 | ||
Name: a, dtype: float64 | ||
|
||
>>> func(s_pl2) # doctest: +NORMALIZE_WHITESPACE | ||
shape: (6,) | ||
Series: 'a' [f64] | ||
[ | ||
2.0 | ||
3.333333 | ||
null | ||
3.090909 | ||
NaN | ||
NaN | ||
] | ||
""" | ||
return self._from_compliant_series( | ||
self._compliant_series.ewm_mean( | ||
com=com, | ||
span=span, | ||
half_life=half_life, | ||
alpha=alpha, | ||
adjust=adjust, | ||
min_periods=min_periods, | ||
ignore_nulls=ignore_nulls, | ||
) | ||
) | ||
|
||
def cast(self: Self, dtype: DType | type[DType]) -> Self: | ||
""" | ||
Cast between data types. | ||
|
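For reference while reviewing, the weighting rules in the docstring above can be sketched in plain Python. This is not the PR's implementation (which delegates to the backend's `ewm_mean`); `ewm_mean_reference` is a hypothetical name, `None` stands in for a null, and nulls are preserved in the output as suggested in the review thread. It takes `alpha` directly and does not handle `min_periods` or NaN.

```python
def ewm_mean_reference(values, alpha, *, adjust=True, ignore_nulls=False):
    """Hypothetical pure-Python reference; None marks a null and stays null."""
    out = []
    mean = None  # running weighted mean; None until the first observation
    old_wt = 1.0  # total weight currently carried by past observations
    new_wt = 1.0 if adjust else alpha  # weight given to each new observation
    for x in values:
        if x is None:
            # A null still ages earlier observations unless nulls are ignored
            # (absolute vs. relative positions in the docstring).
            if mean is not None and not ignore_nulls:
                old_wt *= 1.0 - alpha
            out.append(None)  # preserve the null in the result
            continue
        if mean is None:
            mean = float(x)  # y_0 = x_0
        else:
            old_wt *= 1.0 - alpha
            mean = (old_wt * mean + new_wt * x) / (old_wt + new_wt)
            old_wt = old_wt + new_wt if adjust else 1.0
        out.append(mean)
    return out


print([round(v, 6) for v in ewm_mean_reference([1, 2, 3], alpha=0.5)])
# [1.0, 1.666667, 2.428571], the docstring values for com=1 (alpha = 0.5)
```

With `[2.0, 4.0, None, 3.0]` and `alpha=0.5`, the sketch returns `None` at index 2 and about 3.090909 at index 3, which agrees with the Polars output shown in the docstring for those positions.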
there's a lot of parameters here, do we have a test which hits each of them?

I'll add tests for the parameters then...
(I can't hit all the parameters in only one test, at least not the first 4 I think). Is that what you meant?
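One possible way to hit each of the first four parameters is one test case per parameter, with values chosen so that they all map to the same smoothing factor. The sketch below only checks the conversion formulas from the docstring; `decay_to_alpha` is a hypothetical helper, not part of narwhals, and the "exactly one parameter" check is an assumption about the desired validation. In the real test suite the same four cases could be fed to `ewm_mean` and compared against a single expected result.

```python
import math


def decay_to_alpha(*, com=None, span=None, half_life=None, alpha=None):
    """Hypothetical helper: map exactly one decay parameter to alpha."""
    given = [v for v in (com, span, half_life, alpha) if v is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one of com, span, half_life, alpha")
    if com is not None:
        if com < 0:
            raise ValueError("com must be >= 0")
        return 1.0 / (1.0 + com)
    if span is not None:
        if span < 1:
            raise ValueError("span must be >= 1")
        return 2.0 / (span + 1.0)
    if half_life is not None:
        if half_life <= 0:
            raise ValueError("half_life must be > 0")
        return 1.0 - math.exp(-math.log(2.0) / half_life)
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return float(alpha)


# One case per parameter; each is expected to yield alpha = 0.5.
cases = [{"com": 1}, {"span": 3}, {"half_life": 1}, {"alpha": 0.5}]
for kwargs in cases:
    assert math.isclose(decay_to_alpha(**kwargs), 0.5), kwargs
```

Because the four cases are equivalent, a parametrized test over them would need only one expected output per backend.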