Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add polars support #305

Open
wants to merge 32 commits into
base: main
Choose a base branch
from
Open

Conversation

elephaint
Copy link
Contributor

@elephaint elephaint commented Oct 31, 2024

Adds support for Polars via Narwhals.

The extension to all frameworks supported under Narwhals should hereafter be (nearly) trivial.

PR isn't breaking (i.e. fully supports existing NIXTLA_ID_AS_COL behaviour).

Open items

  • Convert core.ipynb
  • Convert evaluation.ipynb
  • Convert methods.ipynb
  • Convert probabilistic_methods.ipynb
  • Convert utils.ipynb
  • Add tutorial on using with Polars
  • Re-run existing examples in the docs to check correct workings & adapt for deprecation of unique_id as index
  • PR shouldn't be breaking

Fixes / better solutions required

  • Fix performance regression in aggregate function
  • Fix .agg function not allowing multiple aggregations of the same column (needed for exogenous)
  • Fix how to deal with unique_id being index everywhere but not for Polars frames
  • Fix plotting in utils - it now just converts Polars to Pandas when plotting, that isn't nice
  • Fix evaluation not aggregating correctly

Tests

  • Add equivalence tests to unit tests for Polars
  • Inspect performance regression from adding Polars support

Performance evaluation

Reconcilers seem to run somewhat faster after this PR. Reconciler0 = MinTShrink, Reconciler1 = BottomUp

This PR:

Screenshot 2024-11-12 232320

Current main:

Screenshot 2024-11-12 232328

Test script for PR in tests\test_benchmark.py, run with pytest tests\test_benchmark.py::test_reconciler -v -s --benchmark-min-rounds=20 --disable-warnings

@elephaint elephaint linked an issue Oct 31, 2024 that may be closed by this pull request
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@MarcoGorelli
Copy link

Hey @elephaint , thanks for trying out Narwhals! We can prioritise pivot as that would be helpful to you

We'd previously run into issues with Polars pre1.0 inconsistencies, but from this PR it looks you only need the 1.0+ behaviour, right?

Is there anything else we could do in Narwhals to help you out here?

@elephaint
Copy link
Contributor Author

elephaint commented Nov 12, 2024

Hey @elephaint , thanks for trying out Narwhals! We can prioritise pivot as that would be helpful to you

We'd previously run into issues with Polars pre1.0 inconsistencies, but from this PR it looks you only need the 1.0+ behaviour, right?

Is there anything else we could do in Narwhals to help you out here?

Thanks! For this PR / repo I think indeed at the moment the only thing I miss is pivot - which I kind of get as it's an expensive op (as most fast frameworks are column based).

Another minor thing that would be great is ability to ascertain datatypes at the op level without casting afterwards. For example, doing .to_dummies() with Polars will result in an Uint8 but with Pandas in a Float64. I know it can be done by subsequently casting over all the generated columns, but not sure this is the most efficient way when 1000+ columns are generated at once through the .to_dummies() command. But then again, this is minor, in this PR I settled on not using dummies.

A thing that I found a bit unintuitive is that Narwhals seems to keep track of the order of a Pandas Integer Index even after narwhallifying it - I didn't expect that. Again minor thing, because I just need to remember to do .maybe_reset_index() after every op that potentially changes the order.

But awesome library, keep up the good work! For this PR I wanted to play with it for Polars support, maybe later I'll try other frameworks - should almost work out of the box now (save for the pivot op).

Edit: maybe one other thing - I see these warnings:

image

but it doesn't show which codeline it affects, so difficult to debug? I know where I'm adding columns and stuff, but that's in a lot of places.

@MarcoGorelli
Copy link

Thanks so much for your feedback! 🙏

  • pivot: we're revisiting the PR and will try to make a release shortly
  • to_dummies: we've addressed that as part of this PR so that the output dtype is consistently Int8, we'd overlooked the dtype difference the first time round. Unsigned integers can be a bit unexpected when working with pandas, so Int8 seemed like a good compromise. Would this work for you?
  • fragmentation warning: technically the warning is coming from pandas, but it's really useful that you've reported this to us - I'll take a look to see if there's anything we're doing that's causing it (and whether it's an issue)

@elephaint
Copy link
Contributor Author

Thanks so much for your feedback! 🙏

  • pivot: we're revisiting the PR and will try to make a release shortly

Wow, awesome. I'll have a go at it once released.

  • to_dummies: we've addressed that as part of this PR so that the output dtype is consistently Int8, we'd overlooked the dtype difference the first time round. Unsigned integers can be a bit unexpected when working with pandas, so Int8 seemed like a good compromise. Would this work for you?

Yes, perfect! I think in general what I'd expect (in an ideal world 🙈 - I know this is insanely hard to get perfect) from Narwhals is output equivalence up to DataFrame type (i.e. everything is the same, except the 'container' of the data - e.g. dtypes, row order, column order, column names, precision, nulls, nans, infs, categories are all similar).

  • fragmentation warning: technically the warning is coming from pandas, but it's really useful that you've reported this to us - I'll take a look to see if there's anything we're doing that's causing it (and whether it's an issue)

👍

@elephaint elephaint marked this pull request as ready for review November 13, 2024 15:09
@elephaint elephaint mentioned this pull request Nov 13, 2024
@MarcoGorelli
Copy link

thanks! it's released, and the fragmentation warning is hopefully addressed too

action_files/test_models/src/data.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
@elephaint
Copy link
Contributor Author

thanks! it's released, and the fragmentation warning is hopefully addressed too

Awesome, thanks, I'll have a look at it

@elephaint
Copy link
Contributor Author

thanks! it's released, and the fragmentation warning is hopefully addressed too

Great work, I don't see the fragmentation messages anymore, and pivot works as expected. Thanks! 👍

setup.py Outdated Show resolved Hide resolved
tests/test_benchmark.py Outdated Show resolved Hide resolved
tests/test_benchmark.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
hierarchicalforecast/core.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core] Add support for polars
3 participants