Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NumPy 2.0 compatibility #1632

Merged

Conversation

agriyakhetarpal
Copy link
Contributor

@agriyakhetarpal agriyakhetarpal commented Nov 15, 2024

Description

This PR is an attempt to compile with NumPy v2 at build time plus relax the Cython<3 requirement, so that river wheels will be compatible with both NumPy v2 and NumPy v1.X. Please refer to #1630 for more context around putting up this pull request.

Additional context

xref: pyodide/pyodide#4925

@agriyakhetarpal
Copy link
Contributor Author

The CI logs show that it still installs NumPy 1.26.4. I am not acquainted with using Poetry, especially not for compiled projects. How should I lift/make Poetry lift this bound (and for Cython)? pip install -e . locally does use NumPy v2 and Cython 3 to compile.

@gbolmier
Copy link
Member

Thank you so much for all your efforts @agriyakhetarpal!

The CI logs show that it still installs NumPy 1.26.4. I am not acquainted with using Poetry, especially not for compiled projects. How should I lift/make Poetry lift this bound (and for Cython)? pip install -e . locally does use NumPy v2 and Cython 3 to compile.

Try incrementing this line

@agriyakhetarpal
Copy link
Contributor Author

I don't think that helped, it's picking up 1.26 from somewhere... what are we missing?

@e10e3
Copy link
Contributor

e10e3 commented Nov 15, 2024

If you look at the updated lockfile, NumPy's version here is still at 1.26. This is what Poetry installs. For Poetry to select a higher version, you need to update Scikit-learn, because the version currently selected forbids NumPy 2. 😃

I believe this CI does not build wheels (if this is what you are trying to do), it only installs River and runs the tests.
You can build a wheel locally with pip wheel .. (I use Pip here because Poetry says it only supports pure Python wheels.)

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Nov 15, 2024

If you look at the updated lockfile, NumPy's version here is still at 1.26. This is what Poetry installs. For Poetry to select a higher version, you need to update Scikit-learn, because the version currently selected forbids NumPy 2. 😃

Ah, thank you! That must be it – I'll bump the rest of the dependencies with this, update the lockfile, and check when the requirement gets relaxed fully.

I believe this CI does not build wheels (if this is what you are trying to do), it only installs River and runs the tests.
You can build a wheel locally with pip wheel .. (I use Pip here because Poetry says it only supports pure Python wheels.)

Building wheels or testing an editable/non-editable installation should be the same thing, since in any case the build backend is going to compile a wheel in a temporary location, install from it, and then discard it. I'll check if I need to do anything extra when I try building wheels. The local wheel build with both frontends: pip wheel and pypa/build succeeded (which used NumPy 2 because I added that constraint for the build-time requirement – or so I think unless poetry's backend did something weird :P)

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Nov 15, 2024

Okay, we're at numpy 2.0.2 now, and here's where I'll start to expect seeing tests that fail and the builds passing. I'll check this later in the day.

@e10e3
Copy link
Contributor

e10e3 commented Nov 15, 2024

Building wheels or testing an editable/non-editable installation should be the same thing, since in any case the build backend is going to compile a wheel in a temporary location, install from it, and then discard it.

Ooh, I wasn't aware of how it works. Thank you 😄

@MaxHalford
Copy link
Member

Thanks everyone for working on this.

For the failing tests: the philosophy behind River is to hide the fact that numpy is sometimes used under the hood. River objects are supposed to take as input and output Python objects. So we have to make some changes so that Python floats are returned, and not NumPy floats. I suppose this isn't very difficult to do, it's just a matter to of pin-pointing where the NumPy originates from.

@MaxHalford
Copy link
Member

This PR fixes most of the unit tests.

@smastelini the remaining broken unit tests are related to the drift module. Would you have some time to take a look?

@smastelini
Copy link
Member

Sure! I will take a look at it in the next couple days. Today I'm going to a conference and will spent the day on the road.

@MaxHalford
Copy link
Member

Nice! Enjoy your conference :)

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Nov 19, 2024

It looks like things are working without an issue, but from the logs, I still see Installing numpy (2.0.2) – I'd like to try if it's possible to bump it further to NumPy >= 2.1. It's coming from thinc, and in-turn, from spacy.

Another thing I noticed is that polars's versions 1.13.0 and 1.13.1 were yanked over the weekend, so I'll re-generate the lockfile to use 1.14.

@agriyakhetarpal agriyakhetarpal force-pushed the feat/numpy-2.0-compatibility branch from f880212 to f4f7a7d Compare November 19, 2024 03:20
Copy link
Contributor Author

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumping to NumPy 2.1 is not possible right now because it constrains SciPy (which, as of 1.14.1, requires Python 3.10 and later, and river still supports Python 3.9) and because spaCy does not yet have a stable v4 release for its pre-releases on PyPI that relaxes NumPy to be <3. However, building against NumPy 2.0 is reasonable enough and should be fine here. Thanks for all the help! I'll mark this as ready for review now, with some additional comments to help facilitate the review.

.github/actions/install-env/action.yml Outdated Show resolved Hide resolved
.github/workflows/code-quality.yml Show resolved Hide resolved
build.py Show resolved Hide resolved
build.py Show resolved Hide resolved
build.py Show resolved Hide resolved
build.py Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
river/optim/newton.py Show resolved Hide resolved
agriyakhetarpal

This comment was marked as duplicate.

@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review November 19, 2024 04:04
@MaxHalford
Copy link
Member

Thank you so much @agriyakhetarpal, and kudos on the self-review. I guess we can merge!

@MaxHalford MaxHalford merged commit e069b67 into online-ml:main Nov 19, 2024
4 checks passed
@agriyakhetarpal agriyakhetarpal deleted the feat/numpy-2.0-compatibility branch November 19, 2024 06:22
@agriyakhetarpal
Copy link
Contributor Author

Thanks for the merge! Are there any additional items I can help with to address before a release with this PR is made?

@MaxHalford
Copy link
Member

I guess @gbolmier (or you) can take a look at whether adding support for Python 3.13 is a low hanging fruit or not. If it is, we may as well include it in the next release too.

@agriyakhetarpal
Copy link
Contributor Author

It doesn't look like it is easy to do, but I'll explore the package in more detail sometime and try to contribute here in other ways if I can, if I have time 😄

@MaxHalford
Copy link
Member

Actually I think it's acceptable to only support NumPy v2 if that makes things easier!

@agriyakhetarpal
Copy link
Contributor Author

Oh, do you mean supporting NumPy >=2 for both build time and runtime? That could work for River as an application, but River is being used as a library downstream in quite a few projects (https://www.wheelodex.org/projects/river/rdepends/) where I would think it's too early for an upper bound that's so high, isn't it?

@MaxHalford
Copy link
Member

Well I would say these libraries can always pin River to a previous version if they want it to keep working. But I guess you're right.

@gbolmier
Copy link
Member

Shout-out to @agriyakhetarpal for the great work he put in 🙌

I guess @gbolmier (or you) can take a look at whether adding support for Python 3.13 is a low hanging fruit or not. If it is, we may as well include it in the next release too.

I should have time to give it a try in the next days. I'll ask help from @AdilZouitine if there is non-trivial rust debugging to do.

Bumping to NumPy 2.1 is not possible right now because it constrains SciPy (which, as of 1.14.1, requires Python 3.10 and later, and river still supports Python 3.9)

I think it would make sense to not support 3.9 anymore like NumPy and SciPy do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants