DEPR: deprecate / warn about raising an error in __array__ when copy=False cannot be honored #60340

Open
jorisvandenbossche opened this issue Nov 16, 2024 · 10 comments · May be fixed by #60395

@jorisvandenbossche
Member

NumPy 2.0 changed the behavior of the copy keyword in __array__, in particular making copy=False strict (raising an error when a zero-copy numpy array is not possible).
We only now adjusted pandas to update the copy handling, in #60046 (issue #57739).

But that also introduced a breaking change for anyone doing np.array(ser, copy=False) (and who hasn't updated that when updating to numpy 2.0), which historically has always worked fine and could silently give a copy anyway.

The idea would be to still include a FutureWarning about this first before raising the error (as now in main) in pandas 3.0.
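
To make the proposed deprecation step concrete, here is a minimal sketch of an `__array__` method that warns instead of raising when copy=False cannot be honored. The `CategoricalLike` class and the exact warning text are hypothetical and only illustrate the idea; this is not the pandas implementation.

```python
import warnings

import numpy as np


class CategoricalLike:
    """Hypothetical stand-in for data that cannot be exposed zero-copy."""

    def __init__(self, codes, categories):
        self._codes = np.asarray(codes)
        self._categories = np.asarray(categories)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # Deprecation step: warn for now, raise ValueError in a later release.
            warnings.warn(
                "This conversion to numpy requires a copy, but 'copy=False' "
                "was passed. This will raise an error in the future. "
                "Use np.asarray(..) instead.",
                FutureWarning,
                stacklevel=2,
            )
        # Materializing the values always allocates a new array.
        values = self._categories[self._codes]
        return values if dtype is None else values.astype(dtype)


cat = CategoricalLike(codes=[0, 1, 0], categories=["a", "b"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Called directly so the sketch does not depend on the installed NumPy version.
    result = cat.__array__(copy=False)

print(result, [w.category.__name__ for w in caught])
```

The copy is still returned, so existing code keeps working while users are told to switch before the error becomes strict.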

See #60046 (comment) for more context

@jorisvandenbossche jorisvandenbossche added the Compat pandas objects compatability with Numpy or Python functions label Nov 16, 2024
@jorisvandenbossche jorisvandenbossche added this to the 2.3 milestone Nov 16, 2024
@KevsterAmp
Contributor

take

@KevsterAmp
Contributor

KevsterAmp commented Nov 21, 2024

@jorisvandenbossche - I'm having a hard time getting np.array(ser, copy=False) to raise an error with the latest pandas release (2.2) or the 2.3.x branch, using NumPy 2.0 or later. I'd like to reproduce it so I can use it for debugging. Thanks

@jorisvandenbossche
Member Author

Try this example with latest main:

In [1]: ser = pd.Series(["a", "b"], dtype="category")

In [2]: np.array(ser, copy=False)
...
ValueError: Unable to avoid copy while creating an array as requested.

You need to use a dtype that cannot be converted zero-copy to numpy, such as the category dtype I used above (if you use integers, for example, it will not error).
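
One rough way to see which dtypes convert zero-copy is to convert the same Series twice and check whether the two results share memory. This is only a sketch, and it assumes an integer Series exposes its underlying buffer while a categorical Series materializes a fresh array on each conversion.

```python
import numpy as np
import pandas as pd

int_ser = pd.Series([1, 2, 3])
cat_ser = pd.Series(["a", "b"], dtype="category")

# Two conversions of a zero-copy dtype should point at the same buffer.
int_zero_copy = np.shares_memory(np.asarray(int_ser), np.asarray(int_ser))

# A categorical has to build a new array of values for every conversion.
cat_zero_copy = np.shares_memory(np.asarray(cat_ser), np.asarray(cat_ser))

print(f"int64 zero-copy: {int_zero_copy}, category zero-copy: {cat_zero_copy}")
```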

And also you need latest main (or 2.3.x), this is not yet included in a released version.

@KevsterAmp
Contributor

Can't seem to replicate on my end, on both main and 2.3.x. Running:

import numpy as np
import pandas as pd

print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

ser = pd.Series(["a", "b"], dtype="category")
x = np.array(ser, copy=False)
print(x)

Output:

(pandas-dev) kev@mac pandas % python test.py
+ /Users/kev/.pyenv/versions/3.10.14/bin/ninja
[1/1] Generating write_version_file with a custom command
NumPy version: 1.26.4
Pandas version: 3.0.0.dev0+1580.g68d9dcab5b
['a' 'b']

I'm running on macOS 15.1.1

@jorisvandenbossche
Member Author

Ah, you need numpy >= 2.0
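
A reproduction script can check this up front. The helper below is a hypothetical sketch (the name `copy_false_is_strict` is not part of numpy or pandas); it only inspects the major version number.

```python
import numpy as np


def copy_false_is_strict(numpy_version: str) -> bool:
    """Return True when the given NumPy version treats copy=False as strict."""
    major = int(numpy_version.split(".")[0])
    return major >= 2


if not copy_false_is_strict(np.__version__):
    print(f"NumPy {np.__version__}: copy=False may still silently copy here")
else:
    print(f"NumPy {np.__version__}: copy=False is strict")
```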

@KevsterAmp
Contributor

Thanks, I'm now able to replicate it on my end. Working on the PR 🔧

@jorisvandenbossche
Member Author

Great!

@KevsterAmp
Contributor

@jorisvandenbossche - Does this warning message look good to you?

Numpy>=2.0 changed copy keyword's behavior, making copy=False raise an error when a zero-copy numpy array is not possible

@jorisvandenbossche
Member Author

I would still add something like "pandas will follow that behaviour starting with pandas 3.0" and "this conversion to numpy requires a copy, but 'copy=False' was passed. This will start raising an error in the future. Use np.asarray(..) instead."
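
For anyone migrating, a small sketch of what the suggested np.asarray(..) replacement does: it copies only when it has to, so it works both for inputs that need a copy and for inputs that do not. The variable names are illustrative only.

```python
import numpy as np

# A list always needs a new array; np.asarray simply creates one.
from_list = np.asarray([1, 2, 3])

# An existing ndarray is passed through without copying.
existing = np.arange(3)
passed_through = np.asarray(existing)

print(from_list, np.shares_memory(passed_through, existing))
```

This is the "copy if needed" behavior that np.array(obj, copy=False) used to provide before NumPy 2.0 made copy=False strict.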
