(fix): extension array indexers #9671
…ore/variable.py to use any-precision datetime/timedelta with automatic inferring of resolution
…ocessing, raise now early
…t resolution, fix code and tests to allow this
for more information, see https://pre-commit.ci
… more carefully, for now using pd.Series to convert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
Great @kmuehlbauer - I want the maintainers to look at the MyPy. I could in theory fix it, but I would basically be guessing at what their wishes are for the classes' return types.
xarray/core/indexing.py
Outdated
    ) -> np.ndarray:
        if dtype is None:
            dtype = self.dtype
        if pd.api.types.is_extension_array_dtype(dtype):
Is this needed? Why would someone call `np.array` with an extension dtype, and then expect it to get translated to a numpy dtype?
This is for internal usage, otherwise I wouldn't have added it. I can delete the line and then see what happens, and then comment.
@dcherian This class is basically an internal adapter, so anything that asks for its data in numpy form will call this. Things like `repr`, subtraction, and calling `.values` on an `xarray` object are a few examples.
I understand that. Why doesn't the last line (`return super().__array__(dtype, copy=copy)`) "just" handle this?
Ah, good point, yes, it's unnecessary. Fixed.
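To illustrate the point of this thread, here is a minimal sketch using plain pandas and NumPy (not xarray's adapter classes, which are not reproduced here). It assumes only that a pandas extension array exposes `__array__`, so `np.asarray` already produces a NumPy array without any explicit dtype translation beforehand. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A Categorical is a pandas extension array; its dtype is not a NumPy dtype.
cat = pd.Categorical(["a", "b", "a"])
is_ext = pd.api.types.is_extension_array_dtype(cat.dtype)

# np.asarray goes through the array's own __array__ implementation,
# so the conversion to a NumPy array needs no special-casing by the caller.
as_numpy = np.asarray(cat)

print(is_ext, type(as_numpy).__name__, as_numpy.tolist())
```

This is only an analogy for why the extra `is_extension_array_dtype` branch could be dropped; the actual behavior of the adapter's `__array__` is defined in `xarray/core/indexing.py`.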
    ) -> np.ndarray:
        if dtype is None:
            dtype = self.dtype
        if pd.api.types.is_extension_array_dtype(dtype):
same here. Why is this needed?
    @@ -6875,7 +6875,7 @@ def groupby(
     [[nan, nan, nan],
     [ 3., 4., 5.]]])
     Coordinates:
    - * x_bins (x_bins) object 16B (5, 15] (15, 25]
    + * x_bins (x_bins) interval[int64, right] 16B (5, 15] (15, 25]
this is amazing, it enables IntervalIndex indexing now.
cc @benbovy
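For readers unfamiliar with the interval dtype shown in the diff above, a small pandas-only sketch follows. It assumes the bin edges 5, 15, 25 from the docstring output and shows what lookup by value looks like on an `IntervalIndex`; how xarray exposes this through `.sel` is not shown and should not be inferred from this example.

```python
import pandas as pd

# Build the same two right-closed bins that appear in the docstring: (5, 15] and (15, 25].
index = pd.IntervalIndex.from_breaks([5, 15, 25])

# With an interval dtype (rather than object), a scalar can be located
# by the interval that contains it.
loc_low = index.get_loc(12)
loc_high = index.get_loc(20)

print(index.dtype, loc_low, loc_high)
```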
@Illviljan or @headtr1ck can you take a look at the typing failure please
xarray/namedarray/core.py
Outdated
    if pd.api.types.is_extension_array_dtype(data_old.dtype):
        # One of PandasExtensionArray or PandasIndexingAdapter?
        ndata = data_old.array.to_numpy()
`pd.api.types.is_extension_array_dtype(data_old.dtype)` does not imply `data_old` is an extension array.
You probably should use some kind of `isinstance` check to be able to use `.array`.
I haven't used extension arrays myself that much; why can't a simple `np.asarray(data_old)` be used?
Seems I was able to resolve both comments with no impact. I think `pd.api.types.is_extension_array_dtype` is correct, as the docs (https://pandas.pydata.org/docs/reference/api/pandas.api.types.is_extension_array_dtype.html) say:
This checks whether an object implements the pandas extension array interface. In pandas, this includes:
As for the `to_numpy`, I think I just forgot that the incoming object has an `__array__` implementation.
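The distinction raised in this thread can be checked with a short pandas-only sketch. It uses a categorical `Series` as an illustrative stand-in for `data_old` (an assumption; the real object is an xarray wrapper) to show that `is_extension_array_dtype` answers a question about the dtype, while `isinstance` answers a question about the container.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray

series = pd.Series(["a", "b"], dtype="category")

# True for the dtype object and for anything that carries such a dtype...
dtype_check = pd.api.types.is_extension_array_dtype(series.dtype)
series_check = pd.api.types.is_extension_array_dtype(series)

# ...but the Series itself is not an ExtensionArray; only its .array is.
series_is_ea = isinstance(series, ExtensionArray)
array_is_ea = isinstance(series.array, ExtensionArray)

print(dtype_check, series_check, series_is_ea, array_is_ea)
```

So a dtype check alone does not guarantee that an `.array` attribute or `ExtensionArray` methods are available on the object being tested.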
    if pd.api.types.is_extension_array_dtype(self._data) and isinstance(
        self._data, PandasIndexingAdapter
    ):
        return self._data.array
    return self._data.get_duck_array()
So `get_duck_array` should handle this by possibly converting to a numpy array. Is that not desired? It is ambiguous, to me at least, whether we consider ExtensionArrays a "duck array".
Right, at the moment `get_duck_array` on the `PandasIndexingAdapter` returns a numpy array, and the docstring states:
The Variable's data as an array. The underlying array type
(e.g. dask, sparse, pint) is preserved.
which means we probably want to preserve the pandas array type. But maybe the implementation of `get_duck_array` should be updated.
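One way to picture the alternative floated here is the sketch below. `ToyAdapter` is a hypothetical stand-in written for this illustration; it is not xarray's `PandasIndexingAdapter`, and the branch shown is an assumption about what "updating `get_duck_array`" could mean, not the change made in this PR.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


class ToyAdapter:
    """Hypothetical adapter used only to illustrate the discussion."""

    def __init__(self, array):
        self.array = array

    def get_duck_array(self):
        # Assumed policy: keep pandas extension arrays as they are,
        # and fall back to a NumPy array for everything else.
        if isinstance(self.array, ExtensionArray):
            return self.array
        return np.asarray(self.array)


kept = ToyAdapter(pd.Categorical(["a", "b"])).get_duck_array()
converted = ToyAdapter([1, 2, 3]).get_duck_array()
print(type(kept).__name__, type(converted).__name__)
```

With a policy like this, the caller would not need its own `is_extension_array_dtype` branch before calling `get_duck_array`, at the cost of treating extension arrays as duck arrays, which is the open question in this thread.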
For now I've just added a few ignores. I think this issue is a bit separate since it doesn't actually affect the runtime behavior
A surprise to me is that extension arrays don't implement
Right, nor should it for the number of extension array types there are (what's the mean of a categorical? of an interval?)
We have a wrapper around extension arrays that does implement
I think part of the reason this is leaking is that xarray/xarray/core/variable.py Lines 336 to 366 in d57f05c
I do think this is basically a non-issue since xarray/xarray/namedarray/_typing.py Lines 188 to 192 in d57f05c
IMO we should punt on the typing issue because it existed before this PR as far as I can tell
I'm ok with leaving the ignore, but please open a new issue about it so we can keep track.
Identical to kmuehlbauer#1 - probably not very helpful in terms of changes since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 contains most of it....
whats-new.rst
api.rst