-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST (string dtype): clean-up assorted xfails #60345
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,6 @@ | ||
import numpy as np | ||
import pytest | ||
|
||
from pandas._config import using_string_dtype | ||
|
||
import pandas as pd | ||
from pandas import ( | ||
CategoricalIndex, | ||
|
@@ -754,13 +752,12 @@ def test_intersection_keep_ea_dtypes(val, any_numeric_ea_dtype): | |
tm.assert_index_equal(result, expected) | ||
|
||
|
||
@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)") | ||
def test_union_with_na_when_constructing_dataframe(): | ||
# GH43222 | ||
series1 = Series( | ||
(1,), | ||
index=MultiIndex.from_arrays( | ||
[Series([None], dtype="string"), Series([None], dtype="string")] | ||
[Series([None], dtype="str"), Series([None], dtype="str")] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this does fix the test. Is the behavior if this change to the test is not made correct? i.e. the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps related to #60338? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
With the above fix (and when infer_string is enabled), the test uses So it's only testing that the NaNs are properly matched when creating the rows from Series objects with a MultiIndex, it does not test having different dtypes in series1 vs series2. That's also something we could test, though (and then if you have object dtype index and str dtype index, one would expect the result to be object dtype index (since that is the "common" dtype), but was not how the test was currently set up.
I suppose not because here we don't actually have empty objects. Both |
||
), | ||
) | ||
series2 = Series((10, 20), index=MultiIndex.from_tuples(((None, None), ("a", "b")))) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,6 @@ | ||
import numpy as np | ||
import pytest | ||
|
||
from pandas._config import using_string_dtype | ||
|
||
from pandas.core.dtypes.concat import union_categoricals | ||
|
||
import pandas as pd | ||
|
@@ -124,12 +122,15 @@ def test_union_categoricals_nan(self): | |
exp = Categorical([np.nan, np.nan, np.nan, np.nan]) | ||
tm.assert_categorical_equal(res, exp) | ||
|
||
@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)", strict=False) | ||
@pytest.mark.parametrize("val", [[], ["1"]]) | ||
def test_union_categoricals_empty(self, val, request, using_infer_string): | ||
# GH 13759 | ||
if using_infer_string and val == ["1"]: | ||
request.applymarker(pytest.mark.xfail("object and strings dont match")) | ||
request.applymarker( | ||
pytest.mark.xfail( | ||
reason="TDOD(infer_string) object and strings dont match" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's a typo here that may mean this gets missed when grepping the TODOs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, good catch! Yes, I do grep for that so good to fix that typo |
||
) | ||
) | ||
res = union_categoricals([Categorical([]), Categorical(val)]) | ||
exp = Categorical(val) | ||
tm.assert_categorical_equal(res, exp) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why if a user specifically requests not to make a copy are we converting the numpy array to a pyarrow arrow baked string array for what is an immutable index? There are other performance benefits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the other way around, the pyarrow array (stored under the hood in
obj
) is being converted to a numpy array, and that can just never be done without a copy (different memory layout)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the first line of the test is
obj = pd.Index(arr, copy=False)
so if we have a numpy
arr
and specifycopy=False
for the immutable index we get a pyarrow backed index and a copy is made? and then the.to_numpy()
method makes another copy?so the question is perhaps should
idx = pd.Index(arr, copy=False)
return anIndex
with a string dtype, perhaps raise like numpy now do for__array__
when a copy can't be made or is this a moot point when CoW is extended to the Indexes as the copy keyword would be irrelevant?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I assumed you were commenting on what is being tested here, i.e. the
obj.to_numpy(copy=False)
copying or not.For
pd.Index(arr, copy=False)
: in general ourcopy
keywords in constructors are not strict, but only mean to avoid a copy at "best effort" (e.g. also if you pass a python list, it will make a copy regardless of that keyword). If we would want to change that more generally, that's a bigger topic to discuss.It's certainly not a moot point, because with CoW we actually copy more often on input with non-pandas objects. Although it seems we didn't make that change for
Index(..)
, where the default is stillcopy=False