Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array #59756

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Sep 9, 2024

Further follow-up on #54654, #54687, #57212, because we still had the issue when copy=False was passed (which happens when casting to any of the string dtypes).

The added tests failed without the actual code change.

(smaller change extracted from #59685)

xref #54792

@jorisvandenbossche jorisvandenbossche added Bug Strings String extension data type and string data labels Sep 9, 2024
result = result.copy()
elif not copy and result is arr:
already_copied = False
if result is arr or np.may_share_memory(arr, result):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason for may_share_memory instead of shares_memory?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that is what was recommended by the numpy developer: #54654 (comment)

My understanding is that we have here either always a full copy or either exactly the same memory, so just checking the bounds should be sufficient, and there is less risk to this being a costly check (although I don't really know how this works under the hood, so not sure if there is any risk in practice)

base_copy.astype(any_string_dtype)
tm.assert_series_equal(base, base_copy)


@td.skip_if_no("pyarrow")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this decorator now redundant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed, thanks

@@ -120,14 +120,23 @@ def test_astype_string_copy_on_pickle_roundrip():
tm.assert_series_equal(base, base_copy)


def test_astype_string_copy_on_pickle_roundrip(any_string_dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i cant find it now, but i thought one of the various string-dtype-like fixtures included str, which would let us de-duplicate this test with the previous one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string_dtype fixture has that, but that fixture doesn't have all the other variants. After 3.0, the astype(str) will be equivalent to astype(pd.StringDtype(na_value=np.nan)) and then the above test will be superfluous, but at the moment astype(str) still does something specfic (do convert to string but end up in object dtype)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a comment

@jbrockmendel
Copy link
Member

small questions, otherwise LGTM

@mroeschke mroeschke added this to the 2.3 milestone Sep 9, 2024
@mroeschke mroeschke merged commit b717abb into pandas-dev:main Sep 9, 2024
47 checks passed
@mroeschke
Copy link
Member

Thanks @jorisvandenbossche

tigerhawkvok added a commit to tigerhawkvok/pandas that referenced this pull request Sep 9, 2024
…ring_array (pandas-dev#59756) (#2)

* BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array

* update

Co-authored-by: Joris Van den Bossche <[email protected]>
@jorisvandenbossche jorisvandenbossche deleted the string-dtype-ensure_string_array branch September 9, 2024 20:35
ammar-qazi pushed a commit to ammar-qazi/pandas that referenced this pull request Sep 9, 2024
…ring_array (pandas-dev#59756)

* BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array

* update
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Oct 10, 2024
…ring_array (pandas-dev#59756)

* BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array

* update
jorisvandenbossche added a commit that referenced this pull request Oct 10, 2024
…ring_array (#59756)

* BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array

* update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backported Bug Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants