gh-117657: TSAN Fix races in PyMember_Get and PyMember_Set, for C extensions #123211

Merged: 32 commits, Dec 3, 2024

dpdani
Contributor

@dpdani dpdani commented Aug 21, 2024

Fix data races that would only be visible when using C extensions.

This is a follow-up on #119368.
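
To make the kind of race in question concrete, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `demo_object` struct and uses C11 `<stdatomic.h>` relaxed operations in place of CPython's internal atomic wrappers, so every name below is illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical extension object; not a CPython type. */
typedef struct {
    _Atomic long long_member;
} demo_object;

/* Relaxed load: a concurrent reader never observes a torn value,
   but no ordering with other memory accesses is implied. */
static long demo_get_long(demo_object *obj)
{
    assert(obj != NULL);
    return atomic_load_explicit(&obj->long_member, memory_order_relaxed);
}

/* Relaxed store: the counterpart used by the setter. */
static void demo_set_long(demo_object *obj, long value)
{
    assert(obj != NULL);
    atomic_store_explicit(&obj->long_member, value, memory_order_relaxed);
}
```

With plain (non-atomic) reads and writes in the same positions, two threads touching the same member at once would be a data race of the kind TSAN reports.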

I'm intentionally not testing:

  • _Py_T_NONE (deprecated)
  • _Py_T_OBJECT (deprecated)
  • Py_T_STRING (immutable)
  • Py_T_STRING_INPLACE (immutable)

Py_T_CHAR

For some reason, Py_T_CHAR is also untested in test_capi.test_structmembers.
In fact, it's not even among the supporting C types, which I'm also using for the TSAN tests: old API, new API.
I'm wondering whether there's a specific reason for this, or whether it should be added to the test suite.

I'm guessing that the TSAN suite should cover the thread-safety of Py_T_CHAR regardless?
Or should we dismiss it for TSAN tests as well?

I've added a new T_CHAR member to the _testcapi module, and the rest of the tests don't seem to break.
I can revert this change if needed.

Contributor

@picnixz picnixz left a comment

Quite hard to review since everything looks the same and the functions are not ordered by pairs apparently. I didn't go through everything but I had a question and an observation.

Include/cpython/pyatomic.h Show resolved Hide resolved
Include/internal/pycore_pyatomic_ft_wrappers.h Outdated Show resolved Hide resolved
@dpdani
Contributor Author

dpdani commented Aug 22, 2024

I'm not exactly sure why test_importlib fails, but I don't want to push a dummy commit just to retry it.

Anyways, the TSAN checks for test_free_threading.test_slots are 🟢

@picnixz
Contributor

picnixz commented Aug 22, 2024

> I'm not exactly sure why test_importlib fails, but I don't want to push a dummy commit just to retry it.

I've relaunched the test manually (I can't relaunch the SSL test though, I don't know why)

Contributor

@colesbury colesbury left a comment

Thanks. I left a few comments below.

How long do the added tests take to run?

Python/structmember.c Outdated Show resolved Hide resolved
Python/structmember.c Outdated Show resolved Hide resolved
Python/structmember.c Outdated Show resolved Hide resolved
@dpdani
Contributor Author

dpdani commented Sep 16, 2024

@colesbury can you take another look?

Contributor

@colesbury colesbury left a comment

This mostly looks good. Please add a char test to test_structmembers.py. I think the implementation in the PR is broken (see inline comment) and it's not caught by any unit test.

It's fine to ping me on PRs like you did here, but please also use GitHub's "re-request review" button (top-right corner, in the reviewers section). It means that the PR shows up in my list of "awaiting reviews".

Python/structmember.c Outdated Show resolved Hide resolved
Python/structmember.c Outdated Show resolved Hide resolved
Python/structmember.c Outdated Show resolved Hide resolved
Python/structmember.c Outdated Show resolved Hide resolved
Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
@dpdani
Contributor Author

dpdani commented Sep 16, 2024

Sorry, will do.

@dpdani
Contributor Author

dpdani commented Nov 2, 2024

I finally had some time to come back to this.

I guess that moving the stores after error checking does slightly change the externally-visible behavior.
Should we write a news entry?
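
The behavior change mentioned here can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the code in Python/structmember.c: both setter names are hypothetical, and a plain return code stands in for whatever error or warning the real setter reports.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Store first, check afterwards: the slot is overwritten even when the
   value is rejected. Narrowing an out-of-range value to signed char is
   implementation-defined. */
static int set_byte_store_then_check(signed char *slot, long value)
{
    assert(slot != NULL);
    *slot = (signed char)value;
    if (value < SCHAR_MIN || value > SCHAR_MAX) {
        return -1;
    }
    return 0;
}

/* Check first, store afterwards: a rejected value leaves the slot as is. */
static int set_byte_check_then_store(signed char *slot, long value)
{
    assert(slot != NULL);
    if (value < SCHAR_MIN || value > SCHAR_MAX) {
        return -1;
    }
    *slot = (signed char)value;
    return 0;
}
```

With the second ordering, a failed assignment no longer modifies the member, which is the externally visible difference in question.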

@dpdani dpdani requested a review from colesbury November 2, 2024 17:54
Contributor

@colesbury colesbury left a comment

LGTM

@colesbury
Contributor

!buildbot nogil

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @colesbury for commit 6c5cec5 🤖

The command will test the builders whose names match following regular expression: nogil

The builders matched are:

  • AMD64 Windows Server 2022 NoGIL PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • AMD64 CentOS9 NoGIL Refleaks PR
  • x86-64 MacOS Intel ASAN NoGIL PR
  • AMD64 Windows PGO NoGIL PR
  • aarch64 Fedora Rawhide NoGIL PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • ARM64 MacOS M1 NoGIL PR
  • AMD64 CentOS9 NoGIL PR
  • ARM64 MacOS M1 Refleaks NoGIL PR
  • AMD64 Fedora Rawhide NoGIL PR
  • PPC64LE Fedora Rawhide NoGIL PR
  • x86-64 MacOS Intel NoGIL PR

@colesbury colesbury merged commit 979bf24 into python:main Dec 3, 2024
57 of 61 checks passed
@colesbury
Contributor

Thanks @dpdani!

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL9 3.x has failed when building commit 979bf24.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1565/builds/859) and take a look at the build logs.
  4. Check if the failure is related to this commit (979bf24) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1565/builds/859

Failed tests:

  • test_capi

Failed subtests:

  • test_char - test.test_capi.test_structmembers.ReadWriteTests_NewAPI.test_char
  • test_char - test.test_capi.test_structmembers.ReadWriteTests_OldAPI.test_char

Summary of the results of the build (if available):

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel9-s390x/build/Lib/test/test_capi/test_structmembers.py", line 170, in test_char
    self.assertEqual(ts.T_CHAR, "c")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
AssertionError: '\x00' != 'c'
- \x00
+ c

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL9 LTO + PGO 3.x has failed when building commit 979bf24.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1578/builds/862) and take a look at the build logs.
  4. Check if the failure is related to this commit (979bf24) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1578/builds/862

Failed tests:

  • test_capi

Failed subtests:

  • test_char - test.test_capi.test_structmembers.ReadWriteTests_NewAPI.test_char
  • test_char - test.test_capi.test_structmembers.ReadWriteTests_OldAPI.test_char

Summary of the results of the build (if available):

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel9-s390x.lto-pgo/build/Lib/test/test_capi/test_structmembers.py", line 170, in test_char
    self.assertEqual(ts.T_CHAR, "c")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
AssertionError: '\x00' != 'c'
- \x00
+ c

@dpdani
Contributor Author

dpdani commented Dec 3, 2024

that sounds like my fault, I'll take a look later today

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL9 LTO 3.x has failed when building commit 979bf24.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1587/builds/864) and take a look at the build logs.
  4. Check if the failure is related to this commit (979bf24) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1587/builds/864

Failed tests:

  • test_capi

Failed subtests:

  • test_char - test.test_capi.test_structmembers.ReadWriteTests_NewAPI.test_char
  • test_char - test.test_capi.test_structmembers.ReadWriteTests_OldAPI.test_char

Summary of the results of the build (if available):

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel9-s390x.lto/build/Lib/test/test_capi/test_structmembers.py", line 170, in test_char
    self.assertEqual(ts.T_CHAR, "c")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
AssertionError: '\x00' != 'c'
- \x00
+ c

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL8 3.x has failed when building commit 979bf24.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/509/builds/7885) and take a look at the build logs.
  4. Check if the failure is related to this commit (979bf24) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/509/builds/7885

Failed tests:

  • test_capi

Failed subtests:

  • test_char - test.test_capi.test_structmembers.ReadWriteTests_NewAPI.test_char
  • test_char - test.test_capi.test_structmembers.ReadWriteTests_OldAPI.test_char

Summary of the results of the build (if available):

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/test_capi/test_structmembers.py", line 170, in test_char
    self.assertEqual(ts.T_CHAR, "c")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
AssertionError: '\x00' != 'c'
- \x00
+ c

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL8 LTO + PGO 3.x has failed when building commit 979bf24.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/442/builds/7968) and take a look at the build logs.
  4. Check if the failure is related to this commit (979bf24) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/442/builds/7968

Failed tests:

  • test_capi

Failed subtests:

  • test_char - test.test_capi.test_structmembers.ReadWriteTests_NewAPI.test_char
  • test_char - test.test_capi.test_structmembers.ReadWriteTests_OldAPI.test_char

Summary of the results of the build (if available):

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x.lto-pgo/build/Lib/test/test_capi/test_structmembers.py", line 170, in test_char
    self.assertEqual(ts.T_CHAR, "c")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
AssertionError: '\x00' != 'c'
- \x00
+ c

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL8 LTO 3.x has failed when building commit 979bf24.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/567/builds/7887) and take a look at the build logs.
  4. Check if the failure is related to this commit (979bf24) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/567/builds/7887

Failed tests:

  • test_capi

Failed subtests:

  • test_char - test.test_capi.test_structmembers.ReadWriteTests_NewAPI.test_char
  • test_char - test.test_capi.test_structmembers.ReadWriteTests_OldAPI.test_char

Summary of the results of the build (if available):

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x.lto/build/Lib/test/test_capi/test_structmembers.py", line 170, in test_char
    self.assertEqual(ts.T_CHAR, "c")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
AssertionError: '\x00' != 'c'
- \x00
+ c

@vstinner
Member

vstinner commented Dec 3, 2024

@dpdani @colesbury: As you can see in previous comments, test_capi fails on multiple buildbots. I'm not sure why.

@dpdani
Contributor Author

dpdani commented Dec 3, 2024

mm, I can't seem to reproduce it locally. maybe some platform compatibility issue?
bedevere reported here failures only for s390x, but I don't have easy access to that.

what do you usually do in these situations?

@colesbury
Contributor

Let's skip the test on s390x for now while we investigate.

I'm not sure what's causing the failure. The only thing I can think of is that char is unsigned on s390x (vs. signed on GCC x86-64 and arm64).

@colesbury
Contributor

#127577 should fix it. The problem was that the 'C' format code expects an int as the destination (not a char):

https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue

This happens to work out okay on little endian systems because the non-zero part of the value is at the same place (the first byte), and it just writes some zeros to the struct's padding bytes. On big endian systems, like s390x, it would write zero to char_member and some non-zero value to parts of the padding bytes.
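
The effect described above can be reproduced without any Python headers. The following is a rough sketch under stated assumptions, not the code touched by #127577: the function names are hypothetical, a small byte buffer stands in for the char member plus its trailing padding, and `memcpy` models a parser that writes a full int to that address.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

/* Models a parser that writes a whole int through a pointer that really
   refers to a one-byte char member. Byte 0 of the buffer plays the role
   of the char member; the remaining bytes play the role of padding. */
static char char_member_after_int_write(int code_point)
{
    unsigned char member_and_padding[sizeof(int)] = {0};
    memcpy(member_and_padding, &code_point, sizeof code_point);
    return (char)member_and_padding[0];
}

/* One possible remedy (an assumption, not necessarily the actual fix):
   let the parser fill a real int, then narrow it to char explicitly. */
static char char_member_after_narrowing(int code_point)
{
    assert(code_point >= 0 && code_point <= 127);
    int parsed = code_point;  /* destination has the size the parser expects */
    return (char)parsed;
}
```

On a little-endian host `char_member_after_int_write('c')` yields `'c'`; on a big-endian host it yields `'\0'`, which matches the `'\x00' != 'c'` assertion failures reported by the s390x buildbots.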

I followed the instructions on https://docs.gitlab.com/omnibus/development/s390x.html to debug s390x using Docker and QEMU. I had to slightly modify their commands: in particular, I had to specify --platform linux/s390x:

docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker run --platform linux/s390x -v $(pwd):/cpython --rm -it s390x/ubuntu

@dpdani
Contributor Author

dpdani commented Dec 4, 2024

oops, my bad

thank you for all the details 🙏
