Add PYBIND11_SIMPLE_GIL_MANAGEMENT option (cmake, C++ define) #4216

Merged
merged 34 commits into from
Oct 30, 2022

Conversation

Chekov2k
Contributor

@Chekov2k Chekov2k commented Oct 5, 2022

Description

This PR adds a PYBIND11_SIMPLE_GIL_MANAGEMENT option (cmake, C++ define). The new option may be useful to try when debugging GIL-related issues, to determine if the more complex default implementation is or is not to blame. See comments here for background.

This PR was triggered by pytorch/pytorch#83101. While we could not reproduce the original problem in a unit test, many tests were added to test_gil_scoped.py trying to do so. That work exposed that test_gil_scoped.py has been finding deadlocks for a long time (years), but those were thus far ignored as "flakes". test_gil_scoped.py was updated to make it crystal clear when a DEADLOCK is detected. It turns out deadlocks are detected at a fairly high rate (ballpark 1 in 1000). This needs to be worked on separately. To not continue to pollute the CI results in the meantime, SKIP_IF_DEADLOCK = True was added in test_gil_scoped.py. The GitHub Actions CI logs can be harvested in the future to determine what platforms exactly are affected, and at what rate. Hopefully the information accumulating over time will lead to a better understanding and fixes, so that SKIP_IF_DEADLOCK can be removed eventually.
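A minimal sketch of the deadlock-detection idea described above, not the actual helper in test_gil_scoped.py: it runs a snippet in a child interpreter via `subprocess` (an assumption made here for portability) and treats a timeout as a deadlock. The names `run_in_child`, `check`, and `TIMEOUT_SECONDS` and the returned strings are hypothetical; only `SKIP_IF_DEADLOCK` comes from this PR, and its semantics are simplified.

```python
import subprocess
import sys

SKIP_IF_DEADLOCK = True  # flag name taken from the PR; behavior simplified here
TIMEOUT_SECONDS = 10.0   # hypothetical value, chosen only for this sketch


def run_in_child(code, timeout=TIMEOUT_SECONDS):
    """Run `code` in a fresh interpreter and classify the outcome."""
    try:
        completed = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising,
        # so a hung child does not outlive the check.
        return "DEADLOCK"
    return "OK" if completed.returncode == 0 else "FAILED"


def check(code, timeout=TIMEOUT_SECONDS):
    """Return a verdict a test could act on: pass, fail, or skip."""
    outcome = run_in_child(code, timeout=timeout)
    if outcome == "DEADLOCK" and SKIP_IF_DEADLOCK:
        return "SKIPPED (DEADLOCK detected)"
    return outcome
```

In a pytest-based test, the skipped branch would presumably map to a call to `pytest.skip()`, so that a detected deadlock stays visible in the log without failing the CI job.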

This PR also adds ThreadSanitizer compatibility to test_gil_scoped.py (closes #2754).
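One way to make subprocess-based tests tolerate ThreadSanitizer is to treat its exit code specially. The sketch below assumes ThreadSanitizer's default exit code of 66 (the value also named in this PR's commit messages) and uses hypothetical helper names; it is not the code from test_gil_scoped.py.

```python
import subprocess
import sys

TSAN_EXITCODE = 66  # ThreadSanitizer's default exit code when it reports an issue


def exit_code_is_acceptable(returncode, ignore_tsan=True):
    """Accept a clean exit, and optionally a ThreadSanitizer report."""
    if returncode == 0:
        return True
    return ignore_tsan and returncode == TSAN_EXITCODE


def child_passes(code, ignore_tsan=True):
    """Run `code` in a child interpreter and judge its exit code."""
    completed = subprocess.run([sys.executable, "-c", code])
    return exit_code_is_acceptable(completed.returncode, ignore_tsan)
```

Tolerating only this one exit code, rather than skipping the tests outright, keeps every other failure mode visible.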

WARNING: Please be careful to not create ODR violations when using the new option: everything that is linked together with mutual symbol visibility needs to be rebuilt.

Suggested changelog entry:

* A `PYBIND11_SIMPLE_GIL_MANAGEMENT` option was added (cmake, C++ define), along with many additional tests in test_gil_scoped.py. The option may be useful to try when debugging GIL-related issues, to determine if the more complex default implementation is or is not to blame. See #4216 for background. WARNING: Please be careful to not create ODR violations when using the option: everything that is linked together with mutual symbol visibility needs to be rebuilt.

@Chekov2k Chekov2k requested a review from henryiii as a code owner October 5, 2022 08:52
@Skylion007
Collaborator

@rwgk You should take a look given your interest in the GIL recently.

@rwgk
Collaborator

rwgk commented Oct 5, 2022

@rwgk You should take a look given your interest in the GIL recently.

I don't feel like an expert here at all, but @jbms is!

IIUC a-while-ago-comments from Jeremy, for recent Python versions, the complicated version of gil_scoped_acquire doesn't actually do anything. I don't remember if Jeremy told me on chat, or if that's buried somewhere in the extremely long PR #1895 comment history.

I think adding a define is a great idea, but the naming seems unfortunate IIUC: the "simple" (as I look at it) version of gil_scoped_acquire does not seem to depend on PyPy at all. It's just there because PyPy cannot deal with the complicated version. Is that correct?

How about PYBIND11_USE_SIMPLE_GIL_SCOPED_ACQUIRE for the new define?

Doing something to ci.yml to use the define seems important, pulling in something like the code in #1276 as a new test.

@jbms
Contributor

jbms commented Oct 5, 2022

I was not aware of #1276 but I just took a look at it.

In regards to my comment about gil_scoped_acquire that @rwgk referenced, the pybind11 source code mentions 3 differences/features provided:

Feature 1:

* 1. When acquiring the GIL from an non-main thread during the finalization

As of Python 3.8, that is no longer true (https://docs.python.org/3/c-api/init.html#c.PyEval_AcquireThread). The calling thread still blindly "terminates", where "terminate" on non-Windows platforms might actually mean "crash the program" (python/cpython#28525). Therefore we can ignore this difference.

Feature 2:

* 2. The gil_scoped_release function can optionally cut the relationship

This comment briefly explains the dissoc argument to gil_scoped_release, but I am not sure that I fully understand this use case, and I don't see any tests of that functionality in pybind11 itself. Without a concrete example it is hard to judge whether it is safe, but given that PyGILState_Ensure uses its own TSS key to associate a Python thread state with an operating system thread, it seems possible that it may be unsafe. The comment references the nanoGUI project but I didn't take the time to analyze nanoGUI's use.

Feature 3:

* 3. The reference count of an acquired thread state can be controlled. This

I can see that repeatedly constructing and destroying the thread state may be expensive (and may be an issue in tensorstore and other code at Google). It would be interesting to get some performance numbers on this. However, it seems like it may be possible to control the reference count in a similar way just by using CPython APIs, since you can mix calls to PyGILState_Ensure and PyGILState_Release (which increment and decrement a reference count in Python, similar to the implementation in pybind11) and PyEval_SaveThread/PyEval_RestoreThread, which do not increment/decrement the reference count.
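The reference-counted nesting of `PyGILState_Ensure` / `PyGILState_Release` mentioned above can be observed from Python itself through `ctypes.pythonapi`. This is only an illustrative sketch for CPython (the function name `nested_gilstate_states` is hypothetical), and it does not exercise pybind11's `gil_scoped_acquire` or the `PyEval_SaveThread` / `PyEval_RestoreThread` pair at all.

```python
import ctypes

# PyGILState_STATE is a C enum; in CPython, PyGILState_LOCKED is 0 and
# PyGILState_UNLOCKED is 1.
PyGILState_LOCKED = 0

ensure = ctypes.pythonapi.PyGILState_Ensure
ensure.restype = ctypes.c_int
ensure.argtypes = []

release = ctypes.pythonapi.PyGILState_Release
release.restype = None
release.argtypes = [ctypes.c_int]


def nested_gilstate_states(depth=3):
    """Nest Ensure calls `depth` deep, then release them in reverse order."""
    states = [ensure() for _ in range(depth)]
    for state in reversed(states):
        release(state)
    return states
```

Because the calling Python thread holds the GIL throughout, each call reports the "already locked" state, and the matching releases only undo the nesting count.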

In any case, per the discussion in issue #1276, it sounds like the gil_scoped_acquire in pybind11 was intended to be compatible with PyGILState_Ensure / PyGILState_Release, but unfortunately it is not. It sounded like originally the incompatibility only applied to debug builds of Python, but it seems like it actually can occur with all builds of Python? I have not actually investigated this issue myself, though.

It seems like we should definitely have this incompatibility fixed in pybind11, at least with the default build options. Given that feature 2 is an extremely niche use case (perhaps it is used exclusively by nanoGUI), we certainly do not want to sacrifice safety of pybind11 by default for it. Given that whatever GIL mechanism is used also needs to be compatible with other code that just uses the regular CPython GIL mechanism, it is not clear to me why this feature, if it is possible to implement safely, even needs to be in pybind11 at all.

My recommendation is that instead of adding this option, we just eliminate pybind11's custom gil logic and use PyGILState_Ensure / PyGILState_Release all the time. We can leave internals.tstate present (depending on ABI version), to avoid breaking ABI compatibility. This will break source compatibility for extensions that rely on "feature 2" or "feature 3", but I expect there aren't very many users. They would have to migrate to an alternative implementation, or continue building with an older version of pybind11. For feature 3 I think we could suggest an alternative implementation as an example.

@rwgk
Collaborator

rwgk commented Oct 6, 2022

@wjakob for comment

Wenzel is the original author of the code that now lives in gil.h:

Wenzel: All these years later, do we still need these features? What would be lost if we removed it and only used the simpler version?

@Chekov2k
Contributor Author

Chekov2k commented Oct 6, 2022

Thank you for taking this so seriously! I just wanted to mention that finding this solution was a team effort with two of my colleagues from [SeeChange], Tigran and @tjrin. I'll hold fire with editing the pull request just yet, until it gets clearer if it will be needed or not.

@rwgk
Collaborator

rwgk commented Oct 6, 2022

Prompted by the discussion here, I went ahead and stripped down gil.h to the simple version only (PR #4218), then ran Google-global testing, which passed.

My feeling: The extra complexity is known to cause trouble (pytorch/pytorch#83101), but it is unclear what it gives us on the positive side (@wjakob?).

I'd lean towards either:

  • Making the complicated version an option, and the simple version the default.
  • Or deleting the complicated version completely, unless we have a good explanation (ideally with measurements) why it is still needed.

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Oct 6, 2022
@henryiii
Collaborator

henryiii commented Oct 20, 2022

How about we name this PYBIND11_USE_SIMPLE_GIL, and in a later version change the default, and maybe eventually remove it if the simple version is fine. Edit: Seeing #4218, maybe this could be an option in 2.10.1, and the only version in 2.11?

@rwgk
Collaborator

rwgk commented Oct 21, 2022

Naive question that crossed my mind: Is there a (reasonable) way to detect the situations that I think of as traps ("fatal interpreter error", "results in GIL deadlock")? At least when NDEBUG is not defined? Then we could error out with a helpful runtime error instead of leaving people stuck with potentially extremely time-consuming debugging jobs to pinpoint the root cause.

@henryiii
Collaborator

Is there a (reasonable) way to detect the situations that I think of as traps ("fatal interpreter error", "results in GIL deadlock")?

That's a new feature. We can work on it for 2.11 if desired. 2.10.1 is releasing some iterative improvements, not fixes for perceived traps that have existed for years. This is not the only trap in pybind11, or Python, or C++ for that matter.

Please approve this improvement before 1:00 PM Eastern tomorrow, or this will be postponed to the next release.

@henryiii
Collaborator

@rwgk Still no approval on this?

Collaborator

@rwgk rwgk left a comment

I'll flip the default on the smart_holder branch when merging this there, but that's easy.

@rwgk
Collaborator

rwgk commented Oct 26, 2022

@henryiii @Skylion007 could you please review this PR again? I made quite a few changes.

Some miscellaneous notes:

  • I removed the "We plan to add an example" promise, because we are not removing the non-simple code at this time. I think it's fine to keep the non-simple code around for a while, and maybe later warn people that it will be removed, along with a suggestion to open a bug if they need help migrating the dissociation feature to their own project.

  • The old #ifdef logic in gil.h assumed that PyPy implies WITH_THREAD. Since we are only testing with PyPy 3.7 or higher, that is certain to be true, but someone with Python 3.6 --without-threading could now define PYBIND11_SIMPLE_GIL_MANAGEMENT, therefore that assumption is no longer valid and I changed the logic accordingly.

  • I fixed up the !WITH_THREAD code in gil.h and tested it locally by temporarily monkey-patching the #ifdef logic. That way I discovered that the disarm()s were private. That code was definitely never used.

  • I fixed up the constructors in the simple version to use member initializers, transferred from #4218 ("Strip down gil.h to the simple version only"). The change to test_embed/test_interpreter.cpp was also transferred from #4218. See there for the discussion why simply removing the REQUIRE(has_pybind11_internals_static()); is the right choice.

While I was at it, I added this to common.h, so that we don't forget to remove the !WITH_THREAD code when we drop Python 3.6:

// Reminder: WITH_THREAD is always defined if PY_VERSION_HEX >= 0x03070000


# endif // PYBIND11_SIMPLE_GIL_MANAGEMENT

#else // WITH_THREAD
Collaborator

Minor nit: Do we follow this style anywhere else? I find it quite confusing, because it reads like this is the "WITH_THREAD" block, while it's actually the not WITH_THREAD block. For things like this, they should be enforced (such as the comment at the end of a namespace is forced). I don't think it can, because this style can't be expanded to an elif chain.

Also this block goes away when we drop 3.6, right? We should be able to deduce that when we remove the WITH_THREAD.

Collaborator

Minor nit: Do we follow this style anywhere else?

In the wild I see negations, or not. I don't think it's consistent in the world, and that's clearly confusing.

What do we want to follow here?

I decided even with that ambiguity, having the markers is more helpful than not, because the first scope (non-simple) is quite large.

Ultimately, something like this would be best:

// No ifdefs
gil_scoped_acquire_non_simple ...
...
gil_scoped_acquire_simple ...

#if defined PYBIND11_SIMPLE_GIL_MANAGEMENT
using gil_scoped_acquire = gil_scoped_acquire_simple;
#else
using gil_scoped_acquire = gil_scoped_acquire_non_simple;
#endif

Although if we push it there I'd want to find better names than simple and non-simple.

Please let me know what you prefer:

  • What's here.
  • Different style.
  • The (or a different) fully formal solution.

Collaborator

I'm fine with it as is (that's why it's a nit, can be ignored). This code will simplify later when we remove 3.6. I'm just not a fan of commented out code, it can be confusing (on the else, it's ambiguous), and it can easily become wrong if things get moved around (no test for it matching up). I'd rather use indentation and then visually check, just like I already have to do if I don't trust comments like this (which I don't), and everyone has to do on most other if/else blocks.

Collaborator

Ultimately, something like this would be best

This would be wasteful. You'd be processing and compiling both sides, while you'd only use one. Unless we added gil_scoped_acquire_simple/gil_scoped_acquire_notsimple to our API, which I don't think we want to do, then it's much harder to remove the complex one later.

docs/upgrade.rst Outdated
``py::gil_scoped_acquire`` & ``py::gil_scoped_release`` in pybind11 versions
< v2.10.1 do not support nested access. In v2.10.1, a configuration option
``PYBIND11_SIMPLE_GIL_MANAGEMENT`` was added, defaulting to ``OFF``. The
simpler implementations support nested access, but do not support dissociation
Collaborator

Suggested change
simpler implementations support nested access, but do not support dissociation
simpler implementations support nested access, but do not support dissociation

I'm a little worried about promising "nested access is supported" when we have exactly 0 tests for it. Possibly "should support nested access" for now?

Collaborator

There is a huge difference: the simple version is just a pair of calls to PyEval_SaveThread() and PyEval_RestoreThread(). Those are exhaustively exercised where they are implemented, in core Python.

What you are asking for is similar to "Py_INCREF() should be supported."

The non-simple version is a very complicated implementation that does not have corresponding unit tests in the project that introduces it (i.e. in pybind11).

Collaborator

That still doesn't mean we know it works in pybind11 - we didn't add tests. I've been at this too long to believe we can promise untested features. There could very well be some weird behavior on some weird compiler that is pybind11 specific. I'd prefer the word "should" be included if we don't test - we can't promise without tests. If we didn't test Py_INCREF(), I'd put "should" on that too (though that would be on the pybind11 wrapping, not the upstream feature - that would be more like adding a py::incref() and not testing it and promising it works).

I'd say it probably works, but we did not earn the right to say it works with certainty - that requires a pybind11 test.

Collaborator

@henryiii henryiii Oct 26, 2022

(I also agree it's more likely to work than dissociation without tests - but to be fair, there are at least two major projects with thousands of users each that use this feature in pybind11, while there are no nested access pybind11 users at all yet - the feature wasn't supported / didn't work before.)

Collaborator

PS: this is also a nit that you have permission to ignore - I think it should be "should", but it's not going to make or break anything. :)

Collaborator

Without using this change, the above would not work at all, or just be flaky (probably when threading)?

I'd much rather have a test that runs in pybind11's CI for something we promise, but okay, downstream works too (and is no worse than what we used for the dissociate feature in the past, apparently).

Contributor

Sorry, my comment above about nesting was supposed to be more of a question rather than a statement, but I left off a question mark.

That is the simplest form of nesting, and that should work both with the "non-simple" and "simple" forms of gil_scoped_{acquire,release}.

The real test is nesting where we combine both the pybind11 apis and the CPython PyGILState_Ensure, as in
#1276 (comment)

Collaborator

@henryiii henryiii Oct 26, 2022

According to @wjakob in that thread, the custom implementation is supposed to be compatible (at least it was as of 2.7? Maybe not now). The issue that prompted this is pytorch/pytorch#83101 - that's why we started on this "simple version" in the first place. There it looks exactly like what you have above, except it is running a bunch of Python to import PyTorch inside the "do stuff here" part. Maybe it's due to raw CPython usage inside what it's importing. Or maybe just the simple release then acquire is broken.

Collaborator

@henryiii I'm thinking we could unblock the release by completely omitting this section here. I think this PR is otherwise good to go as-is: it's nice to have the extra option, but because we don't have a reproducer (in the CI) for the reported problem, it would be premature to say anything about the future. For this release, we could just briefly mention this PR in the changelog, like any other PR.

In summary: Release first, continue work on a reproducer, make new plans based on the outcome.

Collaborator

I think that's a good idea (and why I didn't have a timeline in the original version). This is still a nice option, having it in a released version will make it much easier for downstream projects to test it out, and even if our complex version is perfect or fixable, I rather like that the simple version is also already there for PyPy. We can announce upgrade plans in 2.11.0 if we go with it.

docs/upgrade.rst Outdated
(``py::gil_scoped_release(true)``). In pybind11 2.11, we plan to change the
default to ``ON``, to avoid pitfalls of the implementations with dissociation
(see #4216 for more information). Note that the dissociation feature is very
rarely used and not exercised in any pybind11 unit tests.
Collaborator

And so is nested access. ;)

Ralf W. Grosse-Kunstleve added 4 commits October 29, 2022 17:59
…ess likely to mask unrelated ThreadSanitizer issues in the future).
…ANAGEMENT`

For the tests in the github CI this does not matter, because
`PYBIND11_SIMPLE_GIL_MANAGEMENT` is always defined from the command line,
but when monkey-patching common.h locally, it matters.
…ting `DEADLOCK`, additionally exercised via added `intentional_deadlock()`
@rwgk
Collaborator

rwgk commented Oct 30, 2022

I added another commit: ea8b132

  • More sophisticated _run_in_process() implementation, clearly reporting DEADLOCK, additionally exercised via added intentional_deadlock()

This is after learning the hard way that reducing to debug deadlocks is a silly idea. Long story omitted.

I'm hoping the CI is green again, except for expected DEADLOCK messages.

If that's true we need to decide:

  • Merge this PR and live with the DEADLOCK messages until we get to the bottom of that.
  • Fork this PR: 1. undo the changes in test_gil_scoped.cpp & test_gil_scoped.py so that we don't increase the rate of "flakes" (uhm ... deadlocks we used to look at as flakes), 2. a new one with a copy of what's currently here.

I'll torture github CI a few times to see how much noise the DEADLOCKS create.

@rwgk
Collaborator

rwgk commented Oct 30, 2022

The last CI run was green except one DEADLOCK (CI / 🐍 pypy-3.7 • windows-2022 • x64).

I'll move on with the plan explained previously: repeatedly run the CI to estimate the DEADLOCK rate.
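To get a feel for how a small deadlock probability shows up across repeated runs, the arithmetic can be sketched as below. The rate of 1 in 1000 is the ballpark figure from the PR description, read here as a per-execution probability (an interpretation). The run counts are purely hypothetical, and independence between runs is an assumption.

```python
def prob_at_least_one_deadlock(rate, runs):
    """P(at least one deadlock in `runs` independent runs), each with probability `rate`."""
    return 1.0 - (1.0 - rate) ** runs


def estimated_rate(deadlocks, runs):
    """Point estimate of the per-run deadlock rate from observed counts."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return deadlocks / runs


if __name__ == "__main__":
    ballpark = 1 / 1000  # figure quoted in the PR description
    for runs in (100, 1000, 5000):  # hypothetical numbers of test executions
        p = prob_at_least_one_deadlock(ballpark, runs)
        print(f"{runs:5d} runs -> P(at least one DEADLOCK) = {p:.3f}")
```

Under these assumptions, even a rare deadlock becomes likely to appear somewhere once the number of executions approaches the inverse of the rate, which is why many CI runs are needed before the observed count says much about the true rate.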

…` compatibility.

```
>       ForkingPickler(file, protocol).dump(obj)
E       TypeError: cannot pickle 'PyCapsule' object
```

Observed with all Windows builds including mingw but not PyPy, and macos-latest with Python 3.9, 3.10, 3.11 but not 3.6.
@rwgk
Collaborator

rwgk commented Oct 30, 2022

The last force push only changed the description of the last commit for accuracy, everything else is still the same.

@rwgk
Collaborator

rwgk commented Oct 30, 2022

@henryiii wrote:

I'm okay to add it to 2.10.1 now as long as it's not adding any flakiness to our test suite.

I'm glad you asked: with the latest version the "flakes" — which really were deadlocks all the while — will not pollute the CI results anymore at all. That's clearly an improvement all by itself. But it also sets us up for harvesting deadlock information from the CI logs. After this PR is merged, we can harvest from all CI logs, to get good statistics. I'm explaining more in the revised PR description.

But I just see this PR is triggering a new flake, observed twice thus far, both times with PYBIND11_SIMPLE_GIL_MANAGEMENT=ON, see below.

Maybe just Python 3.6? Maybe that has a bug that the complex version sidesteps?

Thinking at the moment: let's get this merged, monitor those flakes for a few days, then decide what to do.

Python 3.6 macos-latest x64:

2022-10-30T04:42:01.4007310Z Fatal Python error: This thread state must be current when releasing
2022-10-30T04:42:01.4007800Z
2022-10-30T04:42:01.4008990Z
2022-10-30T04:42:01.4009290Z ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-10-30T04:42:01.4014100Z Current thread 0x00007000077ce000 (most recent call first):
2022-10-30T04:42:01.4014750Z test_embed is a Catch v2.13.9 host application.
2022-10-30T04:42:01.4044460Z
2022-10-30T04:42:01.4045410Z Run with -? for options
2022-10-30T04:42:01.4046580Z Thread 0x0000000110b5a600 (most recent call first):

Python 3.6 windows-2022 x64:

MSBuild version 17.3.1+2badb37d1 for .NET Framework
  Checking File Globs
  external_module.vcxproj -> D:\a\pybind11\pybind11\tests\test_embed\external_module.cp36-win_amd64.pyd
  test_embed.vcxproj -> D:\a\pybind11\pybind11\tests\test_embed\Debug\test_embed.exe
CUSTOMBUILD : Fatal Python error : This thread state must be current when releasing [D:\a\pybind11\pybind11\tests\test_embed\cpptest.vcxproj]
  
  Current thread 0x000014e4 (most recent call first):
  
  Thread 0x00001b10 (most recent call first):
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(247,5): error MSB8066: Custom build for 'D:\a\pybind11\pybind11\CMakeFiles\bf7d116f11890487c30aedc390ae4de7\cpptest.rule;D:\a\pybind11\pybind11\tests\test_embed\CMakeLists.txt' exited with code -1073740791. [D:\a\pybind11\pybind11\tests\test_embed\cpptest.vcxproj]
Error: Process completed with exit code 1.

@henryiii
Collaborator

I agree then, merge and watch sounds fine. We are not making pybind11 itself any worse and it's just a new experimental option.

@rwgk
Collaborator

rwgk commented Oct 30, 2022

I agree then, merge and watch sounds fine. We are not making pybind11 itself any worse and it's just a new experimental option.

Thanks! Merging now, to then integrate into smart_holder. Having this available Google-internally will make it easier to get help debugging the deadlocks.

@rwgk rwgk merged commit b07d08f into pybind:master Oct 30, 2022
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Oct 30, 2022
@j05070415

j05070415 commented Oct 30, 2022 via email

@rwgk rwgk mentioned this pull request Oct 30, 2022
henryiii added a commit that referenced this pull request Oct 31, 2022
* Add option to force the use of the PYPY GIL scoped acquire/release logic to support nested gil access, see #1276 and pytorch/pytorch#83101

* Apply suggestions from code review

* Update CMakeLists.txt

* docs: update upgrade guide

* Update docs/upgrade.rst

* All bells & whistles.

* Add Reminder to common.h, so that we will not forget to purge `!WITH_THREAD` branches when dropping Python 3.6

* New sentence instead of semicolon.

* Temporarily pull in snapshot of PR #4246

* Add `test_release_acquire`

* Add more unit tests for nested gil locking

* Add test_report_builtins_internals_keys

* Very minor enhancement: sort list only after filtering.

* Revert change in docs/upgrade.rst

* Add test_multi_acquire_release_cross_module, while also forcing unique PYBIND11_INTERNALS_VERSION for cross_module_gil_utils.cpp

* Hopefully fix apparently new ICC error.

```
2022-10-28T07:57:54.5187728Z -- The CXX compiler identification is Intel 2021.7.0.20220726
...
2022-10-28T07:58:53.6758994Z icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
2022-10-28T07:58:54.5801597Z In file included from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/type_caster_base.h(15),
2022-10-28T07:58:54.5803794Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../cast.h(15),
2022-10-28T07:58:54.5805740Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../attr.h(14),
2022-10-28T07:58:54.5809556Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/class.h(12),
2022-10-28T07:58:54.5812154Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/pybind11.h(13),
2022-10-28T07:58:54.5948523Z                  from /home/runner/work/pybind11/pybind11/tests/cross_module_gil_utils.cpp(13):
2022-10-28T07:58:54.5949009Z /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/internals.h(177): error #2282: unrecognized GCC pragma
2022-10-28T07:58:54.5949374Z       PYBIND11_TLS_KEY_INIT(tstate)
2022-10-28T07:58:54.5949579Z       ^
2022-10-28T07:58:54.5949695Z
```

* clang-tidy fixes

* Workaround for PYPY WIN exitcode None

* Revert "Temporarily pull in snapshot of PR #4246"

This reverts commit 23ac16e.

* Another workaround for PYPY WIN exitcode None

* Clean up how the tests are run "run in process" Part 1: uniformity

* Clean up how the tests are run "run in process" Part 2: use `@pytest.mark.parametrize` and clean up the naming.

* Skip some tests `#if defined(THREAD_SANITIZER)` (tested with TSAN using the Google-internal toolchain).

* Run all tests again but ignore ThreadSanitizer exitcode 66 (this is less likely to mask unrelated ThreadSanitizer issues in the future).

* bug fix: missing common.h include before using `PYBIND11_SIMPLE_GIL_MANAGEMENT`

For the tests in the github CI this does not matter, because
`PYBIND11_SIMPLE_GIL_MANAGEMENT` is always defined from the command line,
but when monkey-patching common.h locally, it matters.

* if process.exitcode is None: assert t_delta > 9.9

* More sophisticated `_run_in_process()` implementation, clearly reporting `DEADLOCK`, additionally exercised via added `intentional_deadlock()`

* Wrap m.intentional_deadlock in a Python function, for `ForkingPickler` compatibility.

```
>       ForkingPickler(file, protocol).dump(obj)
E       TypeError: cannot pickle 'PyCapsule' object
```

Observed with all Windows builds including mingw but not PyPy, and macos-latest with Python 3.9, 3.10, 3.11 but not 3.6.

* Add link to potential solution for WOULD-BE-NICE-TO-HAVE feature.

* Add `SKIP_IF_DEADLOCK = True` option, to not pollute the CI results with expected `DEADLOCK` failures while we figure out what to do about them.

* Add COPY-PASTE-THIS: gdb ... command (to be used for debugging the detected deadlock)

* style: pre-commit fixes

* Do better than automatic pre-commit fixes.

* Add `PYBIND11_SIMPLE_GIL_MANAGEMENT` to `pytest_report_header()` (so that we can easily know when harvesting deadlock information from the CI logs).

Co-authored-by: Arnim Balzer <[email protected]>
Co-authored-by: Henry Schreiner <[email protected]>
Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Oct 31, 2022
@Chekov2k Chekov2k deleted the pypy_gil branch March 3, 2023 09:38
Successfully merging this pull request may close these issues.

[BUG] TSAN error logs
8 participants