Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115832: Fix instrumentation version mismatch during interpreter shutdown #115856

Merged
merged 5 commits into from
Mar 4, 2024

Conversation

swtaarrs
Copy link
Member

In 0749244d13412d, I introduced a bug to interpreter_clear(): it sets interp->ceval.instrumentation_version to 0, without making the corresponding change to tstate->eval_breaker (which holds a thread-local copy of the version). After this happens, Python code can still run due to object finalizers during a GC, and this version check in bytecodes.c
will see a different result than this one in instrumentation.c, causing an infinite loop.

The fix itself is straightforward, and is what I should've done in interpreter_clear() in the first place: also clear tstate->eval_breaker when clearing interp->ceval.instrumentation_version. I also restored a comment that I'm not sure why I deleted in the original commit.

To make bugs of this type less likely in the future, I changed instrumentation.c:global_version() to read the version from a PyThreadState* rather than a PyInterpreterState*, so it's reading the version from the same location as the interpreter loop. This had some fan-out effects on its transitive callers, although most of them already had the current tstate available.

…ter shutdown

In python/cpython@0749244d13412d, I introduced a bug to `interpreter_clear()`: it
sets `interp->ceval.instrumentation_version` to 0, without making the
corresponding change to `tstate->eval_breaker` (which holds a thread-local copy
of the version). After this happens, Python code can still run due to object
finalizers during a GC, and [this version check in
bytecodes.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/bytecodes.c#L147-L152)
will see a different result than [this one in
instrumentation.c](https://github.com/python/cpython/blob/4ee6bdfbaa792a3aa93c65c2022a89bd2d1e0894/Python/instrumentation.c#L894-L895),
causing an infinite loop.

The fix itself is straightforward, and is what I should've done in
`interpreter_clear()` in the first place: also clear `tstate->eval_breaker`
when clearing `interp->ceval.instrumentation_version`. I also restored a
comment that I'm not sure why I deleted in the original commit.

To make bugs of this type less likely in the future, I changed
`instrumentation.c:global_version()` to read the version from a
`PyThreadState*` rather than a `PyInterpreterState*`, so it's reading the
version from the same location as the interpreter loop. This had some fan-out
effects on its transitive callers, although most of them already had the
current tstate availale.

- Issue: pythongh-115832
@swtaarrs
Copy link
Member Author

I think this one can skip news (since it should've been part of the original commit), but let me know if anyone disagrees.

@swtaarrs
Copy link
Member Author

The test failures look real and related to this change. I'm investigating.

@swtaarrs swtaarrs marked this pull request as draft February 23, 2024 18:50
…nce it doesn't have an up-to-date monitoring version
@swtaarrs swtaarrs marked this pull request as ready for review February 23, 2024 19:37
@swtaarrs swtaarrs linked an issue Feb 27, 2024 that may be closed by this pull request
@swtaarrs swtaarrs requested a review from colesbury February 27, 2024 22:42
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me, but there are some merge conflicts that need to be resolved.

I am a bit unsure about the switch of global_version to use the PyThreadState's eval_breaker instead of the interpreter's instrumentation_version. The use of instrumentation_version seemed a bit more natural as the authoritative source, and a smaller bug fix change seems preferable too.

Would it be possible to add an assertion in global_version() that the two are the same? With the GIL, the setting of the thread and interpreter values should appear atomic. With the GIL disabled, I think we'd want a stop-the-world pause when we enable instrumentation.

@swtaarrs
Copy link
Member Author

I am a bit unsure about the switch of global_version to use the PyThreadState's eval_breaker instead of the interpreter's instrumentation_version. The use of instrumentation_version seemed a bit more natural as the authoritative source, and a smaller bug fix change seems preferable too.

Would it be possible to add an assertion in global_version() that the two are the same?

Makes sense, and yeah that should be a straightforward change.

@swtaarrs
Copy link
Member Author

swtaarrs commented Mar 1, 2024

The latest round of merge conflicts was from #116013, but this should be good to go again.

@colesbury colesbury added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 1, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit 7a9c81a 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 1, 2024
@colesbury colesbury merged commit 0adfa84 into python:main Mar 4, 2024
116 of 118 checks passed
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024
…ter shutdown (python#115856)

A previous commit introduced a bug to `interpreter_clear()`: it set
`interp->ceval.instrumentation_version` to 0, without making the corresponding
change to `tstate->eval_breaker` (which holds a thread-local copy of the
version). After this happens, Python code can still run due to object finalizers
during a GC, and the version check in bytecodes.c will see a different result
than the one in instrumentation.c causing an infinite loop.

The fix itself is straightforward: clear `tstate->eval_breaker` when clearing
`interp->ceval.instrumentation_version`.
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024
…ter shutdown (python#115856)

A previous commit introduced a bug to `interpreter_clear()`: it set
`interp->ceval.instrumentation_version` to 0, without making the corresponding
change to `tstate->eval_breaker` (which holds a thread-local copy of the
version). After this happens, Python code can still run due to object finalizers
during a GC, and the version check in bytecodes.c will see a different result
than the one in instrumentation.c causing an infinite loop.

The fix itself is straightforward: clear `tstate->eval_breaker` when clearing
`interp->ceval.instrumentation_version`.
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
…ter shutdown (python#115856)

A previous commit introduced a bug to `interpreter_clear()`: it set
`interp->ceval.instrumentation_version` to 0, without making the corresponding
change to `tstate->eval_breaker` (which holds a thread-local copy of the
version). After this happens, Python code can still run due to object finalizers
during a GC, and the version check in bytecodes.c will see a different result
than the one in instrumentation.c causing an infinite loop.

The fix itself is straightforward: clear `tstate->eval_breaker` when clearing
`interp->ceval.instrumentation_version`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Coverage.py test suite failure since commit 0749244d134
4 participants