Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115999: Specialize CALL_KW in free-threaded builds #127713

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

mpage
Copy link
Contributor

@mpage mpage commented Dec 6, 2024

The CALL_KW family was already thread-safe. Only one change to fix a bug in _PY_FRAME_KW was needed:

_PY_FRAME_KW pushes a pointer to the new frame onto the stack for consumption by the next uop. When pushing the frame fails, we do not want to push the result, NULL, to the stack because it is not a valid stackref. This works in the default build because PyStackRef_NULL and NULL are the same value, so the PyStackRef_XCLOSE() in the error handler ignores it. In the free-threaded build the values are not the same, causing PyStackRef_XCLOSE() to decref a null pointer.

Single-threaded performance

Performance looks neutral on both builds. I think this is still worth merging to keep the two builds consistent, however.

Scalability

Scalability looks roughly unchanged:

                    Base         PR
object_cfunction    1.5x slower  1.4x slower
cmodule_function    1.6x slower  1.6x slower
mult_constant      12.8x faster  13.2x faster
generator          13.5x faster  10.1x faster
pymethod            1.9x slower  1.9x slower
pyfunction         10.7x faster  13.4x faster
module_function     2.1x slower  2.0x slower
load_string_const  14.2x faster  13.1x faster
load_tuple_const   13.6x faster  13.2x faster
create_pyobject    15.5x faster  15.7x faster
create_closure     12.3x faster  13.5x faster
create_dict        13.3x faster  12.9x faster
thread_local_read   4.0x slower  3.8x slower

Thread safety

The specialized instructions are composed of the follow uops, whose thread-safety is documented below.

  • _CALL_KW_NON_PY - Tuples are immutable; it's fine to read their size non-atomically.
  • _CHECK_FUNCTION_VERSION_KW - This uop guards that the top of the stack is a function and that its version matches the version stored in the inline cache. Instructions assume that if the guard passes, the version, and any properties verified by the version, will not change for the remainder of the instruction execution, assuming there are no escaping calls in between the guard and the code that relies on the guard. This property is preserved in free-threaded builds: the world is stopped whenever a function's version changes.
  • _CHECK_IS_NOT_PY_CALLABLE_KW - Performs type checks.
  • _CHECK_METHOD_VERSION_KW - This loads a function from a PyMethodObject and guards that its version matches what is stored in the cache. PyMethodObjects are immutable; their fields can be accessed non-atomically. The thread safety of function version guards was already documented above.
  • _CHECK_PEP_523 - Thread safety was addressed as part of CALL.
  • _CHECK_PERIODIC - Thread safety was previously addressed as part of the 3.13 release.
  • _EXPAND_METHOD_KW - Only loads from PyMethodObjects and performs type checks.
  • _PUSH_FRAME - Only manipulates tstate->current_frame and fields that are not read by other threads.
  • _PY_FRAME_KW - Loads immutable fields (tuple size, code flags) or fields that require stopping the world to modify (function code) and uses thread-safe APIs.
  • _SAVE_RETURN_OFFSET - Stores only to the frame's return_offset which is not read by other threads.

Specialization only looks at properties that require stopping the world to update (i.e. instance type, eval_frame, func_code, and the function version) and contains no escaping calls.

mpage added 2 commits December 5, 2024 22:12
`_PY_FRAME_KW` pushes a pointer to the new frame onto the stack for
consumption by the next uop. When pushing the frame fails, we do not
want to push the result, `NULL`, to the stack because it is not
a valid stackref. This works in the default build because `PyStackRef_NULL`
 and `NULL` are the same value, so the `PyStackRef_XCLOSE()` in the error
handler ignores it. In the free-threaded build the values are not the same;
`PyStackRef_XCLOSE()` will attempt to decref a null pointer.
@mpage
Copy link
Contributor Author

mpage commented Dec 7, 2024

!buildbot nogil refleak

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mpage for commit b22403e 🤖

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • AMD64 CentOS9 NoGIL Refleaks PR

@mpage mpage requested a review from colesbury December 7, 2024 04:10
@mpage mpage marked this pull request as ready for review December 7, 2024 04:10
@mpage mpage requested a review from markshannon as a code owner December 7, 2024 04:10
Copy link
Member

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mpage
Copy link
Contributor Author

mpage commented Dec 11, 2024

!buildbot nogil refleak

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mpage for commit aef38b1 🤖

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • AMD64 CentOS9 NoGIL Refleaks PR

@mpage mpage merged commit c84928e into python:main Dec 11, 2024
57 checks passed
@mpage mpage deleted the gh-115999-tlbc-call-kw branch December 11, 2024 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants