Ports/Python3: crash in child after fork #25263

Closed
oskar-skog opened this issue Nov 3, 2024 · 16 comments · Fixed by #25377
Labels: bug, ports

Comments

@oskar-skog
Contributor

courage:~ $ python3
Python 3.13.0 (main, Nov  3 2024, 11:49:09) [GCC 13.2.0] on serenityos
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getpid()
44
>>> os.fork()
45
>>> Fatal Python error: _PyRecursiveMutex_Unlock: unlocking a recursive mutex that is not owned by the current thread
Python runtime state: initialized

Current thread 0x000000000000002c (most recent call first):
  File "<stdin>-2", line 1 in <module>
  File "/usr/local/lib/python3.13/_pyrepl/main.py", line 30 in interactive_console
  File "/usr/local/lib/python3.13/_pyrepl/__main__.py", line 6 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

>>> os.getpid()
44
>>> 

Backtrace:

==== Thread #0 (TID 45) ====
0x00000019c923502b: [/usr/lib/libsystem.so] syscall2 +0xb (syscall.cpp:25 => syscall.cpp:24)
0x0000001ddcb6a45d: [/usr/lib/libc.so] abort +0x26 (stdlib.cpp:387)
0x0000000d3b3f5303: [/usr/local/lib/libpython3.13.so.1.0] fatal_error.cold +0x4 (pylifecycle.c:3032 => pylifecycle.c:3178)
0x0000000d3b623a33: [/usr/local/lib/libpython3.13.so.1.0] _Py_FatalErrorFunc +0x33 (pylifecycle.c:3263)
0x0000000d3b6133fe: [/usr/local/lib/libpython3.13.so.1.0] _PyRecursiveMutex_Unlock +0x7e (lock.c:385)
0x0000000d3b668679: [/usr/local/lib/libpython3.13.so.1.0] PyOS_AfterFork_Child.localalias +0x89 (posixmodule.c:680)
0x0000000d3b6689b6: [/usr/local/lib/libpython3.13.so.1.0] os_fork +0xc6 (posixmodule.c:8067)
0x0000000d3b4ca5e0: [/usr/local/lib/libpython3.13.so.1.0] cfunction_vectorcall_NOARGS +0x70 (methodobject.h:50)
0x0000000d3b463227: [/usr/local/lib/libpython3.13.so.1.0] PyObject_Vectorcall +0x57 (pycore_call.h:168)
0x0000000d3b3f74df: [/usr/local/lib/libpython3.13.so.1.0] _PyEval_EvalFrameDefault +0x1bff (generated_cases.c.h:813)
0x0000000d3b5bf8fc: [/usr/local/lib/libpython3.13.so.1.0] PyEval_EvalCode +0x24c (pycore_ceval.h:119 => ceval.c:1806)
0x0000000d3b62948f: [/usr/local/lib/libpython3.13.so.1.0] run_eval_code_obj +0x7f (pythonrun.c:1323)
0x0000000d3b629735: [/usr/local/lib/libpython3.13.so.1.0] run_mod +0x185 (pythonrun.c:1408)
0x0000000d3b629bc6: [/usr/local/lib/libpython3.13.so.1.0] PyRun_InteractiveOneObjectEx +0x196 (pythonrun.c:282)
0x0000000d3b62ba45: [/usr/local/lib/libpython3.13.so.1.0] _PyRun_InteractiveLoopObject +0xa5 (pythonrun.c:130)
0x0000000d3b62c175: [/usr/local/lib/libpython3.13.so.1.0] PyRun_AnyFileExFlags +0xb5 (pythonrun.c:71)
0x0000000d3b63a1be: [/usr/local/lib/libpython3.13.so.1.0] sys__baserepl +0x2e (sysmodule.c:2428)
0x0000000d3b4ca5e0: [/usr/local/lib/libpython3.13.so.1.0] cfunction_vectorcall_NOARGS +0x70 (methodobject.h:50)
0x0000000d3b463227: [/usr/local/lib/libpython3.13.so.1.0] PyObject_Vectorcall +0x57 (pycore_call.h:168)
0x0000000d3b3f74df: [/usr/local/lib/libpython3.13.so.1.0] _PyEval_EvalFrameDefault +0x1bff (generated_cases.c.h:813)
0x0000000d3b5bf8fc: [/usr/local/lib/libpython3.13.so.1.0] PyEval_EvalCode +0x24c (pycore_ceval.h:119 => ceval.c:1806)
0x0000000d3b5b9f0f: [/usr/local/lib/libpython3.13.so.1.0] builtin_exec +0x41f (bltinmodule.c:1145)
0x0000000d3b4ca217: [/usr/local/lib/libpython3.13.so.1.0] cfunction_vectorcall_FASTCALL_KEYWORDS +0x67 (methodobject.h:50)
0x0000000d3b463227: [/usr/local/lib/libpython3.13.so.1.0] PyObject_Vectorcall +0x57 (pycore_call.h:168)
0x0000000d3b3f74df: [/usr/local/lib/libpython3.13.so.1.0] _PyEval_EvalFrameDefault +0x1bff (generated_cases.c.h:813)
0x0000000d3b6531a1: [/usr/local/lib/libpython3.13.so.1.0] pymain_run_module +0xd1 (main.c:349)
0x0000000d3b654424: [/usr/local/lib/libpython3.13.so.1.0] pymain_run_python.constprop.0 +0xfa4 (main.c:574)
0x0000000d3b6547d6: [/usr/local/lib/libpython3.13.so.1.0] Py_BytesMain +0x56 (main.c:775)
0x00000005603c4654: [/usr/local/bin/python3.13] _entry +0x24 (crt0.cpp:47)
@oskar-skog
Contributor Author

@linusg Didn't you say pyrepl was disabled?

@linusg
Member

linusg commented Nov 3, 2024

I did, and it is. Please read the code the traceback is pointing to: https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/_pyrepl/main.py#L30

linusg added the bug and ports labels Nov 3, 2024
@oskar-skog
Contributor Author

oskar-skog commented Nov 10, 2024

It seems that at least Python 3.13 relies on pthread_self returning the same value in both the parent and the child process (see PyThread_get_thread_ident_ex).

This is the case on Linux, Hurd, FreeBSD, OpenBSD, NetBSD, MidnightBSD, Cygwin, Haiku, newer versions of Solaris, and macOS, but not on SerenityOS.

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    pthread_t parent = pthread_self();
    if (fork() == 0) {
        pthread_t child = pthread_self();
        if (pthread_equal(parent, child))
            puts("parent == child");
        else
            puts("parent != child");
    }
    return 0;
}

Curiously, POSIX doesn't seem to say whether the value returned by pthread_self should be inherited by the child process (see the POSIX pages for pthread_self and fork).
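
For anyone who wants to run the same check from inside Python, here is a minimal sketch of the C test above. It assumes a platform with os.fork and compares threading.get_ident() in the parent and the child; the helper name ident_survives_fork is made up for illustration. On an interpreter that still has the crash described in this issue, the child may abort before reporting anything, in which case the helper returns False.

```python
import os
import threading


def ident_survives_fork() -> bool:
    """Return True if the child sees the same thread ident as the parent."""
    parent_ident = threading.get_ident()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compare against the value captured before the fork.
        os.close(read_fd)
        same = threading.get_ident() == parent_ident
        os.write(write_fd, b"1" if same else b"0")
        os._exit(0)
    # Parent: read the single byte the child reported, then reap it.
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"1"


if __name__ == "__main__":
    print("parent == child" if ident_survives_fork() else "parent != child")
```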

@ADKaster
Member

On Serenity, pthread_self returns gettid. gettid() in unistd.cpp keeps a static, cached, thread_local variable to avoid a syscall.

We actually take special care in the fork() LibC wrapper to ensure that the behavior Python is expecting here does not happen:

// https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
pid_t fork()
{
    __pthread_fork_prepare();
    int rc = syscall(SC_fork);
    if (rc == 0) {
        s_cached_tid = 0;
        s_cached_pid = 0;
        __pthread_fork_child();
    } else if (rc != -1) {
        __pthread_fork_parent();
    }
    __RETURN_WITH_ERRNO(rc, rc, -1);
}

The cached thread id and cached process id are cleared after-fork in the child, causing the next access to actually perform the gettid syscall and get the up to date value.
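
The cache-and-reset pattern can be modelled in a few lines of Python. This is only an illustration, not the LibC code: it caches the process ID rather than the thread ID, and uses os.register_at_fork to clear the cache in the child. The names cached_getpid, _clear_cache_in_child, and child_sees_own_pid are hypothetical.

```python
import os

_cached_pid = 0  # 0 means "nothing cached yet", like s_cached_pid above


def cached_getpid() -> int:
    """Return the process ID, asking the OS only when the cache is empty."""
    global _cached_pid
    if _cached_pid == 0:
        _cached_pid = os.getpid()
    return _cached_pid


def _clear_cache_in_child() -> None:
    """Drop the cached value so the child looks up its own ID."""
    global _cached_pid
    _cached_pid = 0


# Run the reset in the child after every os.fork().
os.register_at_fork(after_in_child=_clear_cache_in_child)


def child_sees_own_pid() -> bool:
    """Fork and report whether the child's cached ID matches its real one."""
    cached_getpid()  # make sure the parent's value is cached before forking
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        ok = cached_getpid() == os.getpid()
        os.write(write_fd, b"1" if ok else b"0")
        os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"1"
```

Without the registered hook, the child would keep returning the parent's cached value.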

@ADKaster
Member

"Fixing" this issue would in theory make LibCore event loops more exciting, as well as make the implementation of the Shell more tricky.

However, those pieces of software are known to "work" on non-serenity platforms that have this unfortunate behavior, so perhaps they're subtly broken as well 🤔

@ADKaster
Member

It looks like the code that broke us comes from commit python/cpython@e21057b, added in python/cpython#118523 as part of python/cpython#117657.

We could probably get around this by making a patch to re-add the logic of _PyImport_ReInitLock(tstate->interp); on top of that patch.
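
To make the failure mode concrete, here is a toy model in Python. It is not CPython's implementation. It assumes a recursive lock that records its owner's thread identifier, and that "re-initializing" means handing ownership to whatever identifier the child reports. The class OwnerCheckedRecursiveLock and its methods are hypothetical names, and the lock never blocks.

```python
import threading
from typing import Callable, Optional


class OwnerCheckedRecursiveLock:
    """Toy recursive lock that remembers which thread ident owns it."""

    def __init__(self, get_ident: Callable[[], int] = threading.get_ident) -> None:
        self._get_ident = get_ident
        self._owner: Optional[int] = None
        self._level = 0

    def lock(self) -> None:
        me = self._get_ident()
        if self._owner is not None and self._owner != me:
            # A real lock would wait here; the toy model just refuses.
            raise RuntimeError("held by another thread")
        self._owner = me
        self._level += 1

    def unlock(self) -> None:
        if self._owner != self._get_ident():
            raise RuntimeError(
                "unlocking a recursive mutex that is not owned by the current thread"
            )
        self._level -= 1
        if self._level == 0:
            self._owner = None

    def reinit_after_fork(self) -> None:
        """In the child, transfer a held lock to the child's thread ident."""
        if self._level > 0:
            self._owner = self._get_ident()


if __name__ == "__main__":
    ident = [44]  # pretend thread ident, changed by hand to mimic the fork
    lock = OwnerCheckedRecursiveLock(lambda: ident[0])
    lock.lock()
    ident[0] = 45  # the "child" now reports a different ident
    lock.reinit_after_fork()
    lock.unlock()  # succeeds only because ownership was transferred
```

If the reinit_after_fork call is left out, unlock raises, which mirrors the fatal error in the report.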

@colesbury If you have any spare cycles, could you help us figure out if the bug is in cpython or in our OS? :D

@ADKaster
Member

Scrolling through the cpython patch it looks like we've managed to re-incarnate an ancient Solaris 9/HP-UX 11 behavior quirk that was worked around with that _PyImport_ReInitLock function.

https://bugs.python.org/issue7242

Though I'm not sure I'd call a patch to LibC/LibPthread to avoid re-setting pthread_self() in the child after fork a 'bug fix'

@colesbury

Yes, we can fix this upstream in CPython. Can you open an issue in https://github.com/python/cpython? You can tag me in it.

@nico
Contributor

nico commented Nov 11, 2024

Sure, made python/cpython#126688. Thanks!

@colesbury

Does Serenity preserve thread local storage after fork?

@colesbury

Would you please verify that python/cpython#126692 fixes the crash on Serenity?

@oskar-skog
Contributor Author

oskar-skog commented Nov 11, 2024

fork in Python: confirmed working
threads in Python: No regression found with threads. A threading.Lock that is locked by the parent can still be unlocked by the child after fork even though threading.get_ident() returns different values.
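
The lock observation can be reproduced with a short script along these lines. This is a sketch, not the attached thread_test.py: it assumes os.fork is available, and the helper name child_can_unlock_parent_lock is made up. It relies on threading.Lock not tracking an owner, so any thread, including the child's main thread, may release it.

```python
import os
import threading


def child_can_unlock_parent_lock() -> bool:
    """Acquire a lock in the parent, fork, and try to release it in the child."""
    lock = threading.Lock()
    lock.acquire()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        ok = False
        try:
            was_locked = lock.locked()  # the child inherits the locked state
            lock.release()
            ok = was_locked and not lock.locked()
        finally:
            os.write(write_fd, b"1" if ok else b"0")
            os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    lock.release()  # the parent's copy of the lock is still held
    return data == b"1"


if __name__ == "__main__":
    print("child unlocked the lock:", child_can_unlock_parent_lock())
```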

thread_test.py.txt
thread-test-Linux.txt (not 3.13)
thread-test-SerenityOS.txt

@ADKaster
Member

Does Serenity preserve thread local storage after fork?

Yes, thread local storage is cloned. Instead of storing pthread data structures in %fs or other libc-allocated data, Serenity's pthread APIs mostly just map to querying the kernel about its first-class Thread classes that each Process class owns, using the thread ID.

Thread-local storage is allocated and stored in %fs and other thread-pointer-registers. Just not the "info about the current thread" structure musl libc and glibc have.

@oskar-skog
Contributor Author

It's merged into main. I guess we can just update Python to 3.13.1 when it gets released?

If anyone wants to get it working right now, I believe you might get away with just commenting out the call to Py_FatalError in _PyRecursiveMutex_Unlock.

@Hendiadyoin1
Contributor

Alternatively, one could temporarily backport the change to our patch set; I'm not sure how fast the Python release cycle is.
(:yakbait:)

@oskar-skog
Contributor Author

It's expected to be released 2024-12-03, but backporting this change should be easy; the patch will just become redundant later.

oskar-skog added a commit to oskar-skog/serenity that referenced this issue Nov 12, 2024
nico closed this as completed in a47e63b Nov 13, 2024