sh deadlocking in child fork process due to logging #607
Comments
Thanks for the detailed report! The docs you linked to state that: …

However, that's not actually what happens in this case. Out of curiosity, what leads you to think that the sh library's background thread causes this problem?
These pre/post fork calls are made because … I can't follow how …
I had a stacktrace which showed the child process stuck inside a logging-related lock, and the background thread was also holding a lock. I was able to reproduce this without …
Thanks for the repro script and report, it was very helpful.
We can't control whether the application using sh is multi-threaded, so sh must assume multi-threading in any case. I believe the main risk sh needs to be concerned about is holding a lock before fork and then expecting that lock to be released in the child. The script that you posted demonstrates this problem.

Most of what happens in the child between fork and exec is very minimal: mostly closing down file descriptors, setting process ids, and setting signals. There is very little risk of the child messing with anything threading- or lock-related, unless of course a user uses an escape hatch like …

I'm inclined to say that the fix here is "don't do that". Although I can see how using logging within the child, before exec, could be a reasonable thing to expect, it's not sh's responsibility to let users know that the logging module uses locks. I'm not even sure how we could communicate that risk back to the user, outside of a documentation change? The only alternative that I see is to disable sh's logging altogether by default, unless a …
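The hold-before-fork risk described above is what `os.register_at_fork()` is designed to coordinate: acquire the lock just before forking and release it on both sides, so the child never inherits it in a locked state. A minimal sketch of that pattern with a plain `threading.Lock` (an illustration of the general technique, not sh's actual code):

```python
import os
import threading

lock = threading.Lock()

# Acquire the lock in the thread that forks, then release it in both the
# parent and the child, so the child's inherited copy is never left locked.
os.register_at_fork(
    before=lock.acquire,
    after_in_parent=lock.release,
    after_in_child=lock.release,
)

pid = os.fork()
if pid == 0:
    with lock:  # safe: after_in_child released the inherited copy
        pass
    os._exit(0)

os.waitpid(pid, 0)
with lock:  # safe in the parent too
    pass
print("no deadlock")  # → no deadlock
```

CPython's own logging module uses essentially this scheme for its internal locks in recent versions, which is why pure-stdlib code rarely trips over it on its own.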
In Python 3.12, a DeprecationWarning is issued in … There is a well-referenced discussion linked in the docs, but the gist is: if a process that has multiple threads forks, only the C API functions that are "async-signal-safe" can be used in the child¹. CPython does not guarantee that it will only use those functions (e.g. it can call malloc), therefore forking while having multiple threads is unsafe. I don't know what should be done about this (can …).

Now, does CPython actually call any unsafe functions in the child? Specifically, while …:

```shell
strace -o threaded_fork.strace -ff python -c "import sh; sh.true()"
cat $(grep -l 'execve("/usr/bin/true' threaded_fork.strace.*)
# rm threaded_fork.strace.*
```

On Linux running CPython 3.11.6 I get the following trace: …

Out of those calls: …

Basically, this was a long way of saying that while the child code after the fork is indeed minimal, it's still unsafe according to POSIX. Whether that's something that needs addressing is another matter -- it seems to work fine in practice and that's what really matters, but perhaps it's something you might want to look into.

Cheers, and thank you for this amazing library!
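The "minimal child" discussed here can be sketched as a bare fork-then-exec, where the child does nothing but call `os.execv()` (which maps onto the async-signal-safe `execve()`). This is a generic illustration, not sh's actual child code:

```python
import os
import sys

r, w = os.pipe()
os.set_inheritable(w, True)  # keep the write end open across exec

pid = os.fork()
if pid == 0:
    # POSIX only guarantees async-signal-safe calls between fork and exec
    # in the child of a multi-threaded parent; this child goes straight
    # to exec and does nothing else.
    os.execv(sys.executable,
             [sys.executable, "-c", f"import os; os.write({w}, b'child ok')"])

os.close(w)
data = os.read(r, 16)
os.waitpid(pid, 0)
print(data.decode())  # → child ok
```

Even here, `os.execv` itself runs some Python code (building the argv array) before the `execve` syscall, which is the kind of thing the strace inspection above was checking for.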
@micha030201 Thank you for the deep dive and the excellent summary!
Curious if this is related to the issue I just saw after upgrading to the new PyCharm 2024.1. Code using …

Appears to be reading the sid/pgid immediately after a … I can work around this by downgrading PyCharm or not activating the debugger. But it seems like there is something unhappy in the intersection of async/fork/sh.
The deprecation warning is displayed when using sh from within pytest: …

It's possible to disable the warning, but I'm not sure I want to do that just yet. Hopefully someday forking can be configured off in the sh module.
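For anyone who does decide to silence it, pytest's `filterwarnings` ini option can target just this warning. The pattern below is based on CPython 3.12's message text ("use of fork() may lead to deadlocks in the child"); a sketch, adjust if the wording differs in your version:

```ini
[pytest]
filterwarnings =
    ignore:.*use of fork\(\) may lead to deadlocks.*:DeprecationWarning
```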
Has anyone used `os.posix_spawn` as a replacement for fork+exec? It might work as a replacement with minimal changes.
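`os.posix_spawn` has been in the stdlib since Python 3.8; a minimal sketch of the call shape (spawning the current interpreter here purely for illustration):

```python
import os
import sys

# posix_spawn performs fork+exec in a single libc call, so no Python-level
# code runs in the child between fork and exec.
pid = os.posix_spawn(
    sys.executable,                               # path to the executable
    [sys.executable, "-c", "print('spawned')"],   # argv
    os.environ,                                   # environment for the child
)
_, status = os.waitpid(pid, 0)
print(os.waitstatus_to_exitcode(status))  # → 0
```

It also accepts `file_actions` and keyword flags such as `setpgroup` and `setsid` (platform-dependent), which cover some of what sh currently does between fork and exec; whether every preexec-style hook can be expressed that way is exactly the open question here.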
For the record, everything here would need to be converted somehow to posix_spawn.
I recently had an issue where some tests I have, which were invoking `sh` with `_bg=True`, were deadlocking. After investigating, I noticed this was happening inside the `sh` library, and related to logging inside of an at-fork handler. This is a minimal reproducible example of the issue: …

Hitting this issue does require a bit of scheduling bad luck; I think the background thread needs to be interrupted while the lock is held. My guess is that the log line is being formatted. Then `fork()` needs to be called from the main thread.

In practice though this seems to not be that rare. In my testing the above reproduction hits it every time, with only 10 iterations. I'm guessing some of the objects being logged take some time to format?

I also think instead of using `os.register_at_fork()` you could shoot yourself in the foot via the `preexec_fn` argument.

I am not an expert on Python internals, but generally calling `fork()` from a multi-threaded application has some pretty concerning risks. It's realistically not going to be possible to avoid the issue entirely, since users will always be able to cause this themselves, but I think the `sh` library's background thread is making this problem far more likely than it needs to be. Can this thread avoid logging / other lock-holding operations?

I think it's also worth noting that at-fork handlers were only added in Python 3.7, so they may not be that common yet.
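The failure mode described above can be sketched deterministically without threads or logging: acquire a lock, fork, and observe that the child's inherited copy is stuck locked with nobody left to release it. A minimal illustration of the mechanism (a plain `threading.Lock` stands in for logging's handler lock, and the child probes non-blockingly instead of actually deadlocking):

```python
import os
import threading

lock = threading.Lock()
lock.acquire()  # simulates a background thread holding the lock at fork time

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # The child inherits a copy of the lock in the locked state, but the
    # "thread" that would release it does not exist here. A blocking
    # acquire() would hang forever; probe non-blockingly instead.
    stuck = not lock.acquire(blocking=False)
    os.write(w, b"deadlock" if stuck else b"ok")
    os._exit(0)

os.waitpid(pid, 0)
print(os.read(r, 16).decode())  # → deadlock
```

In the real failure the held lock belongs to a logging handler and the blocked call is the log statement inside the at-fork handler, but the inherited-lock mechanics are the same.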