Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

worker: fix crash when a worker joins after exit #56191

Merged
merged 1 commit into from
Dec 11, 2024

Conversation

Qard
Copy link
Member

@Qard Qard commented Dec 9, 2024

If a worker has not already joined before running to completion it will join in a SetImmediateThreadsafe which could occur after the worker has already ended by other means. Mutating a JS object at that point would fail because the isolate is already disposed.

Fixes #56020 (Note that this issue is not reproducible on 22+, but is likely still technically present on later versions as it appears to be a race condition which just happened to be reproducible in the specific conditions described in the issue. The originating code has not been changed in a long time, so it's likely just a very rare condition.)

cc @addaleax I would greatly appreciate a review from you as you're the last person to touch this code...even if it's been many years now. 😅

@Qard Qard added c++ Issues and PRs that require attention from people who are familiar with C++. worker Issues and PRs related to Worker support. review wanted PRs that need reviews. request-ci Add this label to start a Jenkins CI on a PR. labels Dec 9, 2024
@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Dec 9, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Dec 9, 2024
@nodejs-github-bot
Copy link
Collaborator

Copy link

codecov bot commented Dec 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.49%. Comparing base (56c8360) to head (6d4a4d5).
Report is 40 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #56191      +/-   ##
==========================================
+ Coverage   87.99%   88.49%   +0.50%     
==========================================
  Files         656      656              
  Lines      189103   189262     +159     
  Branches    35997    36349     +352     
==========================================
+ Hits       166402   167494    +1092     
+ Misses      15859    14976     -883     
+ Partials     6842     6792      -50     
Files with missing lines Coverage Δ
src/node_worker.cc 84.26% <100.00%> (+0.34%) ⬆️

... and 94 files with indirect coverage changes

@mcollina
Copy link
Member

mcollina commented Dec 9, 2024

cc @jasnell @joyeecheung

src/node_worker.cc Outdated Show resolved Hide resolved
@addaleax
Copy link
Member

addaleax commented Dec 9, 2024

Yeah, I don't think this is the right fix here ... according to the stack trace in #56020, this crash occurs while the parent thread is still running its event loop, so I'm not sure why the Isolate is already disposed? This is happening in the parent thread, not the child whose thread exited

If a worker has not already joined before running to completion
it will join in a SetImmediateThreadsafe which could occur after
the worker has already ended by other means. Mutating a JS object
at that point would fail because the isolate is already disposed.
@Qard Qard force-pushed the fix-worker-join-crash branch from 10bdd5e to 6d4a4d5 Compare December 9, 2024 17:01
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Qard
Copy link
Member Author

Qard commented Dec 9, 2024

@addaleax It's happening while the parent thread is alive, but it's the child thread that it's happening to because it's scheduling the SetImmediateThreadSafe from the StartThread of the worker after the Run completes. The worker completes its Run before main exits, schedules an immediate, and then seemingly is having its exit trigger before the immediate which results in the worker already having disposed its isolate by the time it reaches that immediate which attempts to join the thread.

@addaleax
Copy link
Member

addaleax commented Dec 9, 2024

@Qard Yeah, but the thing here is that the scheduled immediate runs on the parent thread, and it's accessing the parent Isolate, not the worker Isolate, so that should be fine?

Anyway, the current fix here does look good to me

(Oh, and: It looks like the Isolate here is terminating, i.e. TerminateExecution() has been called, and not actually been disposed of?)

@Qard
Copy link
Member Author

Qard commented Dec 9, 2024

I don't know. Just going by the logs that said "isolate disposed" in it. 🤷

@Qard Qard added the request-ci Add this label to start a Jenkins CI on a PR. label Dec 10, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Dec 10, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@Qard Qard added commit-queue Add this label to land a pull request using GitHub Actions. and removed needs-ci PRs that need a full CI run. review wanted PRs that need reviews. labels Dec 11, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Dec 11, 2024
@nodejs-github-bot nodejs-github-bot merged commit 586814b into nodejs:main Dec 11, 2024
60 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 586814b

@Qard Qard deleted the fix-worker-join-crash branch December 12, 2024 06:04
targos pushed a commit that referenced this pull request Dec 13, 2024
If a worker has not already joined before running to completion
it will join in a SetImmediateThreadsafe which could occur after
the worker has already ended by other means. Mutating a JS object
at that point would fail because the isolate is already disposed.

PR-URL: #56191
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FATAL ERROR: v8::FromJust Maybe value is Nothing
7 participants