1.0.80 opencv build hangs #844
Comments
OK, I realize there could be a bug in cc::Build::compile_objects that caused this: the wait thread could exit early due to an error while the spawn thread keeps spawning, eventually hitting a deadlock. I will open a PR shortly. Originally posted by @NobodyXu in twistedfall/opencv-rust#480 (comment)
I'm so sorry, this is what is called a race condition :)
The fix for this is to keep the wait thread running until the channel reaches EOF.
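A minimal sketch of the "keep waiting until EOF" idea, using only `std::sync::mpsc`. This is not the actual `cc` code: `JobResult`, `wait_until_eof` and `run_demo` are hypothetical names, and a plain channel stands in for the real child-process and jobserver handling. The point it illustrates is that the wait side records the first error but keeps draining until every sender is gone, so the spawn side is never left without a consumer.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical stand-in for one finished compile job.
type JobResult = Result<(), String>;

/// Drains `rx` until all senders are dropped (EOF), remembering only the
/// first error. Returns the number of jobs seen and the overall outcome.
fn wait_until_eof(rx: Receiver<JobResult>) -> (usize, JobResult) {
    let mut seen = 0usize;
    let mut first_error: Option<String> = None;
    // Iterating a Receiver ends only once the channel is closed and empty.
    for result in rx {
        seen += 1;
        if let Err(e) = result {
            // Record the error but do NOT return early: later jobs
            // still have to be received.
            if first_error.is_none() {
                first_error = Some(e);
            }
        }
    }
    let outcome = match first_error {
        Some(e) => Err(e),
        None => Ok(()),
    };
    (seen, outcome)
}

/// Spawns a thread that reports five jobs, the second of which fails.
fn run_demo() -> (usize, JobResult) {
    let (tx, rx) = channel();
    let spawner = thread::spawn(move || {
        for i in 0..5 {
            let result = if i == 1 {
                Err(format!("job {} failed", i))
            } else {
                Ok(())
            };
            tx.send(result).expect("wait side stays alive until EOF");
        }
        // `tx` is dropped here, which closes the channel (EOF).
    });
    let outcome = wait_until_eof(rx);
    spawner.join().unwrap();
    outcome
}
```

Calling `run_demo()` reports all five jobs as seen even though the second one failed; an early `return` inside the loop would instead drop the receiver while the spawn side is still producing.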
I've bisected the git tree between 1.0.79 and 1.0.80; the offending commit is ff45d42 (unsurprisingly). Not sure how much help that is :)
@twistedfall Does v1.0.80 still work fine for you as previously stated? If so then it is really strange, since v1.0.80 includes this commit. |
No, it doesn't. I'm not sure why it worked in my previous testing (I have updated that comment).
That's strange. If v1.0.80 somehow worked and then suddenly stopped working, then I think something else could be affecting the build.
Is there any other dependencies that also use the jobserver provided by Could be that their |
Well the build script itself uses the job server: https://github.com/twistedfall/opencv-rust/blob/master/build/generator.rs#L82, but it releases the acquired tokens, so it should be fine. But I'm reading the docs for the |
@twistedfall In your build/generator.rs you create a
When you drop a But the fds specified in the environment variable isn't changed, so you could be just referring to random fds and if it is a valid fd, then If the jobserver is passed via fifo (path to a named pipe), then it will be fine. Edit: This also explains why this can only be reproduced on Linux and MacOS, but not on Windows, since jobserver on Windows uses named fifo. |
@twistedfall I have a fork |
@twistedfall Another solution is to wrap the It should be fine given that it's only used in build-script and it will get cleaned up by OS on exits. |
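As an illustration of the "wrap it so the destructor never runs" idea, here is a small sketch. It assumes the wrapper is something like `std::mem::ManuallyDrop`, which is only one possible reading of the suggestion; `FakeClient` and `drops_observed` are hypothetical names, and the counter stands in for "the inherited fds were closed".

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times `FakeClient::drop` has run.
static CLOSED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a handle that closes its fds when dropped.
struct FakeClient;

impl Drop for FakeClient {
    fn drop(&mut self) {
        CLOSED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns the cumulative number of drops observed after (a) a plain
/// handle and (b) a ManuallyDrop-wrapped handle have gone out of scope.
fn drops_observed() -> (usize, usize) {
    let before = CLOSED.load(Ordering::SeqCst);
    {
        let _plain = FakeClient; // Drop runs at the end of this scope.
    }
    let after_plain = CLOSED.load(Ordering::SeqCst) - before;
    {
        // Drop is suppressed; the resource stays open until process exit.
        let _kept = ManuallyDrop::new(FakeClient);
    }
    let after_wrapped = CLOSED.load(Ordering::SeqCst) - before;
    (after_plain, after_wrapped)
}
```

The second count does not increase, which matches the reasoning in the comment: in a short-lived build script the OS reclaims the descriptors on exit, so skipping the destructor is a deliberate leak rather than a lasting one.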
That would work I think, but it looks like it's only time before another crate uses |
I could open a PR in |
This will clearly be behind a feature flag and add complexity to the existing API and implementation since it would require duplication since now |
@twistedfall I've opened a PR in jobserver-rs for this use case. |
@NobodyXu As you previously mentioned the |
Please check my comment above; it's likely this is caused by the use of the unsafe function jobserver::Client::from_env in the opencv build.rs.
It is not a "race" in the traditional sense, since it is perfectly memory safe; it's just that the blocking nature of jobserver::Client::acquire makes it a bit hard to write correct code without the use of async. Besides, it's verified that #846 does not fix this, so it's something else.
Thanks a lot for that, this should be the best solution! |
@NobodyXu I have tried switching to |
@twistedfall Thanks for the report! It seems that cc v1.0.81 wasn't able to spawn compilation tasks fast enough; it could be blocked waiting for a jobserver token to be released by the wait thread.
@twistedfall I've opened #849 which should fix (or at least reduce) the performance regression. |
Closing this as it should have been fixed. |
Starting with version 1.0.80, cc hangs when building the opencv crate. The issue is described in twistedfall/opencv-rust#480, but I will provide key details here.
The bug is known to reproduce on macOS and Linux. When building opencv, the cc crate doesn't return control and the build script hangs.
The issue happens on 1.0.80 and 1.0.81, although there is some evidence that 1.0.80 works (see twistedfall/opencv-rust#480 (comment)).