Handle json decode error in msvc cache #4402
The 3.6 build passed.
Looks good to me. PR polish it up and we'll get it merged.
aaand.... the vs2019/py3.8 version failed...
Perhaps a one-off...
worked on the rebuild after the PR (non-code) update...
Okay, the latest build caught a conflict on camera:
I think this means we are getting races on the cachefile. I wonder if we need to be more polite on writes... like elsewhere in the codebase, write it to a temporary file, then rename it over the original if the write succeeded. Not quite sure of the implications - dueling changes to the contents? Or maybe try some kind of locking protocol? Hate to get too fancy, this was supposed to be a quick hack to let tests run better, now it's showing hydra behavior...
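As a rough illustration of the "write to a temporary file, then rename it over" idea, here is a minimal sketch. It is not the SCons implementation: the function name `write_cache_atomically` and the use of a plain dict serialized as JSON are assumptions made for the example.

```python
import json
import os
import tempfile


def write_cache_atomically(cache_path, cache):
    """Write `cache` as JSON to a temp file, then rename it over `cache_path`.

    Hypothetical helper for illustration only; not taken from SCons.
    """
    directory = os.path.dirname(os.path.abspath(cache_path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(cache, tmp, indent=2)
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, cache_path)
    except BaseException:
        # The write or rename failed: discard the partial temp file.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        raise
```

This only keeps readers from seeing a half-written file. It does nothing about the "dueling changes" concern, since the last writer still wins, and on Windows the rename can fail if another process has the destination open.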
Mentioning @jcbrill so he's aware of this pending stuff too...
@mwichmann Thanks. I was looking at the changes earlier today. Looks good to me. I periodically experience the file access errors. It typically happens when I forget to disable Windows Defender real-time scanning in a Windows VMware virtual machine. I believe the disabled state resets to enabled when the virtual machine is rebooted.
The previous note about a failure would seem to be related to #4268
@mwichmann There is definitely a race condition (and likely cache information loss) as the cache is read once, written multiple times, and unprotected from multiple-process access. I have a "reference implementation" outside of SCons that uses a Windows named mutex to serialize access to the cache, which relies on correctly "normalizing" the real path to the cache file. The repository for the code is still private. I would be happy to add you and @bdbaddog as contributors in the short term to be able to view/review the implementation.
that would be cool. I've got a simple file locker sitting around which might also be of use; it tries to be portable so doesn't use any specific tricks (just tries to get exclusive access to a lockfile), but I think we only care about Windows, so something Windows-specific might be fine. Or do we care about the Mac case?
An invite should have been sent. The implementation can be extraordinarily inefficient when multiple processes write many entries at the same time, due to loss protection. Tested earlier with 4 python interpreters building all possible combinations (i.e., long-running and writing many entries). This is the worst-case scenario as they are basically running in lock-step and possibly running the same cache entry build at the same time. Files of interest:
Edit:
A bit of a revamp of read_script_env_cache and write_script_env_cache. The significant change is that on read, a JSONDecodeError is detected, and the cache file removed if so - we're guessing a write race corrupted the file. If this works, will update with more "PR polish", else will withdraw. Signed-off-by: Mats Wichmann <[email protected]>
102d852 to 9681d6c
@mwichmann I'm not sure the warning messages are going to populate correctly. It appears to be a mix of %-formatting and f-strings. Since the same format string is used in both the warning message and the debug statement, it probably should be %-formatting with the explicit population (e.g., As always, I could be wrong.
ah, forgot it's used twice. for debug, it's logging and so you're supposed to use (template, args...). I'll take a look later, deep in something at the moment.
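To make the point about sharing one format string concrete, here is a small sketch. The message text, the logger name, and the function `report_bad_cache` are made up for the example, and the standard library `warnings` module stands in for whatever warning mechanism the real code uses.

```python
import logging
import warnings

logger = logging.getLogger("msvc_cache_example")  # hypothetical logger name


def report_bad_cache(cache_path, error):
    """Emit the same message as a debug log entry and as a warning."""
    template = "Could not load cache file %s: %s"  # hypothetical message text
    args = (cache_path, error)
    # logging takes the template and its arguments separately
    # and only formats the message if the record is actually emitted.
    logger.debug(template, *args)
    # A warning needs a finished string, so populate the template explicitly.
    message = template % args
    warnings.warn(message)
    return message
```

An f-string would already be formatted before either call sees it, so it cannot double as a %-style template; keeping one %-style template and populating it per call site avoids the mix.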
Signed-off-by: Mats Wichmann <[email protected]>
Yes, I got caught between changes - did something, then changed it again and didn't clean up. Thanks for spotting.
Following on from SCons#4402, which tries to recover from races that might cause either a corrupt msvc cache file or an incomplete read happening while a write is still in progress, add a simple-minded locking protocol to try to prevent the problem in the first place. For writing, a lockfile is created with exclusive access. For reading, the same is done but the lock is immediately released: we only want to wait until it is not currently locked, we don't need to keep it locked at this point. This is addressing what's mainly an issue for testing, when there can be many concurrent SCons instances each running a test - we don't think this is likely in normal developer usage. Signed-off-by: Mats Wichmann <[email protected]>
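The locking protocol described in that commit message could look roughly like the sketch below. This is only an illustration of the idea, not the code from the follow-up change: the function names, the timeout, and the polling interval are all assumptions.

```python
import os
import time


def acquire_lock(lock_path, timeout=5.0, poll=0.05):
    """Create `lock_path` exclusively, retrying until `timeout` seconds pass.

    Hypothetical helper; names and defaults are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lockfile already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {lock_path}")
            time.sleep(poll)
        else:
            os.close(fd)
            return


def release_lock(lock_path):
    """Drop the lock by removing the lockfile."""
    os.remove(lock_path)


def wait_until_unlocked(lock_path, timeout=5.0):
    """Reader side: take the lock, then release it immediately."""
    acquire_lock(lock_path, timeout)
    release_lock(lock_path)
```

A writer would call `acquire_lock`, write the cache, then call `release_lock`, ideally in a `try`/`finally`. One known weakness of this kind of scheme is that a process that dies while holding the lock leaves a stale lockfile behind.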
A bit of a revamp of read_script_env_cache and write_script_env_cache. The significant change is that on read, a JSONDecodeError is detected, and the cache file removed if so - we're guessing a write race corrupted the file.
If this works, will update with more "PR polish", else will withdraw. The "guess" is because only the AppVeyor build on Py 3.6/VS 2017 hits this; we haven't reproduced it in a developer context yet.
There's no external visibility, so no doc impacts.
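The read-side recovery described above can be sketched as follows. This is a simplified stand-in and not the actual read_script_env_cache: the function name `read_cache` and the choice to return an empty dict are assumptions for the example.

```python
import json
import os


def read_cache(cache_path):
    """Return the cached dict, or {} if the file is missing or corrupt.

    Hypothetical helper for illustration; not the SCons function.
    """
    try:
        with open(cache_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
    except json.JSONDecodeError:
        # Guess that a write race left a truncated file;
        # remove it so the cache gets rebuilt on the next write.
        try:
            os.remove(cache_path)
        except OSError:
            pass
        return {}
```

Dropping the file throws away any valid entries it might have held, which is acceptable for a cache that can be regenerated.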
Contributor Checklist:
CHANGES.txt (and read the README.rst)