-
-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSVC: update msvc path normalization utility functions #4434
Conversation
Changes: * Replace process_path function with resolve_path and normalize_path functions in MSCommon/MSVC/Util.py. * Replace process_path invocations with normalize_path invocations. * Protect against inadvertent resolve/realpath for windows drive specifications in resolve_path and normalize_path. * Additional options for normalize_path with defaults consistent with process_path.
SCons/Tool/MSCommon/MSVC/Util.py
Outdated
_RE_DRIVESPEC = re.compile(r'^[A-Za-z][:]$', re.IGNORECASE) | ||
|
||
# windows path separators | ||
_OS_PATH_SEPS = ('\\', '/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use os.sep and os.altsep here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but it would have to guaranteed to be windows
or win32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but it would have to guaranteed to be
windows
orwin32
.
The assumption you're on win32 if using msvc is pretty baked in - and not entirely correct, since you can get VS on the Mac now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.
Unfortunately, there exist "public" methods that could be called from site initialization and/or sconstruct files that could bypass the regular tool loading and make use of the internal MSVC utility functions.
Might need some form of protection for the "public" functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary. If people want to do truly silly things. We don't need to guard against that and waste our time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might just work.
The resolve_path and normalize_path functions are exercised somewhat in the MSCommon/MSVC/UtilTests.py script on linux in the test suite (and occasionally locally in WSL) which does pass on non-windows platforms.
However, testing resolve_path handling of drive specifications (i.e., "C:") is only done on windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general if we have to jump through too many hoops we've had no issues with dropping some ability to test platform stuff on other platforms. Especially given github actions, appveyor, etc.
In the "old" days, we had to run our own buildbot and depend on our own and other users volunteered build workers. I'm glad we've moved past that as it was a major time sink..
SCons/Tool/MSCommon/MSVC/Util.py
Outdated
else: | ||
# both abspath and resolve necessary to produce identical path | ||
# when (unqualified) file name is on a mapped network drive for | ||
# python 3.6 and 3.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this "and" or "through" ? (would 3.12 have same now, so maybe 3.6 and above?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember exactly as this was borne out of my own work with using a windows named mutex as the cache lock.
There is a difference between 3.6 realpath/resolve and 3.11 realpath/resolve with respect to network drives. I don't remember if I checked intermediate versions or not.
There was a change in later versions of python that was not backported to 3.6 if I remember correctly. I don't remember offhand if there is the same issue with 3.7, etc or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are subtle differences when running from a network folder with a mapped drive letter.
In the output below, the scons folder and sconstruct folder are run from a command-line window on a mapped-drive network folder (Z:\SCons\test-msvc-normalize
=> \\JCB-HYDRA\S\SCons\test-msvc-normalize
).
The evolution of the original implementation was from os.path.realpath(p)
to pathlib.Path(p).resolve()
.
It was not possible to get consistent behavior across all supported scons python versions using os.path.realpath
.
There is a difference when using os.path.realpath(p)
and os.path.realpath(os.path.abspath(p))
in terms of the returned results as the first returns a UNC path and the second returns the mapped-drive path.
os.path.abspath(p)
should probably be called regardless as the intention of the resolve_path method is to return an absolute path and the pathlib.Path(p).resolve()
method can raise an exception which would not have converted the original path to an absolute path.
Given the funky explanation above, the comment could probably be simply removed.
Output for embedded python 3.6 to 3.11:
Results from testing a `resolve_path('SConstruct')` function added to the SConstruct file.
By function:
os.path.realpath(p):
3.6: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.7: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.8: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.9: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.10: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.11: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
os.path.realpath(os.path.abspath(p)):
3.6: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.7: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.8: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.9: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.10: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.11: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
str(pathlib.Path(p).resolve(p)):
3.6: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.7: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.8: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.9: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.10: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
3.11: '\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
str(pathlib.Path(p).resolve(os.path.abspath(p))):
3.6: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.7: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.8: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.9: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.10: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
3.11: 'Z:\\SCons\\test-msvc-normalize\\SConstruct'
By version:
resolve_path:
py=sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
p='SConstruct'
os_1='Z:\\SCons\\test-msvc-normalize\\SConstruct'
os_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
pl_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
resolve_path:
py=sys.version_info(major=3, minor=7, micro=9, releaselevel='final', serial=0)
p='SConstruct'
os_1='Z:\\SCons\\test-msvc-normalize\\SConstruct'
os_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
pl_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
resolve_path:
py=sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
p='SConstruct'
os_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
os_2='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
resolve_path:
py=sys.version_info(major=3, minor=9, micro=10, releaselevel='final', serial=0)
p='SConstruct'
os_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
os_2='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
resolve_path:
py=sys.version_info(major=3, minor=10, micro=11, releaselevel='final', serial=0)
p='SConstruct'
os_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
os_2='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
resolve_path:
py=sys.version_info(major=3, minor=11, micro=0, releaselevel='alpha', serial=5)
p='SConstruct'
os_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
os_2='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_1='\\\\JCB-HYDRA\\S\\SCons\\test-msvc-normalize\\sconstruct'
pl_2='Z:\\SCons\\test-msvc-normalize\\SConstruct'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you've got part of the answer to the research: 3.6 and 3.7 had the differences. The Python docs admit to something here, which doesn't sound like the same thing, as junctions can't cross to a network drive.
Changed in version 3.8: Symbolic links and junctions are now resolved on Windows.
Then, of course, there's this claim:
Operating system APIs make paths canonical as needed, so it’s not normally necessary to call this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mwichmann You might be amused by the backstory of the results above.
The cache locking used in my own work employs a windows API named mutex. The name of the mutex cannot contain backslash characters. The "real path" to the cache file was normalized, converted to a hex string from the hash digest, and used as the name of the mutex.
Any python process using the cache basically would create the same mutex name if the cache file was in the same file system location. Real path is required in case paths contain symlinks, junction, etc.
For testing the cache locking with a named mutex, four versions of python were run simultaneously: 3.6 [32], 3.6 [64], 3.11 [32], and 3.11 [64]. Cache corruption was persistent. It took a non-trivial amount of time to nail it down to the difference in os.path.realpath
across python versions.
This particular problem is avoided using a separate lock file as any number of path specifications can refer to the same system location without a need to be unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any objections to:
# both abspath and resolve necessary for an unqualified file name
# on a mapped network drive in order to return a mapped drive letter
# path rather than a UNC path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. (comment above).
Any reason we shouldn't use your mutex solution?
It's ok to not support it if not on windows, as in that case it would be some contrived cross platform testing and not an actual build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we shouldn't use your mutex solution?
It really depends on the expected level of process contention and the scons tolerance for complexity.
It also depends on if there will ever be a desire to replace the current cache "read-once, write many times" to "update (read-write)" which likely requires holding a lock/mutex longer thus increasing contention when running multiple processes.
From an implementation standpoint, the current file lock is simpler, pure python, and platform independent. The drawbacks are that as contention increases polling can be inefficient and process waiting is not "fair".
A mutex is likely more efficient and fairer as contention increases. The drawbacks are that it is more complex due to having to use ctypes, the windows api itself, an additional "limbo" state, and as described above, a challenge to come up with a name that uniquely identifies a file system location.
That said, while there is more implementation/infrastructure baggage to use the windows api directly, some system-level things are just easier to accomplish via the windows api and can be more portable across python versions.
The mutex wrapper itself is approximately 300 lines of code including a "FakeMutex" class for non-windows platforms. While the python wrapper is fairly small, there exists a non-trivial amount of library code supporting the implementation.
If you are talking about reimplementing a new version in scons, then there would be some additional infrastructure work required to get it up and running. Basically, re-implement the minimum to get it working.
If you are talking about using the existing mutex code directly then there could be some work required to extract certain modules and make it work inside the scons ecosystem.
It would certainly be interesting to benchmark the performance differences under varying levels of contention between using the existing file lock versus windows api kernel primitives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Sounds like it's not trivial. So let's stick a pin in this idea and if we find too many "in the wild" or in our own CI, then revisit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Happy to revisit or add implementation at any time.
Changes: * Adjust os path separators in MSCommon/MSVC/Util.py * Revise commend inside resolve_path method in in MSCommon/MSVC/Util.py * Move Util import inside verify method in MSCommon/MSVC/Config.py (prevent import dependency loops).
…SVC/UtilTests.py.
@mwichmann One of the Python 3.11 tests failed: I don't believe this is related to my changes. If you think it is related to my changes, let me know.
|
Yeah, don't worry about that one unless it actually proves reproducible. It's been known to crop up intermittently, and it's usually pretty rare. |
I think this PR is good to go? @jcbrill - Good to go from your side? |
Good to go from me. |
Replace the process_path utility function with the normalize_path utility function. The normalize_path method accepts more configurable options and protects against attempting to resolve (realpath) windows drive specifications (unless desired).
Changes:
The fourth in a sequence of smaller PRs from #4409.
Internal changes only: no documentation updates.
Contributor Checklist:
CHANGES.txt
(and read theREADME.rst
)