-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FRONTEND] let CacheManager write to temp dir instead of temp file #4295
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ThomasRaoux
approved these changes
Jul 10, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me
awesome thanks @yundai424 - this looks like it should fix the issue we have in vLLM. is there a rough ETA for when this would be in a Triton release? trying to plan whether we need to merge a work-around in the meantime. |
bertmaher
pushed a commit
to bertmaher/triton
that referenced
this pull request
Dec 10, 2024
…riton-lang#4295) # Summary there've been multiple issues discussing around the `FileNotFoundError` on compilation when `CompiledKernel` is trying to read from the listed ASM files. triton-lang#2688 triton-lang#4002 vllm-project/vllm#6103 etc. and there have been some attempts to address it such as triton-lang#3544 . This PR attempts to explain the root cause and suggest a fix. # Why When a kernel is being compiled, triton first writes IRs to triton cache dir ([ref](https://github.com/triton-lang/triton/blob/78091647fccb6825ed9956ff7c0300859856d261/python/triton/compiler/compiler.py#L289)). Inside of the write operation, the process first writes it to a temp file unique to the current process (plus a uuid to distinguish between multiple processes with same PID on different hosts sharing the same underlying FS) ([ref](https://github.com/triton-lang/triton/blob/c14b033cd979d5c39e5fdb3847c022fa5d71a0c1/python/triton/runtime/cache.py#L124-L130)) and then atomically `os.replace` it to the final file name. Afterwards the `CompiledKernel` lists all the IRs and reads them ([ref](https://github.com/triton-lang/triton/blob/78091647fccb6825ed9956ff7c0300859856d261/python/triton/compiler/compiler.py#L362-L367)). On multiprocess set up this may however result in a race condition. Let's focus on a case where there's one host with 2 processes on it. ![Triton RC (1)](https://github.com/triton-lang/triton/assets/43726198/ffc20e0c-0404-4e7a-bd6c-022e710e97b9) At the time when `pid 1` lists ASMs, the dir may contain temp files generated from another process `pid 2`. However at the time when `pid 1` proceeds to read bytes from the listed files, `pid2` may have already `os.replace`ed its temp files, so `pid 1` will encounter `FileNotFoundError` when trying to read the temp file generated by `pid 2`. IBM/vllm#35 (comment) also believes this is the root cause. # How There're multiple potential solutions towards this, as mentioned in IBM/vllm#35 (comment) as well: - let each process write to a private temp dir instead so `glob` won't bother taking the temp stuff into consideration - or, exclude `tmp.pid_*` from `glob` This PR tries to go with the 1st approach to avoid adding an assumption on the tmp file pattern (which is only used in `runtime/cache.py`) in `compiler/compiler.py` but is open to any suggestion. Thanks! Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them. - [x] I am not making a trivial change, such as fixing a typo in a comment. - [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - Select one of the following. - [ ] I have added tests. - `/test` for `lit` tests - `/unittest` for C++ tests - `/python/test` for end-to-end tests - [x] This PR does not need a test because `not applicable`. - Select one of the following. - [x] I have not added any `lit` tests. - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
there've been multiple issues discussing around the
FileNotFoundError
on compilation whenCompiledKernel
is trying to read from the listed ASM files. #2688 #4002 vllm-project/vllm#6103 etc. and there have been some attempts to address it such as #3544 . This PR attempts to explain the root cause and suggest a fix.Why
When a kernel is being compiled, triton first writes IRs to triton cache dir (ref). Inside of the write operation, the process first writes it to a temp file unique to the current process (plus a uuid to distinguish between multiple processes with same PID on different hosts sharing the same underlying FS) (ref) and then atomically
os.replace
it to the final file name. Afterwards theCompiledKernel
lists all the IRs and reads them (ref).On multiprocess set up this may however result in a race condition. Let's focus on a case where there's one host with 2 processes on it.
At the time when
pid 1
lists ASMs, the dir may contain temp files generated from another processpid 2
. However at the time whenpid 1
proceeds to read bytes from the listed files,pid2
may have alreadyos.replace
ed its temp files, sopid 1
will encounterFileNotFoundError
when trying to read the temp file generated bypid 2
. IBM/vllm#35 (comment) also believes this is the root cause.How
There're multiple potential solutions towards this, as mentioned in IBM/vllm#35 (comment) as well:
glob
won't bother taking the temp stuff into considerationtmp.pid_*
fromglob
This PR tries to go with the 1st approach to avoid adding an assumption on the tmp file pattern (which is only used in
runtime/cache.py
) incompiler/compiler.py
but is open to any suggestion. Thanks!Complete the following tasks before sending your PR, and replace
[ ]
with[x]
to indicate you have done them.I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these
rules.
I have run
pre-commit run --from-ref origin/main --to-ref HEAD
.Select one of the following.
/test
forlit
tests/unittest
for C++ tests/python/test
for end-to-end testsnot applicable
.Select one of the following.
lit
tests.lit
tests I have added follow these best practices,including the "tests should be minimal" section. (Usually running Python code
and using the instructions it generates is not minimal.)