[Bugfix] Add custom Triton cache manager to resolve MoE MP issue #6140
Conversation
Thanks! IMHO, this issue should be addressed by bundling the custom cache manager code inside vLLM.
Thanks @tdoublep, I had mentioned this to @youkaichao previously but kept forgetting to open a PR. It's not immediately obvious why this seems to only affect the fused MoE kernel. I agree that it would be better for this to be incorporated into the library if possible. I wonder if we could open a PR or issue in the Triton repo for this (if one doesn't already exist).
triton_patch/custom_cache_manager.py (outdated):
        else:
            raise RuntimeError("Could not create or locate cache dir")

        print(f"Triton cache dir: {self.cache_dir=}")
This should probably be a debug log instead; it produces a lot of output.
Have just removed it for now.
@njhill @jeejeelee I have re-implemented it as part of the vLLM library. One thing I'm not sure about is whether setting the env variable from the fused_moe code is sufficient, or whether there are other parts of the code where this fix would be needed. Maybe it's OK for now.
Even if we don't consider #5036, prefix_prefill and triton_flash_attention still need this fix.
def maybe_set_triton_cache_manager(module: str) -> None:
    cache_manager = os.environ.get("TRITON_CACHE_MANAGER", None)
    if cache_manager != module:
        os.environ["TRITON_CACHE_MANAGER"] = module
If the user manually sets this env var, should we still override it? Additionally, I suggest adding a log message for clarity.
Have changed it so that we only set it if the user has not, and also added a log message.
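(For reference, a minimal sketch of how such a check might look; the module path, logger setup, and function details below are illustrative assumptions rather than the exact vLLM code:)

import logging
import os

logger = logging.getLogger(__name__)

# Assumed module path, for illustration only.
_CUSTOM_MANAGER = "vllm.triton_utils.custom_cache_manager:CustomCacheManager"


def maybe_set_triton_cache_manager() -> None:
    """Point Triton at a per-process cache manager, unless the user has
    already configured one via TRITON_CACHE_MANAGER."""
    cache_manager = os.environ.get("TRITON_CACHE_MANAGER", None)
    if cache_manager is None:
        logger.info("Setting Triton cache manager to: %s", _CUSTOM_MANAGER)
        os.environ["TRITON_CACHE_MANAGER"] = _CUSTOM_MANAGER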
@jeejeelee OK, in that case I guess it makes sense to call […]
@njhill @jeejeelee I've moved the call to […]
CI test failures look like network blips ([…]).
os.environ["TRITON_CACHE_MANAGER"] = manager | ||
|
||
|
||
class CustomCacheManager(FileCacheManager): |
Could you document why we need this?
Added some docstrings.
CI failure looks unrelated: […]
I'm also wondering this. cc Anyscale folks @cadedaniel @Yard1 for visibility.
"""Re-implements Triton's cache manager, ensuring that a | ||
unique cache directory is created for each process. This is | ||
needed to avoid collisions when running with tp>1 and | ||
using multi-processing as the distributed backend. |
If Triton 3.0.0 solves this problem, it would be good to note here that this custom cache manager can be removed once we upgrade Triton.
The fix for the issue is not yet in v3.0.0, but I guess it will be in whatever version comes after that (see my summary here). I will add a comment to that effect.
Done.
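(For context, a minimal sketch of the per-process idea described in that docstring, assuming Triton's FileCacheManager and default_cache_dir APIs and omitting the dump/override handling; not necessarily the exact vLLM implementation:)

import os

from triton.runtime.cache import FileCacheManager, default_cache_dir


class CustomCacheManager(FileCacheManager):
    """File-based cache manager that appends the PID to the cache
    directory, so each worker process writes to its own directory."""

    def __init__(self, key, override=False, dump=False):
        self.key = key
        self.lock_path = None
        # A per-process cache dir avoids collisions between MP workers.
        cache_dir = os.environ.get("TRITON_CACHE_DIR", default_cache_dir())
        if not cache_dir:
            raise RuntimeError("Could not create or locate cache dir")
        self.cache_dir = os.path.join(f"{cache_dir}_{os.getpid()}", self.key)
        self.lock_path = os.path.join(self.cache_dir, "lock")
        os.makedirs(self.cache_dir, exist_ok=True)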
All comments have been addressed. Is there anything else you would like to see? @comaniac @njhill @jeejeelee @simon-mo I think it would be good to get this one in since there are quite a few people struggling with this issue.
LGTM. cc @Yard1 to take a final pass.
Merging to unblock release.
Fixes #6103
We have been using this fix via our fork (see here) for a while and it seems stable.
Note: this will only resolve the problem if you are using vLLM from the Docker image. A better approach might be to bundle the custom cache manager code inside the vLLM package, so that it also ships via pip install, and the user could still set an env variable to enable it.

Update: I've now implemented it by including the custom cache manager inside vLLM and setting the necessary env variable via code.
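(As an illustration of that opt-in path: Triton interprets the environment variable as a "module:ClassName" reference, so a user could in principle enable the custom manager themselves before the first Triton kernel is compiled. The module path below is an assumption, not necessarily the final location in vLLM:)

import os

# Assumed module path, shown for illustration only. It must be set before
# the first Triton kernel is compiled in this process.
os.environ["TRITON_CACHE_MANAGER"] = (
    "vllm.triton_utils.custom_cache_manager:CustomCacheManager")

from vllm import LLM  # noqa: E402

# Example: a MoE model with tensor parallelism, the kind of setup this
# PR targets.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1",
          tensor_parallel_size=2)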
cc @jeejeelee