[Fix] Getting GPU memory usage by a worker process correctly. #2807
base: main
Conversation
FYI @pseudotensor
Force-pushed from 76f73bd to ce4b03a
It's related to gpuopenanalytics/pynvml#36. So if vLLM is running inside Docker, we can't rely on the PID.
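For context, here is a minimal sketch (not taken from this PR) of the PID-based lookup that the pynvml issue is about. The pynvml calls are real; the point is that NVML reports host PIDs, which need not match the PID a containerized worker sees for itself.

```python
# Illustrative sketch of the PID-based per-process lookup that breaks in Docker.
# NVML reports *host* PIDs, so comparing against os.getpid() (a PID from the
# container's namespace) can silently match nothing -- see gpuopenanalytics/pynvml#36.
import os
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    my_pid = os.getpid()
    used_by_me = None
    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        # proc.pid is a host PID; inside a container it may never equal my_pid.
        if proc.pid == my_pid:
            used_by_me = proc.usedGpuMemory  # bytes (may be None on some drivers)
    print(f"GPU memory used by this process: {used_by_me}")
finally:
    pynvml.nvmlShutdown()
```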
Thanks!
@pseudotensor we will have to use
Force-pushed from 1dcecb2 to 11e3db5
I believe this PR is still valuable even after #2863, as it measures memory usage by the worker's own process and is therefore not affected by races when GPU memory is allocated or freed by other processes.
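To illustrate the race being described, here is a hedged sketch using standard PyTorch calls (not this PR's implementation): diffing global free memory before and after profiling can be skewed by other processes, whereas per-process allocator counters only reflect this worker.

```python
# Sketch of the two measurement strategies, assuming PyTorch.
import torch

def profile_global_diff(run_profile):
    # Racy: another process may allocate or free GPU memory between the two
    # mem_get_info() calls, distorting the measured difference.
    free_before, _total = torch.cuda.mem_get_info()
    run_profile()
    free_after, _ = torch.cuda.mem_get_info()
    return free_before - free_after

def profile_own_process(run_profile):
    # Race-free for this worker: counts only allocations made through this
    # process' caching allocator, regardless of what other processes do.
    torch.cuda.reset_peak_memory_stats()
    run_profile()
    return torch.cuda.max_memory_allocated()
```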
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has merge conflicts that must be resolved before it can be merged.
Currently, it's not possible to properly share GPU resources between multiple running vLLM instances.
If the GPU memory is split 50/50, the second process fails.
This change correctly measures the GPU memory consumption of every worker process.
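For the scenario described above, an illustrative (hypothetical) usage: two separate vLLM processes, each started with half of the GPU memory budget via the existing `gpu_memory_utilization` option; the model name is only an example.

```python
# Run each of these in its own process/shell on the same GPU.
from vllm import LLM

# Process A:
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)

# Process B (identical launch in a second process):
# llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)
```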