I currently use lm-format-enforcer with vLLM, and it takes 10 seconds to generate a JSON object. That is too slow for my current business use case. Is there any way to speed it up, such as a paper, a plan, or a PR?
#139
Open
lwdnxu opened this issue
Sep 18, 2024
· 2 comments
I currently use lm-format-enforcer with vLLM, and it takes 10 seconds to generate a JSON object. That is too slow for my current business use case. Is there any way to speed it up, such as a paper, a plan, or a PR?
This has been discussed before, including in the vLLM repo. There are some profiling efforts going on there; the slowdown might have to do with copying logit buffers from CPU to GPU memory. There is no clear-cut solution yet. If you want to step in and investigate, we would be very grateful :)
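For anyone who wants to help investigate, a first step could be measuring how much of the 10 seconds is actually spent inside the logits processor. Below is a minimal sketch of that idea, not code from this library or from vLLM: `TimedLogitsProcessor` and `allow_only` are hypothetical names, the toy processor works on plain Python lists instead of tensors, and the `(token_ids, scores)` call shape is an assumption about how the processor is invoked.

```python
import time
from typing import Callable, List, Sequence

Processor = Callable[[Sequence[int], List[float]], List[float]]


class TimedLogitsProcessor:
    """Hypothetical wrapper: times every call to an inner processor."""

    def __init__(self, inner: Processor):
        self.inner = inner
        self.calls = 0
        self.total_seconds = 0.0

    def __call__(self, token_ids: Sequence[int], scores: List[float]) -> List[float]:
        start = time.perf_counter()
        try:
            return self.inner(token_ids, scores)
        finally:
            # Record the call even if the inner processor raises.
            self.calls += 1
            self.total_seconds += time.perf_counter() - start

    def mean_ms(self) -> float:
        """Average wall-clock time per call, in milliseconds."""
        return 1000.0 * self.total_seconds / self.calls if self.calls else 0.0


def allow_only(allowed: Sequence[int]) -> Processor:
    """Toy stand-in for a format enforcer: mask every token id not in `allowed`."""
    allowed_set = set(allowed)

    def processor(token_ids: Sequence[int], scores: List[float]) -> List[float]:
        return [s if i in allowed_set else float("-inf") for i, s in enumerate(scores)]

    return processor


if __name__ == "__main__":
    timed = TimedLogitsProcessor(allow_only([1, 3]))
    for _ in range(1000):
        timed([0], [0.5, 1.0, 2.0, 3.0])
    print(f"{timed.calls} calls, {timed.mean_ms():.4f} ms per call")
```

Comparing `total_seconds` against the end-to-end generation time shows whether the processor dominates. One caveat: if the scores are CUDA tensors, GPU work runs asynchronously, so wall-clock numbers are only meaningful if the device is synchronized (for example with `torch.cuda.synchronize()`) before each timestamp.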