lm-format-enforcer + vLLM takes 10 seconds to generate a JSON. This speed is not feasible for my current business. Is there any way to accelerate it, such as a paper, plan, or PR? #139

lwdnxu · 2024-09-18T01:44:29Z

I currently use lm-format-enforcer with vLLM, and it takes 10 seconds to generate a JSON. This speed is not feasible for my current business. Is there any way to accelerate it, such as a paper, a plan, or a PR?

noamgat · 2024-09-19T19:08:15Z

This has been discussed, including in the vLLM repo. Some profiling efforts are under way there; the slowdown might have to do with copying logit buffers from CPU to GPU memory. There is no clear-cut solution yet. If you want to step in and investigate, we would be very grateful :)

ckhfor · 2024-12-17T11:57:53Z

Python -> C++: Python is too slow...
Multithreading: the Python GIL...
Pre-compute accept/reject/uncertain
... there are many ways to accelerate this
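
To make the "pre-compute accept/reject/uncertain" idea concrete, here is a minimal sketch under stated assumptions. It is not how lm-format-enforcer works internally; the toy grammar (an optional leading minus sign followed by digits), the state names, and every function name below are hypothetical. The idea is to classify each vocabulary token once, ahead of time, as valid in every parser state (accept), valid in none (reject), or state-dependent (uncertain), so that only the uncertain tokens need the slow per-step check during generation.

```python
from enum import Enum

class Verdict(Enum):
    ACCEPT = "accept"        # valid in every parser state
    REJECT = "reject"        # valid in no parser state
    UNCERTAIN = "uncertain"  # depends on the current state

# Hypothetical toy grammar: an optional leading "-" followed by digits.
STATES = ("start", "after_sign", "in_digits")
DIGITS = "0123456789"

def step(state, ch):
    """Advance the toy parser by one character; None means invalid."""
    if ch in DIGITS:
        return "in_digits"
    if ch == "-" and state == "start":
        return "after_sign"
    return None

def token_ok(state, token):
    """Slow path: feed the token through the parser character by character."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return False
    return True

def classify(token):
    """Decide once, offline, how the token behaves across all states."""
    results = [token_ok(s, token) for s in STATES]
    if all(results):
        return Verdict.ACCEPT
    if not any(results):
        return Verdict.REJECT
    return Verdict.UNCERTAIN

def precompute(vocab):
    """Build the lookup table once, before any generation starts."""
    return {tok: classify(tok) for tok in vocab}

def allowed_tokens(state, vocab, table):
    """Per-step filter: only UNCERTAIN tokens hit the slow path."""
    allowed = []
    for tok in vocab:
        verdict = table[tok]
        if verdict is Verdict.ACCEPT:
            allowed.append(tok)
        elif verdict is Verdict.UNCERTAIN and token_ok(state, tok):
            allowed.append(tok)
    return allowed

VOCAB = ["1", "23", "-", "-4", "a", "1a"]
TABLE = precompute(VOCAB)
```

With this toy vocabulary, `allowed_tokens("in_digits", VOCAB, TABLE)` only runs the slow check for `"-"` and `"-4"`; the other four tokens are resolved by table lookup. With a real tokenizer the saving depends on how many tokens land in the uncertain bucket for the schema in use, which would need to be measured.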
