Issues: vllm-project/vllm
- [Bug]: Fail to use beamsearch with llm.chat (bug) #12183, opened Jan 18, 2025 by gystar
- [Feature]: Multi-Token Prediction (MTP) (feature request) #12181, opened Jan 18, 2025 by casper-hansen
- [Bug]: Multi-Node Online Inference on TPUs Failing (bug) #12179, opened Jan 17, 2025 by BabyChouSr
- [Bug]: AMD GPU docker image build No matching distribution found for torch==2.6.0.dev20241113+rocm6.2 (bug) #12178, opened Jan 17, 2025 by samos123
- [Bug]: Slow huggingface weights download. Sequential download (bug) #12177, opened Jan 17, 2025 by NikolaBorisov
- [RFC]: Distribute LoRA adapters across deployment (RFC) #12174, opened Jan 17, 2025 by joerunde
- [Feature]: Serve /metrics while a model is loading (feature request) #12173, opened Jan 17, 2025 by xfalcox
- [Bug]: Issue running the Granite-7b GGUF quantized model on multiple GPUs with vLLM due to a tensor size mismatch. (bug) #12170, opened Jan 17, 2025 by tarukumar
- [New Model]: openbmb/MiniCPM-o-2_6 (new model) #12162, opened Jan 17, 2025 by myoss
- [Usage]: Terminates without any error 30 seconds after a successful run. (usage) #12160, opened Jan 17, 2025 by hznnnnnn
- [Feature]: Any plan to support key features of nanoflow? (feature request) #12157, opened Jan 17, 2025 by dwq370
- [Bug]: After updating VLLM from 0.4.0.post1 to 0.6.4, the model loading time increased by one minute. (bug) #12155, opened Jan 17, 2025 by 123qwe-ux
- [New Model]: jinaai/jina-embeddings-v3 (new model) #12154, opened Jan 17, 2025 by TC10127
- [Performance]: Very low generation throughput on CPU (performance) #12153, opened Jan 17, 2025 by SLIBM
- [Doc]: guided decoding is not compatible with speculative decoding, but "Compatibility Matrix" shows compatible (documentation) #12148, opened Jan 17, 2025 by mpjlu
- Semantic recognition or semantic classification. (feature request) #12147, opened Jan 17, 2025 by 20246688
- [Usage]: vllm context length handling method (usage) #12146, opened Jan 17, 2025 by whoo9112
- [Bug]: High and unstable CPU usage when deployed on GPU (bug) #12142, opened Jan 17, 2025 by yh-yao
- [New Model]: NV-Embed-v2 (new model) #12137, opened Jan 17, 2025 by Hypothesis-Z
- [Bug]: Close feature gaps when using xgrammar for structured output (bug, structured-output) #12131, opened Jan 16, 2025 by russellb
- [Feature]: Support OpenAI speech-to-text interface v1/audio/[transcriptions,translations] (feature request, help wanted) #12130, opened Jan 16, 2025 by mgoin
- [Feature]: Any plan to support online speculative decoding? i.e., periodically update the draft model weights based on input queries (feature request) #12125, opened Jan 16, 2025 by Neo9061
- [Bug]: Phi-3-small-8k cannot be served for vllm >= 0.6.5 (bug) #12124, opened Jan 16, 2025 by JGSweets
- [Bug]: XGrammar-based CFG decoding degraded after 0.6.5 (bug, structured-output) #12122, opened Jan 16, 2025 by AlbertoCastelo
- [Misc]: Excluding transformers from the dependencies (misc) #12114, opened Jan 16, 2025 by ir2718