Issues: huggingface/text-generation-inference
Issues list
Nonsense responses with n-gram speculative decoding
#2997
opened Feb 6, 2025 by
olliestanley
1 of 4 tasks
Request failed during generation: Server error: Value out of range: -29146814772
#2994
opened Feb 5, 2025 by
AlperYildirim1
2 of 4 tasks
TGI with DeepSeek-R1-AWQ: "scoring func sigmoid is not handled" error
#2989
opened Feb 4, 2025 by
ChristophRaab
2 of 4 tasks
Mistral Small 3 : chat template with python functions causes error
#2987
opened Feb 3, 2025 by
v3ss0n
2 tasks done
Error: "new batch size should not exceed padded batch size" when running latest Docker container and sending multiple requests simultaneously
#2985
opened Feb 2, 2025 by
BradyBonnette
2 of 4 tasks
no prefill when decoder_input_details=True from InferenceClient
#2973
opened Jan 30, 2025 by
lifeng-jin
2 of 4 tasks
Incorrect Tokenization Likely Because of Diacritics in OpenChat and LLaMA 3.2 (TGI v3.0.2 and v2.2.0)
#2969
opened Jan 30, 2025 by
biba10
2 of 4 tasks
Structured output doesn't work with open ai endpoint
#2959
opened Jan 27, 2025 by
Stealthwriter
2 of 4 tasks
Running Qwen2-VL-2B-Instruct on TGI is giving an error
#2955
opened Jan 27, 2025 by
ashwani-bhat
2 of 4 tasks
CUDA Out of memory when using the benchmarking tool with batch size greater than 1
#2952
opened Jan 24, 2025 by
mborisov-bi
3 of 4 tasks
Serverless Inference API OpenAI /v1/chat/completions route broken
#2946
opened Jan 23, 2025 by
pelikhan
1 of 4 tasks
RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model
#2944
opened Jan 23, 2025 by
edesalve
2 of 4 tasks
text-generation-inference:3.0.1 docker container timeout on image fetching from fastapi static files.
#2930
opened Jan 21, 2025 by
dinoelT
2 of 4 tasks
Mangled generation for string sequences containing `<space>'m` with Llama 3.1
#2927
opened Jan 20, 2025 by
tomjorquera
1 of 4 tasks
AttributeError: no attribute 'model' when using llava-next with lora-adapters
#2926
opened Jan 20, 2025 by
derkleinejakob
2 of 4 tasks
Does tgi support image resize for qwen2-vl pipeline?
#2920
opened Jan 16, 2025 by
AHEADer
1 of 4 tasks
CUDA: an illegal memory access was encountered with Mistral FP8 Marlin kernels on NVIDIA driver 535.216.01 (AWS Sagemaker Real-time Inference)
#2915
opened Jan 15, 2025 by
dwyatte
3 of 4 tasks