Issues: vllm-project/llm-compressor
#928 [bug] CUDA OOM while saving compressed Llama-3.1-70b with AutoModelForCausalLM (opened Nov 20, 2024 by hibukipanim)
#926 [bug] Error when loading a 2of4 model using vLLM (opened Nov 19, 2024 by jiangjiadi)
#925 [bug] Error "No modifier of type 'SparseGPTModifier' found" after upgrading to 0.3.0 (opened Nov 19, 2024 by jiangjiadi)
#916 [documentation] Discuss the use of hyperparameters in the quantization_w8a8_int8 script (opened Nov 14, 2024 by HelloCard)
#911 [bug] Finetuning in the 2:4 sparsity w4a16 example fails with multiple GPUs (opened Nov 13, 2024 by arunpatala)
#885 [bug] OOM with DeepSeek V2 Code Lite on A40 GPUs (opened Nov 1, 2024 by tohnee)
#868 [bug] Model saving fails on AWS instances with OOM kill (opened Oct 25, 2024 by Arseny-N)
#865 [bug] Output of Compressor unable to be loaded by latest HF Transformers (opened Oct 23, 2024 by hyaticua)
#860 [enhancement] Does llm-compressor support MiniCPM3, which uses the MLA architecture? (opened Oct 22, 2024 by piamo)
#858 [enhancement] Is it possible to quantize to FP8 W8A16 without calibration data? (opened Oct 21, 2024 by us58)
#853 [bug] Perplexity (ppl) calculation of local sparse model: NaN issue (opened Oct 19, 2024 by HengJayWang)
#852 [bug] Why does the speed not increase after compression? (opened Oct 18, 2024 by liho00)
#848 [Question] Does MiniCPM-V 2.6 currently support int8/fp8 quantization? (opened Oct 15, 2024 by wjj19950828)
#835 [bug] AttributeError: 'CompressedLinear' object has no attribute 'weight' (opened Oct 9, 2024 by kylesayrs)
#831 [enhancement] When will multi-node quantization be supported? (opened Oct 9, 2024 by IEI-mjx)
#688 [bug] AttributeError: 'MllamaConfig' object has no attribute 'use_cache' (opened Sep 26, 2024 by mgoin)
#687 [bug] SmoothQuant doesn't respect ignored modules for VLMs (opened Sep 26, 2024 by mgoin)
#660 [bug] KV cache quantization example causes problems (opened Sep 25, 2024 by weicheng59)
#164 [enhancement] [USAGE] FP8 W8A8 (+KV) with LoRA adapters (opened Sep 11, 2024 by paulliwog)