@wenhuach21 GPTQModel has merged `dynamic` per-layer/module control of quantization, but I don't think AutoRound currently supports such per-layer/module control during quantization. I know this is something AutoRound also wants. Is there any way we can work together to standardize the data interface that transfers the `dynamic` info to AutoRound? Since this feature is new, I am open to changing the protocol within GPTQModel itself if AutoRound has better suggestions. Thanks. The current protocol is exercised in https://github.com/ModelCloud/GPTQModel/blob/main/tests/test_dynamic.py
Ref: `dynamic` inference port to vllm (will port to sglang after the vllm merge): vllm-project/vllm#7086
Both the quantizers (GPTQModel and AutoRound) and the inference libraries (vllm, sglang) need to receive the per-layer/module `dynamic` overrides. It would be nice if everyone could agree on something close or similar to avoid compatibility issues.
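For discussion, here is a minimal sketch of the regex-keyed override idea behind `dynamic`: module-name patterns mapped to per-module config overrides, with a negative prefix to skip modules entirely. The prefix syntax and the later-match-wins merge order below are illustrative assumptions, not the agreed protocol; see the linked test_dynamic.py for the real syntax.

```python
import re

# Hypothetical regex-keyed override dict in the spirit of GPTQModel's `dynamic`.
dynamic = {
    r"-:.*\.down_proj.*": {},                    # "-:" = skip quantizing matches
    r".*\.layers\.0\..*": {"bits": 8},           # keep the first block at 8-bit
    r".*self_attn.*": {"bits": 4, "group_size": 64},
}

def resolve_overrides(module_name: str, base: dict) -> dict | None:
    """Return the effective quant config for one module, or None to skip it."""
    config = dict(base)
    for pattern, overrides in dynamic.items():
        negative = pattern.startswith("-:")
        stripped = pattern[2:] if negative or pattern.startswith("+:") else pattern
        if re.fullmatch(stripped, module_name):
            if negative:
                return None
            config.update(overrides)  # later matches layer on top of earlier ones
    return config

# Matches both positive rules above -> {'bits': 4, 'group_size': 64}
print(resolve_overrides("model.layers.0.self_attn.q_proj",
                        {"bits": 4, "group_size": 128}))
```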
AutoRound has supported this for a long time with weight_config.
Yes, we can align. The problem is that the supported bits on the CUDA side are limited, typically to 2, 4, and 8, making it difficult to achieve the same flexibility as llama.cpp.
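For comparison, my understanding is that weight_config is a dict keyed by full layer names rather than regexes, roughly as sketched below. The field names and the constructor usage are assumptions about AutoRound's API, not a verified schema.

```python
# Hedged sketch of an AutoRound-style weight_config: full module names as keys,
# per-layer settings as values (exact field names may differ; check AutoRound).
weight_config = {
    "model.layers.0.self_attn.q_proj": {"bits": 8, "group_size": 32},
    "model.layers.0.mlp.down_proj": {"bits": 16},  # i.e. leave unquantized
    # all other layers fall back to the global bits/group_size
}

# Hypothetical usage (constructor args assumed, not verified):
# autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
#                       weight_config=weight_config)
```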
@wenhuach21 Thanks for the heads-up. I did not know this. I knew you had asked us about per-layer/module differentiated quantization before, but I didn't realize it was already implemented within AutoRound. Will check out the weight_config param/protocol and see how we can normalize this.
The quantization process should be fine, and we set the configuration for each layer individually instead of using regex, which can be refined later. Additionally, there may be bugs in the export process for certain backends in AutoRound, as we haven't checked the supported bits yet. However, I believe this should not affect the GPTQModel repository.
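If the two protocols stay close, the bridge is mechanical: a regex-keyed `dynamic` dict can be expanded into an explicit per-layer-name config by matching every module name once up front. A hypothetical converter follows; the helper name, the bits=16 "skip" sentinel, and the first-match-wins rule are all assumptions, not an agreed interface.

```python
import re

def dynamic_to_weight_config(dynamic: dict, module_names: list) -> dict:
    """Expand a regex-keyed `dynamic` dict into an explicit per-layer dict."""
    weight_config = {}
    for name in module_names:
        for pattern, overrides in dynamic.items():
            negative = pattern.startswith("-:")
            stripped = pattern[2:] if negative or pattern.startswith("+:") else pattern
            if re.fullmatch(stripped, name):
                # bits=16 as a stand-in for "skip quantization"; the real
                # sentinel would need to be agreed between both libraries.
                weight_config[name] = {"bits": 16} if negative else dict(overrides)
                break  # first matching rule wins in this sketch
    return weight_config

names = ["model.layers.0.self_attn.q_proj", "model.layers.1.mlp.down_proj"]
print(dynamic_to_weight_config({r".*layers\.0.*": {"bits": 8}}, names))
# -> {'model.layers.0.self_attn.q_proj': {'bits': 8}}
```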
Qubitium changed the title from [BUG] Add dynamic support for AutoRound quantization to [FEATURE] Add dynamic support for AutoRound quantization on Oct 28, 2024