@wenhuach21 GPTQModel has merged `dynamic` per-layer/module control of quantization, but I don't think AutoRound currently supports such per-layer/module control during quantization. I know this is something AutoRound also wants. Is there any way we can work together to standardize the data interface that transfers the `dynamic` info to AutoRound? Since this feature is new, I am open to changing the protocol within GPTQModel itself if AutoRound has better suggestions. Thanks. The current protocol is exercised in https://github.com/ModelCloud/GPTQModel/blob/main/tests/test_dynamic.py
Ref: `dynamic` inference port to vllm (will port to sglang after the vllm merge): vllm-project/vllm#7086
Both the quantizers (GPTQModel and AutoRound) and the inference libraries (vllm, sglang) need to receive the per-layer/module `dynamic` overrides. It would be nice if everyone could agree on something close or similar to avoid compatibility issues.
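For discussion, here is a minimal sketch of the regex-keyed override idea behind `dynamic`: module-name patterns mapped to per-module config overrides, with a negative prefix to skip modules entirely. The prefix syntax and the later-match-wins merge order below are illustrative assumptions, not the agreed protocol; see the linked test_dynamic.py for the real syntax.

```python
import re

# Hypothetical regex-keyed override dict in the spirit of GPTQModel's `dynamic`.
dynamic = {
    r"-:.*\.down_proj.*": {},                    # "-:" = skip quantizing matches
    r".*\.layers\.0\..*": {"bits": 8},           # keep the first block at 8-bit
    r".*self_attn.*": {"bits": 4, "group_size": 64},
}

def resolve_overrides(module_name: str, base: dict) -> dict | None:
    """Return the effective quant config for one module, or None to skip it."""
    config = dict(base)
    for pattern, overrides in dynamic.items():
        negative = pattern.startswith("-:")
        stripped = pattern[2:] if negative or pattern.startswith("+:") else pattern
        if re.fullmatch(stripped, module_name):
            if negative:
                return None
            config.update(overrides)  # later matches layer on top of earlier ones
    return config

# Matches both positive rules above -> {'bits': 4, 'group_size': 64}
print(resolve_overrides("model.layers.0.self_attn.q_proj",
                        {"bits": 4, "group_size": 128}))
```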
AutoRound has supported this for a long time with weight_config.
Yes, we can align. The problem is that the supported bits on the CUDA side are limited, typically to 2, 4, and 8, making it difficult to achieve the same flexibility as llama.cpp.
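For comparison, my understanding is that weight_config is a dict keyed by full layer names rather than regexes, roughly as sketched below. The field names and the constructor usage are assumptions about AutoRound's API, not a verified schema.

```python
# Hedged sketch of an AutoRound-style weight_config: full module names as keys,
# per-layer settings as values (exact field names may differ; check AutoRound).
weight_config = {
    "model.layers.0.self_attn.q_proj": {"bits": 8, "group_size": 32},
    "model.layers.0.mlp.down_proj": {"bits": 16},  # i.e. leave unquantized
    # all other layers fall back to the global bits/group_size
}

# Hypothetical usage (constructor args assumed, not verified):
# autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
#                       weight_config=weight_config)
```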
@wenhuach21 Thanks for the heads-up. I did not know this. I knew you had asked us about per-layer/module differentiated quantization before, but I didn't realize it was already implemented within AutoRound. Will check out the weight_config param/protocol and see how we can normalize this.
The quantization process should be fine, and we set the configuration for each layer individually instead of using regex, which can be refined later. Additionally, there may be bugs in the export process for certain backends in AutoRound, as we haven't checked the supported bits yet. However, I believe this should not affect the GPTQModel repository.
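If the two protocols stay close, the bridge is mechanical: a regex-keyed `dynamic` dict can be expanded into an explicit per-layer-name config by matching every module name once up front. A hypothetical converter follows; the helper name, the bits=16 "skip" sentinel, and the first-match-wins rule are all assumptions, not an agreed interface.

```python
import re

def dynamic_to_weight_config(dynamic: dict, module_names: list) -> dict:
    """Expand a regex-keyed `dynamic` dict into an explicit per-layer dict."""
    weight_config = {}
    for name in module_names:
        for pattern, overrides in dynamic.items():
            negative = pattern.startswith("-:")
            stripped = pattern[2:] if negative or pattern.startswith("+:") else pattern
            if re.fullmatch(stripped, name):
                # bits=16 as a stand-in for "skip quantization"; the real
                # sentinel would need to be agreed between both libraries.
                weight_config[name] = {"bits": 16} if negative else dict(overrides)
                break  # first matching rule wins in this sketch
    return weight_config

names = ["model.layers.0.self_attn.q_proj", "model.layers.1.mlp.down_proj"]
print(dynamic_to_weight_config({r".*layers\.0.*": {"bits": 8}}, names))
# -> {'model.layers.0.self_attn.q_proj': {'bits': 8}}
```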
Qubitium changed the title from [BUG] Add dynamic support for AutoRound quantization to [FEATURE] Add dynamic support for AutoRound quantization on Oct 28, 2024