
[FEATURE] Add dynamic support for AutoRound quantization #329

Open
Qubitium opened this issue Aug 2, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Qubitium
Collaborator

Qubitium commented Aug 2, 2024

@wenhuach21 GPTQModel has merged dynamic per layer/module control of quantization, but I don't think auto-round currently supports such per layer/module control during quantization. I know this is something AutoRound also wants. Is there any way we can work together to standardize the data interface used to pass the dynamic info to auto-round? Since this feature is new, I am open to changing the protocol within gptqmodel itself if autoround has better suggestions. Thanks.

https://github.com/ModelCloud/GPTQModel/blob/main/tests/test_dynamic.py

Ref: dynamic inference port to vllm (will port to sglang after vllm merge) vllm-project/vllm#7086

Both the quantizers (GPTQModel and AutoRound) and the inference libraries (vllm, sglang) need to receive the per layer/module dynamic overrides. It would be nice if everyone could agree on something close or similar to avoid compatibility issues.
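For context, a rough sketch of what such a per layer/module override table can look like: overrides keyed by a regex over the module name, applied on top of the global defaults. Field names and the matching behavior here are illustrative only, not a fixed spec; see the linked test for the actual GPTQModel format.

```python
import re

# Illustrative only: overrides keyed by regex over the module name.
# Modules that match a pattern get the override values; everything else
# falls back to the global quantization defaults.
dynamic = {
    r".*\.attn\..*": {"bits": 8, "group_size": 32},   # keep attention at higher precision
    r".*\.mlp\..*":  {"bits": 4, "group_size": 128},  # MLP layers at the default precision
}

def resolve_config(module_name: str, defaults: dict) -> dict:
    """Return the effective config for one module: defaults plus the first regex match."""
    cfg = dict(defaults)
    for pattern, overrides in dynamic.items():
        if re.fullmatch(pattern, module_name):
            cfg.update(overrides)
            break
    return cfg

print(resolve_config("model.layers.0.attn.q_proj", {"bits": 4, "group_size": 128}))
# -> {'bits': 8, 'group_size': 32}
```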

@Qubitium Qubitium added the bug Something isn't working label Aug 2, 2024
@wenhuach21

AutoRound has supported this for a long time with weight_config.
Yes, we can align. The problem is that the supported bits on the CUDA side are limited, typically to 2, 4, and 8, making it difficult to achieve the same flexibility as llama.cpp.
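For comparison, a hedged sketch of the per-layer shape that weight_config suggests: a dict keyed by full layer name rather than by regex. The exact field names below are assumptions for illustration, not a verified AutoRound spec.

```python
# Assumed/illustrative shape only: keys are full layer names, values carry
# that layer's quantization settings; unlisted layers use the global arguments.
weight_config = {
    "model.layers.0.self_attn.q_proj": {"bits": 8, "group_size": 32, "sym": True},
    "model.layers.0.mlp.down_proj":    {"bits": 4, "group_size": 128, "sym": True},
}
```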

@Qubitium
Collaborator Author

Qubitium commented Aug 7, 2024

@wenhuach21 Thanks for the heads up, I did not know this. I remember you asked us about per-layer/module differentiated quantization before, but I didn't realize it was already implemented internally in autoround. Will check out the weight_config param/protocol and see how we can normalize this.

@wenhuach21

The quantization process should be fine; we set the configuration for each layer individually rather than via regex, which can be refined later. Additionally, there may be bugs in the export process for certain backends in AutoRound, as we haven't checked the supported bits yet. However, I believe this should not affect the GPTQModel repository.
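One hypothetical way to bridge the two shapes discussed above (not part of either library): expand regex-keyed overrides into a per-layer-name table by matching against the model's module names.

```python
import re

def expand_to_per_layer(module_names, dynamic, defaults):
    """Expand regex-keyed overrides into a table keyed by full layer name.
    Purely illustrative of the normalization idea, not an existing API."""
    per_layer = {}
    for name in module_names:
        cfg = dict(defaults)
        for pattern, overrides in dynamic.items():
            if re.fullmatch(pattern, name):
                cfg.update(overrides)
                break
        per_layer[name] = cfg
    return per_layer
```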

@Qubitium Qubitium changed the title [BUG] Add dynamic support for AutoRound quantization [FEATURE] Add dynamic support for AutoRound quantization Oct 28, 2024