
[FEATURE] - EfficientQAT? Supposedly allows for a 123b to be 35% of the size, with 4% accuracy loss. #5

Open
SabinStargem opened this issue Aug 6, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@SabinStargem

Apparently it is a new method for quantization? Here are the Reddit thread and the GitHub repo, so you can see whether it is worth rolling into AutoGGUF.

Quantize 123b to 35%

EfficientQAT Github

Thank you for AutoGGUF; I am looking forward to handling quantizations without being an acolyte of the command line. :)

@SabinStargem added the bug (Something isn't working) label on Aug 6, 2024
@leafspark added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Aug 6, 2024
@leafspark
Owner

Thanks for the suggestion! I've been tinkering with it, but it seems to require GPUs for the training step, which I don't currently have available to test the quantization.

I can't promise this in the near future, but if GGUF conversion from GPTQ is implemented (AutoGGUF is mostly focused on llama.cpp), I can take a closer look.

@SabinStargem
Author

I have an RTX 4090. Would that be able to train a 7b through EQAT, assuming GPTQ->GGUF becomes a thing?

@leafspark
Owner

Just an approximation, but the fp16 model is ~14 GB, and the script loads it into CPU memory first, then moves the layers over to the GPU in 4-bit as it trains. So it should fit on an RTX 4090; at minimum you'd want a GPU with around 12-16 GB of VRAM to cover gradients, activations, optimizer states, and CUDA overhead.
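To make that estimate concrete, here is a rough back-of-envelope sketch of the arithmetic. The overhead figure is an assumption for illustration, not a measurement, and the real number depends on batch size and sequence length:

```python
# Rough VRAM estimate for block-wise 4-bit training of a 7B model,
# following the "load fp16 on CPU, move layers to GPU in 4-bit" scheme
# described above. All per-component figures are illustrative assumptions.

PARAMS = 7e9        # ~7 billion parameters
BYTES_FP16 = 2      # fp16 weight size while held in CPU RAM
BYTES_4BIT = 0.5    # 4-bit weight size once moved to the GPU

cpu_weights_gb = PARAMS * BYTES_FP16 / 1e9   # ~14 GB in system RAM
gpu_weights_gb = PARAMS * BYTES_4BIT / 1e9   # ~3.5 GB of 4-bit weights on the GPU

# Assumed headroom for gradients, activations, optimizer state, and CUDA context.
overhead_gb = 6.0

print(f"CPU RAM for fp16 weights : ~{cpu_weights_gb:.1f} GB")
print(f"GPU VRAM (4-bit weights) : ~{gpu_weights_gb:.1f} GB")
print(f"GPU VRAM incl. overhead  : ~{gpu_weights_gb + overhead_gb:.1f} GB")
```

With those assumptions the total lands in the 10-16 GB range, which is why a 24 GB RTX 4090 should be comfortable for a 7B model.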
