Thanks for the suggestion! I've been tinkering with it, but it seems to require GPUs (for the training step), which I don't currently have available to test quantization with.
I can't promise this in the near future, but if GGUF conversion from GPTQ is implemented (AutoGGUF is mostly focused on llama.cpp), I can take a closer look.
Just an approximation, but the fp16 weights are ~14GB; the script loads them into CPU memory first and then moves the layers onto the GPU in 4-bit dynamically as it trains. So it should fit on an RTX 4090, or at minimum a GPU with around 12-16GB of VRAM once gradients, activations, optimizer states, and CUDA overhead are included.
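A rough back-of-envelope sketch of the figures above, assuming a ~7B-parameter model (since the fp16 weights are quoted at ~14GB); the training-overhead multiplier is my own rough assumption, not something stated in the thread or the EfficientQAT docs:

```python
# Hypothetical memory estimate; the 7B parameter count and the 3x training
# overhead multiplier are assumptions for illustration only.
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB."""
    return n_params_billion * 1e9 * (bits_per_weight / 8) / 1e9

fp16_gb = weight_memory_gb(7, 16)  # ~14 GB, held in CPU RAM
int4_gb = weight_memory_gb(7, 4)   # ~3.5 GB once layers are moved to GPU in 4-bit

# Gradients, activations, optimizer states, and CUDA context are assumed here
# to add roughly 2-3x on top of the quantized weights, which lands in the
# 12-16 GB VRAM range mentioned above.
print(f"fp16 weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{int4_gb:.1f} GB")
print(f"estimated training VRAM: ~{int4_gb * 3:.1f} GB plus CUDA overhead")
```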
Apparently it is a new method for doing quantization? Here are the Reddit thread and the GitHub repo, so you can judge whether it is worth rolling into AutoGGUF.
Quantize 123b to 35%
EfficientQAT GitHub
Thank you for AutoGGUF; I am looking forward to handling quantizations without being an acolyte of the command line. :)