feat(convert_hf_to_gguf): support q4_0 and q4_1 quantizations #10008

Open · wants to merge 1 commit into base: master
Conversation

@trufae commented Oct 22, 2024

@github-actions bot added the python label (python script changes) on Oct 22, 2024
@trufae (Author) commented Oct 31, 2024

ping

@wooooyeahhhh commented
Wouldn't this have a negative effect on output quality compared to converting to f16 and then using the quantize program? Because the output and embedding tensors would be converted to q4_0/q4_1, and I don't think the quantize program produces a pure quant.

@compilade (Collaborator) commented
Related to #9022.

Basically, the q4_0 and q4_1 options of llama-quantize also use q4_k and q6_k for the token embeddings and output tensors, and those types are not yet supported by the Python re-implementation in gguf-py/gguf/quants.py, partly because it would be slow, but mostly because the k-quants rounding is not platform-independent (the result differs depending on whether or not FMA was used).
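For a rough idea of why FMA matters here (an illustrative example with made-up values, not taken from the k-quants code): with FMA, `a*b + c` is rounded once, while without it the product is rounded to float32 first and the sum is rounded again, and the two results can disagree in the last bit, which is enough to flip a nearest-value decision during quantization.

```python
import numpy as np

# Illustrative only (made-up values): one rounding (FMA-like, emulated here
# with float64) versus two roundings (round the product to float32, then
# round the sum).
a = np.float32(1.0000001)
b = np.float32(3.1415927)
c = np.float32(-3.1415932)

two_roundings = np.float32(a * b) + c                                     # no FMA
one_rounding  = np.float32(np.float64(a) * np.float64(b) + np.float64(c))  # FMA-like

print(two_roundings, one_rounding)  # the two can differ in the last bits
```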

But for quantization types smaller than Q8_0, there are also a lot of heuristics in llama_tensor_get_type to "choose" the type of each tensor, which is more complicated than the current type selection logic of convert_hf_to_gguf.py (which fortunately gives exactly the same selections for {F32, F16, BF16, Q8_0}, but not for other types).
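For illustration, the kind of per-tensor mixing this implies could be sketched roughly like this (a hypothetical simplification, not the actual llama_tensor_get_type logic, which has many more rules):

```python
# Hypothetical sketch of the kind of mixing llama-quantize does when the
# requested type is Q4_0; the real C++ heuristics consider many more things
# (layer index, expert counts, row-size fallbacks, ...).
def pick_tensor_type(name: str, n_dims: int, requested: str = "Q4_0") -> str:
    if n_dims == 1:
        return "F32"      # norms and biases are typically left unquantized
    if name == "output.weight":
        return "Q6_K"     # output tensor gets a higher-precision k-quant
    if name == "token_embd.weight":
        return "Q4_K"     # token embeddings also use a k-quant
    return requested      # everything else uses the requested type
```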

Ideally, convert_hf_to_gguf.py should produce exactly the same model files as llama-quantize (which it does for F32, F16, BF16, and Q8_0) to reduce confusion, but as explained above, doing that for smaller types is more complicated unless the existing mixtures produced by llama-quantize are changed.

Eventually, the k-quants rounding will be platform-independent and k-quantization will be implemented in gguf-py/gguf/quants.py, and then direct conversion to Q4_0, Q4_1, Q5_0, and Q5_1 could be added to convert_hf_to_gguf.py, but the type selection heuristics for smaller quants would need to be ported to the convert scripts too.
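For context, the per-block math for Q4_0 itself is simple; here is a rough numpy sketch (not the actual gguf-py/gguf/quants.py code; the real implementation is vectorized over all blocks and packs the 4-bit values into nibbles):

```python
import numpy as np

def quantize_q4_0_block(x: np.ndarray) -> tuple[np.float16, np.ndarray]:
    """Rough sketch of Q4_0 quantization for one block of 32 floats."""
    assert x.size == 32
    # The scale comes from the element with the largest magnitude, sign kept,
    # so that element maps to exactly -8.
    d = x[np.abs(x).argmax()] / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # Offset by 8 so the 4-bit values land in 0..15.
    q = np.clip(np.trunc(x * inv_d + 8.5), 0, 15).astype(np.uint8)
    return np.float16(d), q

# Dequantization is just (q - 8) * d:
block = np.random.randn(32).astype(np.float32)
d, q = quantize_q4_0_block(block)
restored = (q.astype(np.float32) - 8.0) * np.float32(d)
```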
