
add hf2gguf conv format of q4_0 q4_1 q5_0 q5_1 #9022

Open
wants to merge 3 commits into master

Conversation

chentyjpm

@github-actions github-actions bot added the python python script changes label Aug 14, 2024
@ngxson ngxson requested a review from compilade August 14, 2024 08:17
@compilade
Collaborator

The main reason I'm hesitant to add this is that llama-quantize uses Q4_K and Q6_K for the token embeddings when quantizing to Q4_0, Q4_1, Q5_0, or Q5_1, and so unlike --outtype q8_0, this is not equivalent to using llama-quantize.

Although I did make an exception for this in #8151 for TQ1_0 and TQ2_0.

Maybe a temporary workaround could be a clear warning in the help text of --outtype, and/or at the end of conversion with these types.
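The mixture compilade describes means a uniform `--outtype` conversion is not byte-identical to `llama-quantize` output for the legacy types. A minimal, hypothetical sketch of the idea (the function name and tensor name are assumptions for illustration; the real per-tensor selection lives in llama.cpp's C++ code, not in the Python scripts):

```python
# Hypothetical illustration only, not the actual llama.cpp logic:
# llama-quantize upgrades the token embedding tensor to a K-quant
# when the overall target is one of the legacy types, so the file
# is a mixture rather than a single uniform quantization type.

LEGACY_TYPES = {"q4_0", "q4_1", "q5_0", "q5_1"}

def quant_type_for_tensor(tensor_name: str, target_type: str) -> str:
    """Return the per-tensor quantization type for a given target.

    "token_embd.weight" and the choice of "q6_k" here are assumptions
    for the sketch; the real mixture is decided inside
    llama_model_quantize_internal in llama.cpp.
    """
    if target_type in LEGACY_TYPES and tensor_name == "token_embd.weight":
        return "q6_k"  # embeddings get a K-quant instead of the legacy type
    return target_type  # all other tensors use the requested type
```

This is why a `--outtype q4_0` conversion from Python (which would apply `q4_0` uniformly) and a `llama-quantize` run to Q4_0 can produce files of the same declared type but with different tensor mixtures.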

@ggerganov
Owner

Yes, having different quantization mixtures called the same way will cause confusion. Better not to add this functionality to the Python scripts.

@chentyjpm
Author

> The main reason I'm hesitant to add this is that llama-quantize uses Q4_K and Q6_K for the token embeddings when quantizing to Q4_0, Q4_1, Q5_0, or Q5_1, and so unlike --outtype q8_0, this is not equivalent to using llama-quantize.
>
> Although I did make an exception for this in #8151 for TQ1_0 and TQ2_0.
>
> Maybe a temporary workaround could be a clear warning in the help text of --outtype, and/or at the end of conversion with these types.

Thanks for the review!

I read the C++ code in the static void llama_model_quantize_internal function, but I did not find the place where the token embeddings' quantization type is chosen for the quantized model.
Could I make the Python code match llama-quantize by using Q4_K and Q6_K for the token embeddings in the converted model?
