
add hf2gguf conv format of q4_0 q4_1 q5_0 q5_1 #9022

Open
wants to merge 3 commits into master

Conversation

chentyjpm

@github-actions github-actions bot added the python python script changes label Aug 14, 2024
@ngxson ngxson requested a review from compilade August 14, 2024 08:17
@compilade
Collaborator

The main reason I'm hesitant to add this is that llama-quantize uses Q4_K and Q6_K for the token embeddings when quantizing to Q4_0, Q4_1, Q5_0, or Q5_1, and so unlike --outtype q8_0, this is not equivalent to using llama-quantize.

Although I did make an exception for this in #8151 for TQ1_0 and TQ2_0.

Maybe a temporary workaround could be a clear warning in the help text of --outtype, and/or at the end of conversion with these types.
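The mixture compilade describes means a uniform `--outtype` conversion is not byte-identical to `llama-quantize` output for the legacy types. A minimal, hypothetical sketch of the idea (the function name and tensor name are assumptions for illustration; the real per-tensor selection lives in llama.cpp's C++ code, not in the Python scripts):

```python
# Hypothetical illustration only, not the actual llama.cpp logic:
# llama-quantize upgrades the token embedding tensor to a K-quant
# when the overall target is one of the legacy types, so the file
# is a mixture rather than a single uniform quantization type.

LEGACY_TYPES = {"q4_0", "q4_1", "q5_0", "q5_1"}

def quant_type_for_tensor(tensor_name: str, target_type: str) -> str:
    """Return the per-tensor quantization type for a given target.

    "token_embd.weight" and the choice of "q6_k" here are assumptions
    for the sketch; the real mixture is decided inside
    llama_model_quantize_internal in llama.cpp.
    """
    if target_type in LEGACY_TYPES and tensor_name == "token_embd.weight":
        return "q6_k"  # embeddings get a K-quant instead of the legacy type
    return target_type  # all other tensors use the requested type
```

This is why a `--outtype q4_0` conversion from Python (which would apply `q4_0` uniformly) and a `llama-quantize` run to Q4_0 can produce files of the same declared type but with different tensor mixtures.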

@ggerganov
Owner

Yes, having different quantization mixtures called the same way will cause confusion. Better not to add this functionality to the Python scripts.

@chentyjpm
Author

> The main reason I'm hesitant to add this is that llama-quantize uses Q4_K and Q6_K for the token embeddings when quantizing to Q4_0, Q4_1, Q5_0, or Q5_1, and so unlike --outtype q8_0, this is not equivalent to using llama-quantize.
>
> Although I did make an exception for this in #8151 for TQ1_0 and TQ2_0.
>
> Maybe a temporary workaround could be a clear warning in the help text of --outtype, and/or at the end of conversion with these types.

Thanks for the review!

I read the C++ code in the static void llama_model_quantize_internal function, but I did not find the place where the token embeddings' quantization type is chosen for the quantized model.
Could I make the Python code match llama-quantize by using Q4_K and Q6_K for the token embeddings in the converted model?
