gguf-py : fix some metadata name extraction edge cases #8591
Conversation
* convert_lora : use the lora dir for the model card path
* gguf-py : more metadata edge cases fixes
  Multiple finetune versions are now joined together, and the removal of the basename annotation on trailing versions is more robust.
@mofosyne Regarding the use of title case in `gguf-py/gguf/utility.py` (line 61 at 87e397d): I think it looks a bit weird for a few models I've recently tried converting. Should the original case be preserved instead? Do you think I should change this in this PR or another one?

(out of scope, but I still want to comment on this) Also note that the naming convention at https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention should be updated to reflect the format used in #7499. And I don't think we're close to having any model with quadrillions of parameters, so I suggest removing that case from the convention, especially since there is no example of what suffix is normally used for it.

Let me know what you think about this.
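For what it's worth, here's a tiny standalone illustration (the model names are just examples; this is not the actual `utility.py` code) of why blanket title-casing mangles names that contain acronyms or stylized casing:

```python
# str.title() uppercases only the first letter of each "word" and
# lowercases the rest, which destroys acronyms like "LM" or "LLaMA".
names = ["SmolLM-135M-Instruct", "TinyLLaMA-v1", "gpt2-xl"]
for name in names:
    print(f"{name!r} -> {name.title()!r}")
# 'SmolLM-135M-Instruct' -> 'Smollm-135M-Instruct'
# 'TinyLLaMA-v1'         -> 'Tinyllama-V1'
# 'gpt2-xl'              -> 'Gpt2-Xl'
```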
@compilade well if it's an edge case then it's part of the PR, depending on how urgent this fix needs to get in. Is it breaking anyone's workflow currently? If not, then we can take our time making it better. I don't think there is any one right answer here... but we can assume anything read from the kv store is of the correct capitalisation... as for anything heuristically obtained... it's the wild west.
@compilade and yes, the gguf doc is outdated, and from what I understand the consensus is that the code is the main spec, so our implementation supersedes the document. We should update it. Also, I don't think the heuristics approach we have is a spec; it's a nice-to-have feature in the converter Python script. The gguf spec for naming should contain only what we are aiming for people to stick to. As for Quadrillion, I'll suggest keeping it in... I would generally like to be +1 ahead of whatever we can immediately conceptualise in the near future. And the near future is likely Trillion, so +1 would be Quadrillion. (As per names of large numbers on Wikipedia.)
I think a quadrillion can safely be 1000T instead. Personally I think we're quite far from making models with 10 times more parameters than the number of synapses in a human brain.
Totally agree. The heuristics are there to turn non-compliant but commonly used names into compliant names. The spec only needs to deal with the resulting names, not the mess they come from.
Well, people in the past thought that 640K ought to be enough for anyone, and I like to be optimistic enough about the development of humanity that we can create something like that in the future. Not going to fight too hard to keep it, but I'd rather it stay.
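Whichever way the quadrillion question lands, here's a minimal sketch of how a parameter count could be rendered with the suffixes under discussion, topping out at `T` so a quadrillion-scale model comes out as `1000T` (assumed behaviour for illustration, not the actual gguf-py implementation):

```python
def size_label(n_params: int) -> str:
    # "T" is the largest suffix, so 10**15 parameters renders as
    # "1000T" rather than introducing a "Q" suffix.
    for scale, suffix in ((10**12, "T"), (10**9, "B"), (10**6, "M"), (10**3, "K")):
        if n_params >= scale:
            value = f"{n_params / scale:.1f}"
            # Drop a trailing ".0" so round counts read "135M", not "135.0M".
            return value.rstrip("0").rstrip(".") + suffix
    return str(n_params)

print(size_label(135_000_000))    # 135M
print(size_label(7_241_000_000))  # 7.2B
print(size_label(10**15))         # 1000T
```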
Glad we are on the same page... we just gotta remember to do it... So... will merge soon if no objection to this PR pops up.
Oh, noticed there is this TODO.
Not familiar with this, so I'll hold off and let @compilade handle it.
* convert_lora : fix default filename
  The default filename was previously hardcoded.
* convert_hf : Model.fname_out can no longer be None
* gguf-py : do not use title case for naming convention
  Some models use acronyms in lowercase, which can't be title-cased like other words, so it's best to simply use the same case as in the original model name. Note that the size label still has an uppercased suffix to make it distinguishable from the context size of a finetune.
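A hypothetical example of the distinguishability point from that commit message: with an uppercase suffix, a size label like `16M` can't be mistaken for a lowercase context-size annotation like `16k` (the name `MyModel-16k-16M` below is made up for illustration):

```python
import re

# Size labels use uppercase suffixes only, so "16k" (a context length)
# is not matched while "16M" (a parameter count) is.
SIZE_LABEL = re.compile(r"\d+(\.\d+)?[KMBT]")

for part in "MyModel-16k-16M".split("-"):
    kind = "size label" if SIZE_LABEL.fullmatch(part) else "not a size label"
    print(f"{part}: {kind}")
# MyModel: not a size label
# 16k: not a size label
# 16M: size label
```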
I've tested LoRA generation, and it seems fine. There's still some title-casing of the metadata which I didn't remove, but at least it's easy to see that the metadata from the LoRA adapter's model card was correctly used to link to the base models (even if the complete links are truncated in the dump output). I don't think the hparams and the tokenizer are really used in GGUF LoRA adapters though, but that's out of scope for this PR. I consider this merge-ready, and will merge after #8597.
@mofosyne, I recently tried converting https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct, and got surprised by the default name chosen for the converted model, which comes from the `_name_or_path` field in `config.json`. That field records whatever the authors happened to load from, not necessarily the published model name:

https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct: `"_name_or_path": "HuggingFaceTB/cosmo2-135M-webinst-sc2"`
https://huggingface.co/HuggingFaceTB/SmolLM-135M: `"_name_or_path": "/fsx/elie_bakouch/checkpoints/final-149M/600000"`
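To make the failure mode concrete, here's a rough sketch (a hypothetical helper, not the converter's actual logic) of why `_name_or_path` is a shaky naming source and what a fallback could look like:

```python
import json
from pathlib import Path

def guess_model_name(model_dir: Path) -> str:
    """Illustrative only: prefer _name_or_path, but fall back to the
    directory name when it looks like a local checkpoint path."""
    config = json.loads((model_dir / "config.json").read_text())
    name_or_path = config.get("_name_or_path", "")
    # _name_or_path records what the authors loaded *from* (possibly a
    # local path like "/fsx/.../600000"), not what they published *as*.
    if name_or_path and not name_or_path.startswith(("/", ".")):
        return name_or_path.split("/")[-1]
    return model_dir.name
```

Even the fallback doesn't help for SmolLM-135M-Instruct above, since its `_name_or_path` points at a different repo entirely.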
@compilade urgh, perhaps we should remove `_name_or_path` from the heuristics.

edit: Also, I see people in https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/761#669e1119f073a1a9276f7f90 discussing pulling metadata from the model card. (On the other hand... there can be an argument for a metadata.json file... that I haven't thought through yet)
* gguf-py : fix some metadata name extraction edge cases
* convert_lora : use the lora dir for the model card path
* gguf-py : more metadata edge cases fixes
  Multiple finetune versions are now joined together, and the removal of the basename annotation on trailing versions is more robust.
* gguf-py : add more name metadata extraction tests
* convert_lora : fix default filename
  The default filename was previously hardcoded.
* convert_hf : Model.fname_out can no longer be None
* gguf-py : do not use title case for naming convention
  Some models use acronyms in lowercase, which can't be title-cased like other words, so it's best to simply use the same case as in the original model name. Note that the size label still has an uppercased suffix to make it distinguishable from the context size of a finetune.
Yes, using the model card metadata for overrides seems better than a separate metadata.json file. But there are 2 model cards to consider: the original model card, and the model card of the converted model. The latter could, I guess, be taken from the directory where the GGUF model is exported, but some people use a single directory for all their conversions, which might be problematic for some fields.

Something else which could be useful is to offer an option to generate an initial model card for GGUF conversions based on the original model card with some extra metadata deduced from the model, and/or to update the metadata of an existing model card.
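In case it helps the discussion, a minimal sketch of reading override metadata from a model card's YAML front matter (assuming a Hugging Face-style `README.md`; `base_model` is a standard model card field, but this is illustrative, not the converter's actual code):

```python
from pathlib import Path
import yaml  # pyyaml

def read_model_card_metadata(model_dir: Path) -> dict:
    """Parse the YAML front matter at the top of README.md, if any."""
    readme = model_dir / "README.md"
    if not readme.is_file():
        return {}
    text = readme.read_text(encoding="utf-8")
    # Front matter sits between the first two "---" lines.
    parts = text.split("---", 2)
    if text.startswith("---") and len(parts) == 3:
        return yaml.safe_load(parts[1]) or {}
    return {}

# e.g. read_model_card_metadata(Path("SmolLM-135M-Instruct")).get("base_model")
```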
Should fix the problem mentioned in #8579 (comment) by @maziyarpanahi, which was caused by #7499 not pruning the empty parts of a name.
I've also made `convert_lora_to_gguf.py` use the LoRA adapter directory for the model card path.

TODO:
* test `convert_lora_to_gguf.py` and check if the metadata is correct
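For reference, a minimal sketch of the pruning fix mentioned at the top of this description (assumed behaviour for illustration; the real logic lives in gguf-py's naming-convention code): skip empty components before joining, so a name with no finetune or version doesn't end up with stray separators:

```python
from typing import Optional

def join_name_parts(basename: str,
                    size_label: Optional[str] = None,
                    finetune: Optional[str] = None,
                    version: Optional[str] = None) -> str:
    parts = [basename, size_label, finetune, version]
    # Without the filter, a missing part leaves doubled dashes,
    # e.g. "Mistral--Instruct", or a trailing "-".
    return "-".join(p.strip() for p in parts if p and p.strip())

print(join_name_parts("Mistral", "7B", "Instruct", "v0.2"))  # Mistral-7B-Instruct-v0.2
print(join_name_parts("Mistral", None, "Instruct"))          # Mistral-Instruct
```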