convert.py: Outfile default name change and additional metadata support #4858
Conversation
Any thoughts about this proposal @cebtenzzre? (Also the addition of an extra field
Force-pushed from 8f44129 to da064a8
Would like some extra review on the Python implementation before merging
Force-pushed from 3a10cec to 840e7bf
No problems. Just rebased so that there is only one file to review.
Should I also add a command to tell the user what default file name it will generate? This may be required for automated pipeline scripts so they know what .gguf artifact was generated. If so, this is what I plan to add:

parser.add_argument("--get-outfile", action="store_true", help="get calculated default outfile format")
...
if args.get_outfile:
    logging.basicConfig(level=logging.CRITICAL)
    model_plus = load_some_model(args.model)
    params = Params.load(model_plus)
    model = model_plus.model
    model = convert_model_names(model, params, args.skip_unknown)
    ftype = pick_output_type(model, args.outtype)
    model = convert_to_output_type(model, ftype)
    model_params_count = model_parameter_count(model_plus.model)
    print(f"{default_convention_outfile(model_plus.paths, ftype, params, model_params_count, metadata)}")
    return
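For illustration, an automated pipeline could consume such a flag roughly as follows. This is only a sketch under assumptions: it presumes the proposed --get-outfile flag prints just the computed file name to stdout, and the model and metadata paths are borrowed from the examples elsewhere in this thread.

import subprocess

# Ask convert.py for the default output name it would use (no conversion happens).
# Assumes the proposed --get-outfile flag prints only the file name to stdout.
result = subprocess.run(
    ["./llama.cpp/convert.py", "maykeye_tinyllama",
     "--metadata", "maykeye_tinyllama-metadata.json",
     "--outtype", "f16", "--get-outfile"],
    capture_output=True, text=True, check=True,
)
outfile = result.stdout.strip()
print(f"Pipeline expects artifact: {outfile}.gguf")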
Also, if it would help, here is https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/llamafile-creation.sh which shows me converting from safetensors to gguf using the new metadata support. Note this command:

./llama.cpp/convert.py maykeye_tinyllama --outtype f16 --metadata maykeye_tinyllama-metadata.json

The metadata used above is also in https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/maykeye_tinyllama-metadata.json and looks like:

{
    "general.name": "TinyLLama",
    "general.version": "v0",
    "general.author": "mofosyne",
    "general.url": "https://huggingface.co/mofosyne/TinyLLama-v0-llamafile",
    "general.description": "This gguf is ported from a first version of Maykeye attempt at recreating roneneldan/TinyStories-1M but using Llama architecture",
    "general.license": "apache-2.0",
    "general.source_url": "https://huggingface.co/Maykeye/TinyLLama-v0",
    "general.source_hf_repo": "https://huggingface.co/Maykeye/TinyLLama-v0"
}

So the other factor you may want to consider when reviewing is whether I included enough metadata.
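As a rough sketch of how a converter might consume such a file, the snippet below loads the general.* keys into a simple structure. The Metadata class and its field names here are illustrative assumptions and do not necessarily match the implementation in this PR.

import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metadata:
    # Mirrors the general.* keys in the JSON example above
    name: Optional[str] = None
    version: Optional[str] = None
    author: Optional[str] = None
    url: Optional[str] = None
    description: Optional[str] = None
    license: Optional[str] = None
    source_url: Optional[str] = None
    source_hf_repo: Optional[str] = None

    @staticmethod
    def load(path: str) -> "Metadata":
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Unknown keys are ignored; missing keys stay None
        return Metadata(
            name=data.get("general.name"),
            version=data.get("general.version"),
            author=data.get("general.author"),
            url=data.get("general.url"),
            description=data.get("general.description"),
            license=data.get("general.license"),
            source_url=data.get("general.source_url"),
            source_hf_repo=data.get("general.source_hf_repo"),
        )

metadata = Metadata.load("maykeye_tinyllama-metadata.json")
print(metadata.name, metadata.version)  # e.g. TinyLLama v0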
I also realize that we may need to adjust the other conversion scripts to match this as well, but on a cursory look at the others I wasn't exactly sure where to get all the values. Plus we should likely first make sure we agree on the naming convention, at least in this file, before touching the rest.
Also heads up that for me to add
Force-pushed from ddfaad9 to 4f99da4
Now that the Python logging has been refactored, I've taken the opportunity to refactor this PR to include
Force-pushed from 4f99da4 to 74fe2ea
Okay, now testing again the usage flow implications of this addition. Here is the updated script:

#!/bin/bash
MODEL_DIR="maykeye_tinyllama"
METADATA_FILE="maykeye_tinyllama-metadata.json"

###############################################################################
# Pull the model folder, llamafile (for the engine) and llama.cpp (for the conversion script)
echo == Prep Environment ==
git submodule update --init

###############################################################################
echo == Build and prep the llamafile engine executable ==
pushd llamafile
make -j8
make
popd

###############################################################################
echo == What is our llamafile name going to be? ==
OUTFILE=$(./llama.cpp/convert.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16 --get-outfile)
echo We will be aiming to generate $OUTFILE.llamafile

###############################################################################
echo == Convert from safetensor to gguf ==
./llama.cpp/convert.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16
mv ${MODEL_DIR}/${OUTFILE}.gguf ${OUTFILE}.gguf

###############################################################################
echo == Generating Llamafile ==
cp ./llamafile/o/llama.cpp/main/main ${OUTFILE}.llamafile

# Create an .args file with settings defaults
cat >.args <<EOF
-m
${OUTFILE}.gguf
EOF

# zip align engine, gguf and default args
./llamafile/o/llamafile/zipalign -j0 ${OUTFILE}.llamafile ${OUTFILE}.gguf .args

###############################################################################
echo == Test Output ==
./${OUTFILE}.llamafile --cli -p "hello world the gruff man said"

It would be good to hear from other ggml packagers/maintainers whether this workflow makes sense.
This looks good to me. Some minor things to correct, but it's overall fine :)
Okay, I double-checked in my Hugging Face repo that my replication script now works on the merged master and can confirm no problems: https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile It's now ready for use. @compilade thanks for the review; feel free to suggest improvements for the metadata override in #7165 if you feel we should adjust the behavior to make it easier to use.
@mofosyne Consider updating the example in the description of this PR to use valid metadata keys according to what was merged. Note that
Good point. I've updated the metadata example in both the issue ticket and this PR description. So https://github.com/ggerganov/ggml/blob/master/docs/gguf.md is the canonical source? Gotcha. I've already merged in the changes, but I'll note it in the description at least, and hopefully we can adjust the source comments later as needed.
convert.py: Outfile default name change and additional metadata support (ggerganov#4858)
* convert.py: Outfile default name change and additional metadata support
* convert.py: don't stringify Metadata load method output
* convert.py: typo fix
* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
Was working on llamafile and exploring how people commonly name their files on Hugging Face. Based on that, I am suggesting a naming convention that I think can also apply to GGUF, so I'm applying it to the convert.py script that I am currently using.
This commit adds a --metadata argument which reads from a metadata file (like the JSON example shown earlier) and gives convert.py enough context to figure out a reasonable default file name. When applied to the Hugging Face model Maykeye/TinyLLama-v0, this generates tinystories-v0-5M-F16.gguf. The key thing to note is that it is able to estimate the total parameter size (the version and name are determined by metadata.json and by context).

I'm also proposing that we add an additional field, general.version, to the gguf standard, which would be handy for models that come from the same group and are effectively the same model but trained further. People have been attaching the version to the model name, so it would be better to allow them to split it into a model name and a version.

The metadata KV store key names above are based on https://github.com/ggerganov/ggml/blob/master/docs/gguf.md, which appears to be the canonical reference for gguf key names.
Proposed GGUF Naming Convention
GGUF files follow a naming convention of <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf.

The components include:
- Version: v1 if not specified, formatted as v<Major>.<Minor>.
- Parameters: the parameter count and scale, represented as <count><scale-prefix>:
  - T: Trillion parameters.
  - B: Billion parameters.
  - M: Million parameters.
  - K: Thousand parameters.
- Quantization: the quantization scheme, as listed by the ./quantize --help command in llama.cpp:
  - F16: 16-bit floats per weight
  - F32: 32-bit floats per weight
  - Q<X>: X bits per weight, where X could be 4 (for 4 bits), 8 (for 8 bits), etc.
    - _K: k-quant models, which further have specifiers like _S, _M, and _L for small, medium, and large, respectively; if not specified, it defaults to medium.
    - _<num>: different quantization approaches, with even numbers indicating that the model weights are a scaling factor multiplied by the quantized weight, and odd numbers indicating that the model weights are an offset factor plus a scaling factor multiplied by the quantized weight. This convention was found in the llama.cpp issue ticket on QX_4.
      <model weights> = <scaling factor> * <quantised weight>  (even)
      <model weights> = <offset factor> + <scaling factor> * <quantised weight>  (odd)
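To make the convention concrete, here is a minimal sketch of how a default file name could be assembled from these components. The helper names (parameters_string, default_outfile_name) and the example parameter count are illustrative assumptions, not the actual code in convert.py.

from typing import Optional

def parameters_string(count: int) -> str:
    # Pick the largest scale prefix (T/B/M/K) that applies
    if count >= 1_000_000_000_000:
        return f"{round(count / 1_000_000_000_000)}T"
    if count >= 1_000_000_000:
        return f"{round(count / 1_000_000_000)}B"
    if count >= 1_000_000:
        return f"{round(count / 1_000_000)}M"
    return f"{round(count / 1_000)}K"

def default_outfile_name(model: str, version: str, params: int,
                         quantization: str, experts: Optional[int] = None) -> str:
    # <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf
    size = parameters_string(params)
    if experts is not None:
        size = f"{experts}x{size}"
    return f"{model}-{version}-{size}-{quantization}.gguf"

# Roughly matches the TinyLLama example above (assumed ~4.6M parameters, F16)
print(default_outfile_name("TinyLLama", "v0", 4_620_000, "F16"))
# -> TinyLLama-v0-5M-F16.gguf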