convert.py: Outfile default name change and additional metadata support #4858
Conversation
Any thoughts about this proposal @cebtenzzre? (Also the addition of an extra field
Force-pushed from 8f44129 to da064a8
Would like some extra review on the Python implementation before merging
Force-pushed from 3a10cec to 840e7bf
No problems. Just rebased so that there is only one file to review.
Should I also add a command to tell the user what default file name it will generate? This may be required for automated pipeline scripts so they know what .gguf artifact was generated. If so, this is what I plan to add:

parser.add_argument("--get-outfile", action="store_true", help="get calculated default outfile format")
...
if args.get_outfile:
    logging.basicConfig(level=logging.CRITICAL)
    model_plus = load_some_model(args.model)
    params = Params.load(model_plus)
    model = model_plus.model
    model = convert_model_names(model, params, args.skip_unknown)
    ftype = pick_output_type(model, args.outtype)
    model = convert_to_output_type(model, ftype)
    model_params_count = model_parameter_count(model_plus.model)
    print(f"{default_convention_outfile(model_plus.paths, ftype, params, model_params_count, metadata)}")
    return
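For illustration, an automated pipeline could consume such a flag roughly as follows. This is only a sketch under assumptions: it presumes the proposed --get-outfile flag prints just the computed file name to stdout, and the model and metadata paths are borrowed from the examples elsewhere in this thread.

import subprocess

# Ask convert.py for the default output name it would use (no conversion happens).
# Assumes the proposed --get-outfile flag prints only the file name to stdout.
result = subprocess.run(
    ["./llama.cpp/convert.py", "maykeye_tinyllama",
     "--metadata", "maykeye_tinyllama-metadata.json",
     "--outtype", "f16", "--get-outfile"],
    capture_output=True, text=True, check=True,
)
outfile = result.stdout.strip()
print(f"Pipeline expects artifact: {outfile}.gguf")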
Also, if it would help, here is https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/llamafile-creation.sh which shows me converting from safetensors to gguf using the new metadata support. Note this command:

./llama.cpp/convert.py maykeye_tinyllama --outtype f16 --metadata maykeye_tinyllama-metadata.json

The metadata used above is also in https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/maykeye_tinyllama-metadata.json and looks like:

{
    "general.name": "TinyLLama",
    "general.version": "v0",
    "general.author": "mofosyne",
    "general.url": "https://huggingface.co/mofosyne/TinyLLama-v0-llamafile",
    "general.description": "This gguf is ported from a first version of Maykeye attempt at recreating roneneldan/TinyStories-1M but using Llama architecture",
    "general.license": "apache-2.0",
    "general.source_url": "https://huggingface.co/Maykeye/TinyLLama-v0",
    "general.source_hf_repo": "https://huggingface.co/Maykeye/TinyLLama-v0"
}

So the other factor you may want to consider when reviewing is whether I included enough metadata.
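As a rough sketch of how a converter might consume such a file, the snippet below loads the general.* keys into a simple structure. The Metadata class and its field names here are illustrative assumptions and do not necessarily match the implementation in this PR.

import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metadata:
    # Mirrors the general.* keys in the JSON example above
    name: Optional[str] = None
    version: Optional[str] = None
    author: Optional[str] = None
    url: Optional[str] = None
    description: Optional[str] = None
    license: Optional[str] = None
    source_url: Optional[str] = None
    source_hf_repo: Optional[str] = None

    @staticmethod
    def load(path: str) -> "Metadata":
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Unknown keys are ignored; missing keys stay None
        return Metadata(
            name=data.get("general.name"),
            version=data.get("general.version"),
            author=data.get("general.author"),
            url=data.get("general.url"),
            description=data.get("general.description"),
            license=data.get("general.license"),
            source_url=data.get("general.source_url"),
            source_hf_repo=data.get("general.source_hf_repo"),
        )

metadata = Metadata.load("maykeye_tinyllama-metadata.json")
print(metadata.name, metadata.version)  # e.g. TinyLLama v0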
I also realize that we may need to adjust the other conversion scripts to match this as well, but on a cursory look at the others I wasn't exactly sure where to get all the values. Plus we should likely first make sure we agree on the naming convention, at least in this file, before touching the rest.
Also heads up that for me to add
Force-pushed from ddfaad9 to 4f99da4
Now that the Python logging has been refactored, I've taken the opportunity to refactor this PR to include
Force-pushed from 4f99da4 to 74fe2ea
Okay, now testing again the usage flow implications of this addition. Here is the updated script:

#!/bin/bash
MODEL_DIR="maykeye_tinyllama"
METADATA_FILE="maykeye_tinyllama-metadata.json"

###############################################################################
# Pull the model folder, llamafile (for the engine) and llama.cpp (for the conversion script)
echo == Prep Environment ==
git submodule update --init

###############################################################################
echo == Build and prep the llamafile engine executable ==
pushd llamafile
make -j8
make
popd

###############################################################################
echo == What is our llamafile name going to be? ==
OUTFILE=$(./llama.cpp/convert.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16 --get-outfile)
echo We will be aiming to generate $OUTFILE.llamafile

###############################################################################
echo == Convert from safetensor to gguf ==
./llama.cpp/convert.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16
mv ${MODEL_DIR}/${OUTFILE}.gguf ${OUTFILE}.gguf

###############################################################################
echo == Generating Llamafile ==
cp ./llamafile/o/llama.cpp/main/main ${OUTFILE}.llamafile

# Create an .args file with settings defaults
cat >.args <<EOF
-m
${OUTFILE}.gguf
EOF

# zip align engine, gguf and default args
./llamafile/o/llamafile/zipalign -j0 ${OUTFILE}.llamafile ${OUTFILE}.gguf .args

###############################################################################
echo == Test Output ==
./${OUTFILE}.llamafile --cli -p "hello world the gruff man said"

It would be good to hear from other ggml packagers/maintainers whether this workflow makes sense.
This looks good to me. Some minor things to correct, but it's overall fine :)
Okay, I double-checked in my Hugging Face repo that my replication script now works on the merged master and can confirm no problems: https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile It's now ready for use. @compilade thanks for the review; feel free to suggest improvements for the metadata override in #7165 if you feel we should adjust the behavior to make it easier to use.
@mofosyne Consider updating the example in the description of this PR to use valid metadata keys according to what was merged. Note that
Good point. I've updated the metadata example in both the issue ticket and this PR description. So https://github.com/ggerganov/ggml/blob/master/docs/gguf.md is the canonical source? Gotcha. I've already merged in the changes, but I'll note it in the description at least, and hopefully we can adjust the source comments later as needed.
convert.py: Outfile default name change and additional metadata support (ggerganov#4858)
* convert.py: Outfile default name change and additional metadata support
* convert.py: don't stringify Metadata load method output
* convert.py: typo fix
* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
Was working on llamafile and exploring how people commonly name their files on Hugging Face. Based on that, I am suggesting a naming convention that I think can also apply to GGUF, so I'm applying it to the convert.py script that I am currently using.
This commit adds a --metadata argument which reads from a metadata file (like the JSON example shown earlier) and gives convert.py enough context to figure out a reasonable default file name. When applied to the Hugging Face model Maykeye/TinyLLama-v0, this generates tinystories-v0-5M-F16.gguf. The key thing to note is that it is able to estimate the total parameter size (the version and name are determined by metadata.json and by context).

I'm also proposing that we add an additional field, general.version, to the gguf standard, which would be handy for models that come from the same group and are effectively the same model but trained further. People have been attaching the version to the model name, so it would be better to allow them to split it into a model name and a version.

The metadata KV store key names above are based on https://github.com/ggerganov/ggml/blob/master/docs/gguf.md, which appears to be the canonical reference for gguf key names.
Proposed GGUF Naming Convention
GGUF files follow a naming convention of <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf.

The components include:
- Version: v1 if not specified, formatted as v<Major>.<Minor>.
- Parameters: the parameter count and scale, represented as <count><scale-prefix>:
  - T: Trillion parameters.
  - B: Billion parameters.
  - M: Million parameters.
  - K: Thousand parameters.
- Quantization: the quantization scheme, as listed by the ./quantize --help command in llama.cpp:
  - F16: 16-bit floats per weight
  - F32: 32-bit floats per weight
  - Q<X>: X bits per weight, where X could be 4 (for 4 bits), 8 (for 8 bits), etc.
    - _K: k-quant models, which further have specifiers like _S, _M, and _L for small, medium, and large, respectively; if not specified, it defaults to medium.
    - _<num>: different quantization approaches, with even numbers indicating that the model weights are a scaling factor multiplied by the quantized weight, and odd numbers indicating that the model weights are an offset factor plus a scaling factor multiplied by the quantized weight. This convention was found in the llama.cpp issue ticket on QX_4.
      <model weights> = <scaling factor> * <quantised weight>  (even)
      <model weights> = <offset factor> + <scaling factor> * <quantised weight>  (odd)
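To make the convention concrete, here is a minimal sketch of how a default file name could be assembled from these components. The helper names (parameters_string, default_outfile_name) and the example parameter count are illustrative assumptions, not the actual code in convert.py.

from typing import Optional

def parameters_string(count: int) -> str:
    # Pick the largest scale prefix (T/B/M/K) that applies
    if count >= 1_000_000_000_000:
        return f"{round(count / 1_000_000_000_000)}T"
    if count >= 1_000_000_000:
        return f"{round(count / 1_000_000_000)}B"
    if count >= 1_000_000:
        return f"{round(count / 1_000_000)}M"
    return f"{round(count / 1_000)}K"

def default_outfile_name(model: str, version: str, params: int,
                         quantization: str, experts: Optional[int] = None) -> str:
    # <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf
    size = parameters_string(params)
    if experts is not None:
        size = f"{experts}x{size}"
    return f"{model}-{version}-{size}-{quantization}.gguf"

# Roughly matches the TinyLLama example above (assumed ~4.6M parameters, F16)
print(default_outfile_name("TinyLLama", "v0", 4_620_000, "F16"))
# -> TinyLLama-v0-5M-F16.gguf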