From 3206f2706380e1f6b26ab92d016e9123dfacda99 Mon Sep 17 00:00:00 2001
From: brian khuu
Date: Sun, 7 Jan 2024 14:06:38 +1100
Subject: [PATCH 1/2] Update readme and add application notes #168

Added a recommended path convention for installation as well as application
notes. This commit is based on jart's recommendation regarding llamafile
conventions. This is the quote it is based on:

> I want to enable people to integrate with llamafile any way they like.
> In terms of recommendations and guidance, I've been following
> TheBloke's naming convention when publishing llamafiles to Hugging
> Face https://huggingface.co/jartine I also always use the llamafile
> tag. So what I'd recommend applications do, is iterate all the files
> tagged llamafile on Hugging Face to present those as choices to the
> user for LLMs. Be sure to display which user is publishing them, and
> sort by heart count. Then, when you download them, feel free to put
> them in ~/.llamafile. Then, to show the users which models are
> installed, you just look for ~/.llamafile/*.llamafile.
---
 APPLICATION.md | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md      | 17 +++++++++++++
 2 files changed, 85 insertions(+)
 create mode 100644 APPLICATION.md

diff --git a/APPLICATION.md b/APPLICATION.md
new file mode 100644
index 0000000000..8ee2686b8e
--- /dev/null
+++ b/APPLICATION.md
@@ -0,0 +1,68 @@
+# Application Notes
+
+These application notes are aimed at model packagers and application developers. They are not directly relevant to users who simply want to run a llamafile in a standalone manner; rather, they are for developers who want their models or applications to integrate better with other developers' models and applications. (Think of this as an informal, ad-hoc community standards page.)
+
+## Finding Llamafiles
+
+While there is no package manager for llamafiles, application developers are encouraged
+to search for AI models tagged `llamafile` on Hugging Face.
+Be sure to display the publishing user or organisation, and to sort by trending.
+
+A llamafile repository on Hugging Face may contain multiple `*.llamafile` files
+to choose from. The current convention for describing each of these llamafiles is to
+insert a table into the model card, surrounded by HTML comment start and end markers named `*-provided-files`.
+
+For example, a model card for a llamafile should contain a section like this that you can parse:
+
+```markdown
+<!-- README_llamafile.md-provided-files start -->
+## Provided files
+
+| Name | Quant method | Bits | Size | Max RAM required | Use case |
+| ---- | ---- | ---- | ---- | ---- | ----- |
+| [phi-2.Q2_K.llamafile](https://huggingface.co/jartine/phi-2-llamafile/blob/main/phi-2.Q2_K.llamafile) | Q2_K | 2 | 1.17 GB | 3.67 GB | smallest, significant quality loss - not recommended for most purposes |
+... further llamafile entries here ...
+
+<!-- README_llamafile.md-provided-files end -->
+```
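+
+As a rough sketch of what such a lookup could look like, the snippet below queries the public Hugging Face Hub HTTP API for models tagged `llamafile` and prints each publisher and like count. The endpoint, query parameters, and response fields used here are assumptions about the Hugging Face API, not something llamafile itself defines:
+
+```python
+import json
+import urllib.request
+
+# Ask the Hugging Face Hub API for models tagged "llamafile",
+# sorted by like count so the most popular entries come first.
+url = "https://huggingface.co/api/models?filter=llamafile&sort=likes&direction=-1&limit=20"
+with urllib.request.urlopen(url) as response:
+    models = json.load(response)
+
+for model in models:
+    # "id" is "<user or organisation>/<repository>", so the publisher stays visible.
+    print(model.get("id"), "- likes:", model.get("likes"))
+```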
+
+## Llamafile Naming Convention
+
+Llamafiles follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.llamafile`.
+
+The components are:
+1. **Model**: A descriptive name for the model type or architecture.
+2. **Version (Optional)**: Denotes the model version number, starting at `v1` if not specified, formatted as `v<Major>.<Minor>`.
+    - Best practice is to include the model version number only if the model has multiple versions; assume an unversioned model is the first version, and/or check the model card.
+3. **ExpertsCount**: Indicates the number of experts found in a Mixture of Experts based model.
+4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
+    - `T`: Trillion parameters.
+    - `B`: Billion parameters.
+    - `M`: Million parameters.
+    - `K`: Thousand parameters.
+5. **Quantization**: This part specifies how the model parameters are quantized or compressed. The notation is influenced by the `./quantize --help` command in `llama.cpp`.
+    - Uncompressed formats:
+      - `F16`: 16-bit floats per weight
+      - `F32`: 32-bit floats per weight
+    - Quantization (compression) formats:
+      - `Q<X>`: X bits per weight, where `X` could be `4` (for 4 bits), `8` (for 8 bits), etc.
+        - Variants provide further details on how the quantized weights are interpreted:
+          - `_K`: k-quant models, which have further specifiers such as `_S`, `_M`, and `_L` for small, medium, and large respectively; if none is specified, it defaults to medium.
+          - `_<num>`: Different approaches, with even numbers indicating the model weights as a scaling factor multiplied by the quantized weight, and odd numbers indicating the model weights as an offset factor plus a scaling factor multiplied by the quantized weight. This convention comes from this [llama.cpp issue ticket on QX_4](https://github.com/ggerganov/llama.cpp/issues/1240).
+            - Even number (0 or 2): `<model weight> = <quantized weight> * <scale>`
+            - Odd number (1 or 3): `<model weight> = <offset> + <quantized weight> * <scale>`
+
+
+## Installing A Llamafile And Making It Accessible To Other Local Applications
+
+Llamafiles are designed to be standalone and portable, eliminating the need for a traditional installation. For optimal discovery and integration with local application scripts/programs, we recommend the following search paths:
+
+- **System-wide Paths**:
+  - `/usr/share/llamafile` (Linux/MacOS/BSD): Ideal for developers creating packages, typically populated via package managers such as `apt install` on Debian-based Linux distributions.
+  - `/opt/llamafile` (Linux/MacOS/BSD): Placed under the `/opt` directory, suitable for installers downloaded directly from the web.
+  - `C:\llamafile` (Windows): A direct path for Windows systems.
+
+- **User-specific Path**:
+  - `~/.llamafile` (Linux/MacOS/BSD): Located in the user's home directory, facilitating user-specific configurations in line with Unix-like conventions.
+
+For applications or scripts referencing the llamafile path, setting the environment variable `$LLAMAFILE_PATH` to a single path can simplify configuration and keep behaviour consistent across applications.
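+
+As a minimal sketch of what discovery could look like from an application's point of view (assuming the search paths above and the optional `$LLAMAFILE_PATH` variable; the exact search order shown is an illustrative choice rather than a fixed rule):
+
+```python
+import os
+from pathlib import Path
+
+def find_llamafiles():
+    """Collect *.llamafile files from $LLAMAFILE_PATH and the recommended directories."""
+    search_dirs = []
+    env_path = os.environ.get("LLAMAFILE_PATH")
+    if env_path:
+        search_dirs.append(Path(env_path))
+    # Recommended user-specific and system-wide locations; the Windows path is
+    # harmlessly skipped on other platforms because is_dir() returns False.
+    search_dirs += [
+        Path.home() / ".llamafile",
+        Path("/opt/llamafile"),
+        Path("/usr/share/llamafile"),
+        Path(r"C:\llamafile"),
+    ]
+
+    found = []
+    for directory in search_dirs:
+        if directory.is_dir():
+            # Any *.llamafile in a recommended directory counts as an installed model.
+            found.extend(sorted(directory.glob("*.llamafile")))
+    return found
+
+print("\n".join(str(path) for path in find_llamafiles()))
+```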
diff --git a/README.md b/README.md
index 9e9a000cda..9807683b63 100644
--- a/README.md
+++ b/README.md
@@ -53,6 +53,23 @@ chmod +x llava-v1.5-7b-q4.llamafile

 **Having trouble? See the "Gotchas" section below.**

+## Installing A Llamafile And Making It Accessible To Other Local Applications
+
+Llamafiles are designed to be standalone and portable, eliminating the need for a traditional installation. For optimal discovery and integration with local application scripts/programs, we recommend the following install paths:
+
+- **System-wide Paths**:
+  - `/opt/llamafile` (Linux/MacOS/BSD)
+  - `C:\llamafile` (Windows)
+
+- **User-specific Path**:
+  - `~/.llamafile` (Linux/MacOS/BSD)
+
+- **Additional Search Locations**: These paths serve as a reference for applications or scripts that might expect to find llamafiles here. However, installing directly into this directory is discouraged unless you know what you are doing.
+  - `/usr/share/llamafile` (Linux/MacOS/BSD)
+
+For applications or scripts referencing the llamafile path, set the environment variable `$LLAMAFILE_PATH` to a single path.
+
+
 ### JSON API Quickstart

 When llamafile is started, in addition to hosting a web

From 0f217c9a25158c5ba379dee92abdf8c8a540a849 Mon Sep 17 00:00:00 2001
From: Brian
Date: Sat, 18 May 2024 17:20:00 +1000
Subject: [PATCH 2/2] Update APPLICATION.md

Use https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention
as the canonical reference.
---
 APPLICATION.md | 24 +-----------------------
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/APPLICATION.md b/APPLICATION.md
index 8ee2686b8e..87cbb73164 100644
--- a/APPLICATION.md
+++ b/APPLICATION.md
@@ -28,29 +28,7 @@ For example, a model card for a llamafile should contain a section like this that you can parse:

 ## Llamafile Naming Convention

-Llamafiles follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.llamafile`.
-
-The components are:
-1. **Model**: A descriptive name for the model type or architecture.
-2. **Version (Optional)**: Denotes the model version number, starting at `v1` if not specified, formatted as `v<Major>.<Minor>`.
-    - Best practice is to include the model version number only if the model has multiple versions; assume an unversioned model is the first version, and/or check the model card.
-3. **ExpertsCount**: Indicates the number of experts found in a Mixture of Experts based model.
-4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
-    - `T`: Trillion parameters.
-    - `B`: Billion parameters.
-    - `M`: Million parameters.
-    - `K`: Thousand parameters.
-5. **Quantization**: This part specifies how the model parameters are quantized or compressed. The notation is influenced by the `./quantize --help` command in `llama.cpp`.
-    - Uncompressed formats:
-      - `F16`: 16-bit floats per weight
-      - `F32`: 32-bit floats per weight
-    - Quantization (compression) formats:
-      - `Q<X>`: X bits per weight, where `X` could be `4` (for 4 bits), `8` (for 8 bits), etc.
-        - Variants provide further details on how the quantized weights are interpreted:
-          - `_K`: k-quant models, which have further specifiers such as `_S`, `_M`, and `_L` for small, medium, and large respectively; if none is specified, it defaults to medium.
-          - `_<num>`: Different approaches, with even numbers indicating the model weights as a scaling factor multiplied by the quantized weight, and odd numbers indicating the model weights as an offset factor plus a scaling factor multiplied by the quantized weight. This convention comes from this [llama.cpp issue ticket on QX_4](https://github.com/ggerganov/llama.cpp/issues/1240).
-            - Even number (0 or 2): `<model weight> = <quantized weight> * <scale>`
-            - Odd number (1 or 3): `<model weight> = <offset> + <quantized weight> * <scale>`
+Llamafiles follow the same naming convention as GGUF, except that the file extension is `.llamafile` instead of `.gguf`. Consult the [gguf naming convention](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention) for specifics.

 ## Installing A Llamafile And Making It Accessible To Other Local Applications