
Update to readme and added application notes #168 #178

Closed

Conversation


@mofosyne mofosyne commented Jan 7, 2024

Issue Ticket: #168

Added recommended path convention for installation as well as application notes.

This commit is based on jart's recommendation regarding the llamafile convention. Here is her quote that this is based on:

> I want to enable people to integrate with llamafile any way they like.
> In terms of recommendations and guidance, I've been following TheBloke's
> naming convention when publishing llamafiles to Hugging Face
> https://huggingface.co/jartine I also always use the llamafile tag. So
> what I'd recommend applications do, is iterate all the files tagged
> llamafile on Hugging Face to present those as choices to the user for
> LLMs. Be sure to display which user is publishing them, and sort by
> heart count. Then, when you download them, feel free to put them in
> ~/.llamafile. Then, to show the users which models are installed, you
> just look for ~/.llamafile/*.llamafile.
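
For illustration, a minimal sketch of that flow using the `huggingface_hub` Python package; the repo id, filename, and the `LLAMAFILE_DIR` constant are placeholders rather than recommendations:

```python
from pathlib import Path

from huggingface_hub import HfApi, hf_hub_download

LLAMAFILE_DIR = Path.home() / ".llamafile"  # the recommended install location above

# 1. Offer the community's llamafiles as choices, sorted by heart count.
#    The publishing user is the prefix of the repo id (e.g. "jartine/...").
api = HfApi()
for model in api.list_models(filter="llamafile", sort="likes", direction=-1):
    print(f"{model.id}  ({model.likes} likes)")

# 2. Download the chosen *.llamafile into ~/.llamafile
#    (repo_id and filename below are illustrative placeholders).
hf_hub_download(
    repo_id="jartine/example-model-llamafile",
    filename="example-model.Q4_K_M.llamafile",
    local_dir=LLAMAFILE_DIR,
)

# 3. Show the user which models are installed.
print(sorted(LLAMAFILE_DIR.glob("*.llamafile")))
```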


mofosyne commented Jan 8, 2024

Okay, revised the readmes based on your suggestion. Also spent some time studying how the model naming convention currently works in the field and how it's defined in llama.cpp. There are likely issues with the "Llamafile Naming Convention" section, but everything else should hopefully be addressed now.


mofosyne commented Jan 8, 2024

If we settle on `<Model>-<Version>-<Parameters>-<Quantization>.llamafile`, we may want to adjust the file creation process to enforce this. Maybe have a quiz? Maybe extract as much as possible from the GGUF file format metadata?

At least according to https://github.com/ggerganov/ggml/blob/master/docs/gguf.md you can get the ggml_type (e.g. Q6_K or F32), but per `gguf_tensor_info_t tensor_infos[header.tensor_count]` a single file can contain a mix of different tensor types. In that case we have to figure out how the naming scheme should deal with multiple types in one model.
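
For example, here is a rough sketch of inspecting that mix with the gguf-py `GGUFReader`; `model.gguf` is a placeholder filename, and picking the dominant type is just one possible heuristic:

```python
from collections import Counter

from gguf import GGUFReader  # gguf-py, the Python package shipped with llama.cpp

reader = GGUFReader("model.gguf")  # placeholder path

# Tally the ggml_type of every tensor in the file. Even a "Q6_K" model usually
# carries a few F32 tensors (norms etc.), so a single tensor's type is not a
# safe label for the whole file.
type_counts = Counter(tensor.tensor_type.name for tensor in reader.tensors)
print(type_counts)  # e.g. Counter({'Q6_K': 200, 'F32': 65})

# One heuristic for the <Quantization> slot: use the most common quantized type.
print(type_counts.most_common(1)[0][0])
```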

mofosyne pushed a commit to mofosyne/llamafile that referenced this pull request Jan 9, 2024

mofosyne commented Jan 14, 2024

Balloob, founder of Home Assistant, on what he would require an LLM container to do:

> I would love to see a standardized API for local LLMs that is not just a 1:1 copy of the ChatGPT API. For example, as Home Assistant talks to a random model, we should be able to query that model to see what the model is capable of.

Is this achievable by adding key-values to the GGUF? And maybe accessible via something like `llmbot.llamafile --get-metadata capabilities`?
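
As an illustration of that idea, reading such a custom key back out of a GGUF file with the gguf-py `GGUFReader` might look like the sketch below; the `llamafile.capabilities` key and the `llmbot.gguf` filename are made up for this example and are not part of the GGUF spec:

```python
from gguf import GGUFReader  # gguf-py, the Python package shipped with llama.cpp

reader = GGUFReader("llmbot.gguf")  # hypothetical model file

# "llamafile.capabilities" is a hypothetical key-value pair an author could add;
# it is not defined by the GGUF spec.
field = reader.fields.get("llamafile.capabilities")
if field is None:
    print("model does not advertise any capabilities")
else:
    # gguf-py keeps a string value's raw bytes in the field's last part.
    print(bytes(field.parts[-1]).decode("utf-8"))
```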

> I want to see local LLMs with support for a feature similar or equivalent to OpenAI functions. We cannot include all possible information in the prompt, and we need to allow LLMs to take actions to be useful. Constrained grammars do look like a possible alternative. Creating a prompt to write JSON is possible but needs quite an elaborate prompt, and even then the LLM can make errors. We want to make sure that all JSON coming out of the model is directly actionable without having to ask the LLM what it might have meant for a specific value.

Having a recommended way to easily constrain output to JSON would help in the application notes; a rough sketch of one possible approach follows after this quote.

> As a user of Home Assistant, I would want to easily be able to try out different AI models with a single click from the user interface.

> Home Assistant allows users to install add-ons which are Docker containers + metadata. This is how today users install Whisper or Piper for STT and TTS. Both these engines have a wrapper that speaks Wyoming, our voice assistant standard to integrate such engines, among other things. (https://github.com/rhasspy/rhasspy3/blob/master/docs/wyoming.md)

> If we rely on just the ChatGPT API to allow interacting with a model, we wouldn't know what capabilities the model has and so can't know what features to use to get valid JSON actions out. Can we pass our function definitions or should we extend the prompt with instructions on how to generate JSON?
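
As a sketch of the constrained-output idea above: llamafile bundles the llama.cpp server, whose `/completion` endpoint accepts a GBNF `grammar` parameter. Assuming a llamafile is already running on localhost:8080, something like the following could force directly actionable JSON out of the model (the grammar is a toy example, not a Home Assistant schema):

```python
import json
import urllib.request

# A tiny GBNF grammar that only allows output of the form
# {"action": "turn_on"|"turn_off", "entity": "light.<something>"}.
grammar = r'''
root   ::= "{\"action\": \"" action "\", \"entity\": \"" entity "\"}"
action ::= "turn_on" | "turn_off"
entity ::= "light." [a-z_]+
'''

payload = {
    "prompt": "Turn off the kitchen light.\nJSON command:",
    "n_predict": 64,
    "grammar": grammar,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

# The grammar constrains generation, so the content should parse as JSON.
print(json.loads(result["content"]))
```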

@mofosyne mofosyne force-pushed the readme-instaling-a-llamafile branch from 8f310c4 to 131432e on April 5, 2024 01:14

mofosyne commented Apr 5, 2024

Just did a rebase to keep this PR up to date with main


mofosyne commented Apr 5, 2024

While rebasing ggerganov/llama.cpp#4858, I decided to review my naming convention proposal and noticed that Mixtral has a new naming approach for their models, like 8x7B, to indicate 8 experts of 7B parameters each.

I've added this new element to the llama.cpp default filename PR and updated the readme notes in this repo's PR as well.

@jart jart force-pushed the main branch 2 times, most recently from 622924c to 9cf7363 on April 30, 2024 03:35
@mofosyne
Collaborator Author

ggerganov/llama.cpp#7165 is now merged in, so `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf` is now more canonical.
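
For illustration, a rough validator for filenames of that shape; this regex is only an approximation, the canonical definition lives in the GGUF naming convention doc:

```python
import re

# Approximate check for <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>;
# the experts count is optional for non-mixture models. This is a sketch, not the
# canonical definition of the convention.
NAME_RE = re.compile(
    r"^(?P<model>[A-Za-z0-9._-]+?)"
    r"-(?P<version>v\d+(?:\.\d+)*)"
    r"-(?:(?P<experts>\d+)x)?(?P<parameters>\d+(?:\.\d+)?[KMBT])"
    r"-(?P<quantization>[A-Za-z0-9_]+)"
    r"\.(?:gguf|llamafile)$"
)

for name in ("Mixtral-v0.1-8x7B-Q4_0.llamafile",
             "Hermes-2-Pro-Mistral-7B.Q5_K_M.llamafile"):
    match = NAME_RE.match(name)
    print(name, "->", match.groupdict() if match else "does not follow the convention")
```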

@mofosyne mofosyne force-pushed the readme-instaling-a-llamafile branch from 9503aea to 3206f27 on May 13, 2024 04:21
@mofosyne
Collaborator Author

Rebased to be on top of the latest changes and squashed all the other fixup commits. Did another review to make sure the doc matches the now-merged change to llama.cpp convert.py.

@mofosyne
Collaborator Author

mofosyne commented May 18, 2024

Updated to use https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention as the canonical reference for the llamafile filename convention.

On a side note, what generates the `<!-- README_llamafile.md-provided-files start -->` marker that I see occasionally in Hugging Face model cards?

@mofosyne mofosyne marked this pull request as draft May 25, 2024 13:56
@mofosyne mofosyne closed this Jun 10, 2024