llama : save downloaded models to local cache #7252
Comments
FWIW the HF cache layout is quite nice and it's git-aware: @LysandreJik and I implemented it a while ago and it's been working well. For instance this is the layout for one given model repo with two revisions/two files inside of it:
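For reference, here is a rough sketch of that layout — the repo, file, and hash names are illustrative, not the exact listing from the comment:

```
~/.cache/huggingface/hub/
└── models--org--model-name/
    ├── blobs/
    │   ├── <sha256-of-file-A>
    │   └── <sha256-of-file-B>
    ├── refs/
    │   └── main                      # plain-text file holding a commit hash
    └── snapshots/
        ├── <commit-hash-1>/
        │   └── config.json        -> ../../blobs/<sha256-of-file-A>
        └── <commit-hash-2>/
            └── model.safetensors  -> ../../blobs/<sha256-of-file-B>
```

Each file is stored once under `blobs/` (keyed by content hash) and exposed per revision through symlinks under `snapshots/`, which is what makes the layout git-aware.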
Probably we can take advantage of the Hub API, for example to list all files in a repo. This could potentially remove the need for specifying the file manually.
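As a rough sketch, assuming the public `tree` endpoint of the Hub API and an arbitrary example repo:

```sh
# List all files in a model repo via the Hub API (repo name is just an example)
curl -s https://huggingface.co/api/models/TheBloke/Llama-2-7B-GGUF/tree/main
# -> JSON array with one entry per file (path, size, oid, ...)
```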
Hi, this is my first contribution to this project. I made a PR with a basic implementation of the cache mechanism. The downloaded files are stored in the directory specified by the cache environment variable. Let me know if I'm going in the right direction.
@amirzia I think the proposed changes are good - pretty much what I imagined as a first step. I'm not sure what the benefits are of having a git-aware cache similar to HF's, but if we think there are reasonable advantages, we can work on that to improve the functionality further. Maybe for now it's fine to merge the PR as it is.
Organic community demand for a shared cache between all local ML apps: https://x.com/filipviz/status/1792981186446274625
Should we agree on a common standard (layout and path)? There is already this proposal for a standard path: https://filip.world/post/modelpath/. We also have the HF git-aware layout (which Julien seems to really like 😄), although I'm not sure if llama.cpp and other applications benefit from having the history of models.
Ah, I see now. The shared location seems reasonable in order to have different apps share the same model data.
I also don't think that …
I'm closing this issue since it's already implemented.
We've recently introduced the `--hf-repo` and `--hf-file` helper args to `common` in #6234.

Currently, the files downloaded via `curl` are stored in a destination based on the `--model` CLI arg. If `--model` is not provided, we would like to auto-store the downloaded model files in a local cache, similar to what other frameworks like HF/transformers do.

Here is the documentation of this functionality in HF for convenience and reference: https://huggingface.co/docs/transformers/installation?highlight=transformers_cache#cache-setup
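For comparison, a minimal sketch of how the HF cache location can be redirected (see the linked docs for the full set of variables and their precedence):

```sh
# HF/transformers caches downloads under ~/.cache/huggingface/hub by default;
# the cache root can be moved with an environment variable, e.g.:
export HF_HOME=/data/hf-cache   # the hub cache then lives under /data/hf-cache/hub
```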
The goal of this issue is to implement similar functionality in `llama.cpp`. The environment variables should be named according to the `llama.cpp` patterns, and the local cache should be utilized only when the `--model` CLI argument is not explicitly provided in commands like `main` and `server`.
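A sketch of the intended behavior, using a hypothetical default cache directory and environment variable name (the final naming is left open above):

```sh
# Today: the downloaded file lands at the path given by --model
./main --hf-repo org/model-repo --hf-file model.gguf --model models/model.gguf -p "hi"

# Desired: with --model omitted, the file is stored in a local cache instead,
# e.g. under ~/.cache/llama.cpp/ or a directory set via something like
# LLAMA_CACHE (hypothetical name), and reused on subsequent runs
./main --hf-repo org/model-repo --hf-file model.gguf -p "hi"
```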
P.S. I'm interested in exercising "Copilot Workspace" to see if it would be capable of implementing this task by itself.

P.S.2 So CW is quite useless at this point for `llama.cpp` - it cannot handle files with a few thousand lines of code.

CW snapshot: https://copilot-workspace.githubnext.com/ggerganov/llama.cpp/issues/7252?shareId=379fdaa0-3580-46ba-be68-cb061518a38c