
llama : save downloaded models to local cache #7252

Closed · ggerganov opened this issue May 13, 2024 · 8 comments
Labels: enhancement, examples, good first issue

ggerganov (Member) commented May 13, 2024

We've recently introduced the --hf-repo and --hf-file helper args to common in #6234:

ref #4735 #5501 #6085 #6098

Sample usage:

./bin/main \
  --hf-repo TinyLlama/TinyLlama-1.1B-Chat-v0.2-GGUF \
  --hf-file ggml-model-q4_0.gguf \
  -m tinyllama-1.1-v0.2-q4_0.gguf \
  -p "I believe the meaning of life is" -n 32

./bin/main \
  --hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  -m tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "I believe the meaning of life is" -n 32

The first invocation downloads `https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.2-GGUF/resolve/main/ggml-model-q4_0.gguf` and saves it to `tinyllama-1.1-v0.2-q4_0.gguf`.

Requires a build with `LLAMA_CURL` enabled.

Currently, the files downloaded via curl are stored at a destination based on the --model CLI arg.

If --model is not provided, we would like to auto-store the downloaded model files in a local cache, similar to what other frameworks like HF/transformers do.

Here is the documentation of this functionality in HF for convenience and reference:

URL: https://huggingface.co/docs/transformers/installation?highlight=transformers_cache#cache-setup

### Cache setup

Pretrained models are downloaded and locally cached at: ~/.cache/huggingface/hub. This is the default directory given by the shell environment variable TRANSFORMERS_CACHE. On Windows, the default directory is given by C:\Users\username\.cache\huggingface\hub. You can change the shell environment variables shown below - in order of priority - to specify a different cache directory:

1. Shell environment variable (default): HUGGINGFACE_HUB_CACHE or TRANSFORMERS_CACHE.
2. Shell environment variable: HF_HOME.
3. Shell environment variable: XDG_CACHE_HOME + /huggingface.

🤗 Transformers will use the shell environment variables PYTORCH_TRANSFORMERS_CACHE or PYTORCH_PRETRAINED_BERT_CACHE if you are coming from an earlier iteration of this library and have set those environment variables, unless you specify the shell environment variable TRANSFORMERS_CACHE.

The goal of this issue is to implement similar functionality in llama.cpp. The environment variables should be named according to llama.cpp conventions, and the local cache should be used only when the --model CLI argument is not explicitly provided in commands like main and server.
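
For illustration, here is a minimal sketch of what such a lookup order could look like; the LLAMA_CACHE variable name and the ~/.cache/llama.cpp fallback are assumptions for the sketch, not settled naming:

#include <cstdlib>
#include <string>

// Resolve the directory used to store downloaded models, checked in priority
// order (sketch only - all names below are placeholders, not decided yet):
//   1. LLAMA_CACHE      - explicit override
//   2. XDG_CACHE_HOME   - XDG base directory, plus a "llama.cpp" subfolder
//   3. HOME             - fallback to ~/.cache/llama.cpp
static std::string get_cache_directory() {
    if (const char * env = std::getenv("LLAMA_CACHE")) {
        return env;
    }
    if (const char * env = std::getenv("XDG_CACHE_HOME")) {
        return std::string(env) + "/llama.cpp";
    }
    if (const char * env = std::getenv("HOME")) {
        return std::string(env) + "/.cache/llama.cpp";
    }
    return ".cache"; // last resort: relative to the current working directory
}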

P.S. I'm interested in exercising "Copilot Workspace" to see if it would be capable of implementing this task by itself.

P.S.2: So CW is quite useless at this point for llama.cpp - it cannot handle files with a few thousand lines of code:

CW snapshot: https://copilot-workspace.githubnext.com/ggerganov/llama.cpp/issues/7252?shareId=379fdaa0-3580-46ba-be68-cb061518a38c

julien-c (Contributor) commented:

FWIW, the HF cache layout is quite nice and it is git-aware: @LysandreJik and I implemented it a while ago and it has been working well.

For instance, this is the layout for one model repo with two revisions and two files inside of it:

    [  96]  .
    └── [ 160]  models--julien-c--EsperBERTo-small
        ├── [ 160]  blobs
        │   ├── [321M]  403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
        │   ├── [ 398]  7cb18dc9bafbfcf74629a4b760af1b160957a83e
        │   └── [1.4K]  d7edf6bd2a681fb0175f7735299831ee1b22b812
        ├── [  96]  refs
        │   └── [  40]  main
        └── [ 128]  snapshots
            ├── [ 128]  2439f60ef33a0d46d85da5001d52aeda5b00ce9f
            │   ├── [  52]  README.md -> ../../blobs/d7edf6bd2a681fb0175f7735299831ee1b22b812
            │   └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
            └── [ 128]  bbc77c8132af1cc5cf678da3f1ddf2de43606d48
                ├── [  52]  README.md -> ../../blobs/7cb18dc9bafbfcf74629a4b760af1b160957a83e
                └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
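
For reference, a rough sketch of how a client could resolve a file in this layout; the helper below is hypothetical and only mirrors the directory naming shown above:

#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Given e.g. ("julien-c", "EsperBERTo-small", "main", "README.md"), build the
// path inside the HF-style cache: refs/<ref> stores the snapshot commit, and
// snapshots/<commit>/<file> is a symlink into blobs/.
static fs::path resolve_cached_file(const fs::path & cache_root,
                                    const std::string & org,
                                    const std::string & repo,
                                    const std::string & ref,
                                    const std::string & file) {
    const fs::path repo_dir = cache_root / ("models--" + org + "--" + repo);

    std::string commit;
    std::ifstream(repo_dir / "refs" / ref) >> commit;   // e.g. "bbc77c8132af..."

    return repo_dir / "snapshots" / commit / file;      // symlink -> ../../blobs/<hash>
}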

ngxson (Collaborator) commented May 13, 2024

We can probably take advantage of the Hub API. For example, to list all files in a repo: https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B/tree/main

This could potentially remove the need for --hf-file and for etag checking.
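
As a sketch of what that would involve (llama.cpp already depends on libcurl for the --hf-repo download path; the JSON parsing of the response is left out here):

#include <curl/curl.h>
#include <cstdio>
#include <string>

// Append the response body into a std::string.
static size_t collect(char * data, size_t size, size_t nmemb, void * userp) {
    static_cast<std::string *>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    // Lists all files in the repo; gated repos would additionally need an
    // Authorization header with a user token.
    const char * url = "https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B/tree/main";

    std::string body;
    CURL * curl = curl_easy_init();
    if (!curl) {
        return 1;
    }
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    const CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    if (res != CURLE_OK) {
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));
        return 1;
    }
    printf("%s\n", body.c_str()); // JSON array of files; pick the wanted .gguf from it
    return 0;
}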

amirzia (Contributor) commented May 17, 2024

Hi, this is my first contribution to this project.

I made a PR with a basic implementation of the cache mechanism. The downloaded files are stored in the directory specified by the LLAMA_CACHE env variable. If the env variable is not provided, the models are stored in the default cache directory: .cache/.

Let me know if I'm going in the right direction.

ggerganov (Member, Author) commented:

@amirzia I think the proposed changes are good - pretty much what I imagined as a first step.

I'm not sure what the benefits of having a git-aware cache similar to HF's are, but if we think there are reasonable advantages, we can work on that to improve the functionality further. Maybe for now it's fine to merge the PR as it is.

julien-c (Contributor) commented:

Organic community demand for a shared cache between all local ML apps: https://x.com/filipviz/status/1792981186446274625

amirzia (Contributor) commented May 22, 2024

Should we agree on a common standard (layout and path)?

There is already this proposal for a standard path: https://filip.world/post/modelpath/. We also have the HF git-aware layout (which Julien seems to really like 😄).

Although I'm not sure if llama.cpp and other applications benefit from having the history of models.

ggerganov (Member, Author) commented:

Ah, I see now. A shared location seems reasonable so that different apps can share the same model data.

Although I'm not sure if llama.cpp and other applications benefit from having the history of models.

I also don't think that llama.cpp has use cases for the git-aware structure, and it might not be trivial to implement in C++. Filesystem operations are a real pain in C++.
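
A minimal sketch of the directory creation involved, assuming C++17's std::filesystem is available:

#include <cstdio>
#include <cstdlib>
#include <filesystem>
#include <system_error>

int main() {
    namespace fs = std::filesystem;

    const char * home = std::getenv("HOME");
    const fs::path cache_dir = fs::path(home ? home : ".") / ".cache" / "llama.cpp";

    // Create the (possibly nested) cache directory if it does not exist yet.
    std::error_code ec;
    fs::create_directories(cache_dir, ec);
    if (ec) {
        fprintf(stderr, "failed to create %s: %s\n", cache_dir.string().c_str(), ec.message().c_str());
        return 1;
    }
    printf("cache directory: %s\n", cache_dir.string().c_str());
    return 0;
}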

ngxson (Collaborator) commented Dec 13, 2024

I'm closing this issue since it has already been implemented.
