Improve usability of --model-url & related flags #6930
Conversation
…file (or else legacy models/7B/ggml-model-f16.gguf)
Great, please mind that it will force people to download files again from the remote URL, which is kind of a breaking change.
@phymbert ah, I forgot, indeed! I initially planned on being backwards compatible (I felt it had low long-term usefulness, but I'm happy to add this code back, it's just a few lines), but I thought it would be easier to provide a code snippet for people to create the JSON file out of the etag & lastModified. Added this as a TODO before I undraft this PR. (Also, technically, even without a migration snippet people can just use …
common/common.cpp (Outdated)
n_items - strlen(last_modified_prefix) - 2); // Remove CRLF
std::string header(buffer, n_items);
std::smatch match;
if (std::regex_match(header, match, std::regex("([^:]+): (.*)\r\n", std::regex_constants::multiline))) {
The regex will be compiled at each header? Do we really need a regex to parse HTTP headers?
> Do we really need a regex to parse HTTP headers?

Agreed, std::regex may seem overkill, but it's simpler & safer than C string manipulation.
In the previous code, for instance, I just realized there's at least one buffer overflow bug (cc @ggerganov FYI): this `strncpy` will write beyond the stack-allocated `etag`'s `LLAMA_CURL_MAX_HEADER_LENGTH` (= 256) bytes and into the stack-allocated `etag_path` if the ETag header value is longer than 256 bytes, possibly giving HuggingFace (or anyone else you download from) write access to the system. A fix would be to turn the last argument of `strncpy` into `MIN(sizeof(etag) - 1, n_items - strlen(etag_prefix) - 2)`, but it would make the code a bit harder to read & maintain.
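For illustration, here's a minimal sketch of that bounded copy; the names (`buffer`, `n_items`, `etag_prefix`, `etag`, `LLAMA_CURL_MAX_HEADER_LENGTH`) just mirror the comment above rather than the exact upstream code, and it uses `memcpy` plus an explicit terminator instead of `strncpy`:

```cpp
#include <algorithm>
#include <cstring>

#define LLAMA_CURL_MAX_HEADER_LENGTH 256

// Sketch only: copy an ETag header value into a fixed-size stack buffer without
// writing past it. `buffer`/`n_items` mirror a curl header callback's arguments;
// assumes the caller already checked that the line starts with `etag_prefix`.
static void copy_etag_value(const char * buffer, size_t n_items, char (&etag)[LLAMA_CURL_MAX_HEADER_LENGTH]) {
    const char * etag_prefix = "etag: ";
    // Value length, excluding the prefix and the trailing CRLF of the header line.
    size_t value_len = n_items - strlen(etag_prefix) - 2;
    // Clamp to the destination size so an oversized ETag gets truncated instead of
    // overflowing into whatever sits next to `etag` on the stack (e.g. `etag_path`).
    size_t copy_len = std::min(sizeof(etag) - 1, value_len);
    memcpy(etag, buffer + strlen(etag_prefix), copy_len);
    etag[copy_len] = '\0';
}
```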
> The regex will be compiled at each header?
Good point! I had opted to be slightly wasteful in CPU cycles here, as the lifecycle management of regexes is trickier in the C callback context (e.g. you can't allocate outside & pass through a lambda capture), and the easiest alternatives (static allocation inside the callback, or globally) were a bit wasteful in memory as these regexes aren't useful afterwards. Let's go for the potentially shorter startup time? (Now using local static allocs.)
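For reference, a sketch of that function-local static approach; the callback shape follows libcurl's CURLOPT_HEADERFUNCTION convention, and the body is illustrative rather than the exact code in this PR:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Illustrative header callback: the regex is compiled once, on first use, then
// reused across all subsequent invocations instead of being rebuilt per header.
static size_t header_callback(char * buffer, size_t size, size_t n_items, void * /* userdata */) {
    static const std::regex header_regex("([^:]+): (.*)\r\n");

    std::string header(buffer, size * n_items);
    std::smatch match;
    if (std::regex_match(header, match, header_regex)) {
        std::string name  = match[1].str(); // e.g. "etag" or "last-modified"
        std::string value = match[2].str();
        // ... store the values of interest here ...
        (void) name;
        (void) value;
    }
    return size * n_items; // tell curl the whole header line was consumed
}
```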
You could always pass a pointer to the compiled regex in the userdata; however, this is not performance-sensitive code at all, and it's preferable to keep the code simple and easy to maintain.
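A sketch of that userdata alternative, for comparison (the struct and field names are made up for illustration, not taken from this PR): the regex is compiled once by the caller and handed to the callback through CURLOPT_HEADERDATA.

```cpp
#include <curl/curl.h>
#include <regex>
#include <string>

// Illustrative per-request context owned by the caller.
struct header_parse_ctx {
    std::regex header_regex{"([^:]+): (.*)\r\n"};
    std::string etag;
    std::string last_modified;
};

static size_t header_cb(char * buffer, size_t size, size_t n_items, void * userdata) {
    auto * ctx = static_cast<header_parse_ctx *>(userdata);
    std::string header(buffer, size * n_items);
    std::smatch match;
    if (std::regex_match(header, match, ctx->header_regex)) {
        // Note: HTTP header names are case-insensitive; a real implementation
        // would normalize them before comparing.
        if (match[1] == "etag")          { ctx->etag          = match[2]; }
        if (match[1] == "last-modified") { ctx->last_modified = match[2]; }
    }
    return size * n_items;
}

// Usage (error handling elided):
//   header_parse_ctx ctx;
//   curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, header_cb);
//   curl_easy_setopt(curl, CURLOPT_HEADERDATA, &ctx);
```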
@@ -5,7 +5,7 @@ Feature: llama.cpp server
 Background: Server startup
 Given a server listening on localhost:8080
 And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
-And a model file ggml-model-f16.gguf
+And a model file bert-bge-small.gguf
Could you please explain why this change is now necessary?
Sorry I missed this! I think there was another test case that was implicitly downloading a different URL to ggml-model-f16.gguf, causing a collision.
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf)
* args: main & server now call gpt_params_handle_model_default
* args: define DEFAULT_MODEL_PATH + update cli docs
* curl: check url of previous download (.json metadata w/ url, etag & lastModified)
* args: fix update to quantize-stats.cpp
* curl: support legacy .etag / .lastModified companion files
* curl: rm legacy .etag file support
* curl: reuse regex across headers callback calls
* curl: unique_ptr to manage lifecycle of curl & outfile
* curl: nit: no need for multiline regex flag
* curl: update failed test (model file collision) + gitignore *.gguf.json
Fixes #6887
* `--model` is now inferred as `models/$filename`, with the filename from `--model-url` / `-mu` or `--hf-file` / `-hff` if set (it still defaults to `models/7B/ggml-model-f16.gguf` otherwise). Downloading different URLs will no longer overwrite previous downloads. (This defaulting rule is sketched further below.)
* URL model downloads now write a `.json` companion metadata file (instead of the previous separate `.etag` & `.lastModified` files). This also contains the URL itself, which is useful to remember the exact origin of models & prevents accidental overwrites of files.

Note: This is a breaking change w.r.t. already downloaded models, as the `.etag` and `.lastModified` files are now obsolete. If you're used to typing the following:

Then you can avoid a re-download by migrating your `.etag` & `.lastModified` files to a `.json` file using a simple Python snippet.

Show `migrate_etag.py` & its usage
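To make the `--model` inference rule above concrete, here's a minimal sketch; the struct and helper names are illustrative, and the actual `gpt_params_handle_model_default` may differ (e.g. in the precedence between `--hf-file` and `--model-url`):

```cpp
#include <string>

#define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"

// Illustrative subset of the parameters involved (not the real gpt_params).
struct model_params_sketch {
    std::string model;      // --model / -m
    std::string model_url;  // --model-url / -mu
    std::string hf_file;    // --hf-file / -hff
};

// If --model was not given, derive it from --hf-file or --model-url,
// otherwise fall back to the legacy default path.
static void handle_model_default(model_params_sketch & params) {
    if (!params.model.empty()) {
        return; // an explicit --model always wins
    }
    std::string filename;
    if (!params.hf_file.empty()) {
        filename = params.hf_file.substr(params.hf_file.find_last_of('/') + 1);
    } else if (!params.model_url.empty()) {
        std::string path = params.model_url.substr(0, params.model_url.find('?')); // drop any query string
        filename = path.substr(path.find_last_of('/') + 1);
    }
    params.model = filename.empty() ? DEFAULT_MODEL_PATH : "models/" + filename;
}
```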
Smaller changes:

* `--hf-file` to `--model` on `server` (as was done on `main`)

TODO: