Improve usability of --model-url & related flags #6930
Conversation
…file (or else legacy models/7B/ggml-model-f16.gguf)
Great, please mind that it will force people to download files again from the remote URL, which is kind of a breaking change.
@phymbert ah, I forgot, indeed! I initially planned on being backwards compatible (I felt it had low long-term usefulness, but I'm happy to add this code back, it's just a few lines), but I thought it would be easier to provide a code snippet for people to create the JSON file out of the etag & lastModified. Added this as a TODO before I undraft this PR. (Also, technically, even without a migration snippet people can just use …
common/common.cpp (Outdated)
n_items - strlen(last_modified_prefix) - 2); // Remove CRLF
std::string header(buffer, n_items);
std::smatch match;
if (std::regex_match(header, match, std::regex("([^:]+): (.*)\r\n", std::regex_constants::multiline))) {
The regex will be compiled at each header? Do we really need a regex to parse HTTP headers?
> Do we really need a regex to parse HTTP headers?

Agreed, std::regex may seem overkill, but it's simpler & safer than C string manipulation.
In the previous code, for instance, I just realized there's at least one buffer overflow bug (cc @ggerganov FYI): this `strncpy` will write beyond the stack-allocated `etag`'s `LLAMA_CURL_MAX_HEADER_LENGTH` (= 256) bytes and into the stack-allocated `etag_path` if the ETag header value is longer than 256 bytes, possibly giving HuggingFace (or anyone else you download from) write access to the system. A fix would be to turn the last argument of `strncpy` into `MIN(sizeof(etag) - 1, n_items - strlen(etag_prefix) - 2)`, but it would make the code a bit harder to read & maintain.
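For illustration, here's a minimal sketch of that bounded copy; the names (`buffer`, `n_items`, `etag_prefix`, `etag`, `LLAMA_CURL_MAX_HEADER_LENGTH`) just mirror the comment above rather than the exact upstream code, and it uses `memcpy` plus an explicit terminator instead of `strncpy`:

```cpp
#include <algorithm>
#include <cstring>

#define LLAMA_CURL_MAX_HEADER_LENGTH 256

// Sketch only: copy an ETag header value into a fixed-size stack buffer without
// writing past it. `buffer`/`n_items` mirror a curl header callback's arguments;
// assumes the caller already checked that the line starts with `etag_prefix`.
static void copy_etag_value(const char * buffer, size_t n_items, char (&etag)[LLAMA_CURL_MAX_HEADER_LENGTH]) {
    const char * etag_prefix = "etag: ";
    // Value length, excluding the prefix and the trailing CRLF of the header line.
    size_t value_len = n_items - strlen(etag_prefix) - 2;
    // Clamp to the destination size so an oversized ETag gets truncated instead of
    // overflowing into whatever sits next to `etag` on the stack (e.g. `etag_path`).
    size_t copy_len = std::min(sizeof(etag) - 1, value_len);
    memcpy(etag, buffer + strlen(etag_prefix), copy_len);
    etag[copy_len] = '\0';
}
```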
> The regex will be compiled at each header?
Good point! I had opted to be slightly wasteful in CPU cycles here, as the lifecycle management of regexes is trickier in the C callback context (e.g. you can't allocate outside & pass through a lambda capture), and the easiest alternatives (static allocation inside the callback, or globally) were a bit wasteful in memory as these regexes aren't useful afterwards. Let's go for the potentially shorter startup time? (Now using local static allocs.)
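For reference, a sketch of that function-local static approach; the callback shape follows libcurl's CURLOPT_HEADERFUNCTION convention, and the body is illustrative rather than the exact code in this PR:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Illustrative header callback: the regex is compiled once, on first use, then
// reused across all subsequent invocations instead of being rebuilt per header.
static size_t header_callback(char * buffer, size_t size, size_t n_items, void * /* userdata */) {
    static const std::regex header_regex("([^:]+): (.*)\r\n");

    std::string header(buffer, size * n_items);
    std::smatch match;
    if (std::regex_match(header, match, header_regex)) {
        std::string name  = match[1].str(); // e.g. "etag" or "last-modified"
        std::string value = match[2].str();
        // ... store the values of interest here ...
        (void) name;
        (void) value;
    }
    return size * n_items; // tell curl the whole header line was consumed
}
```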
You could always pass a pointer to the compiled regex in the userdata; however, this is not performance-sensitive code at all, and it's preferable to keep the code simple and easy to maintain.
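A sketch of that userdata alternative, for comparison (the struct and field names are made up for illustration, not taken from this PR): the regex is compiled once by the caller and handed to the callback through CURLOPT_HEADERDATA.

```cpp
#include <curl/curl.h>
#include <regex>
#include <string>

// Illustrative per-request context owned by the caller.
struct header_parse_ctx {
    std::regex header_regex{"([^:]+): (.*)\r\n"};
    std::string etag;
    std::string last_modified;
};

static size_t header_cb(char * buffer, size_t size, size_t n_items, void * userdata) {
    auto * ctx = static_cast<header_parse_ctx *>(userdata);
    std::string header(buffer, size * n_items);
    std::smatch match;
    if (std::regex_match(header, match, ctx->header_regex)) {
        // Note: HTTP header names are case-insensitive; a real implementation
        // would normalize them before comparing.
        if (match[1] == "etag")          { ctx->etag          = match[2]; }
        if (match[1] == "last-modified") { ctx->last_modified = match[2]; }
    }
    return size * n_items;
}

// Usage (error handling elided):
//   header_parse_ctx ctx;
//   curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, header_cb);
//   curl_easy_setopt(curl, CURLOPT_HEADERDATA, &ctx);
```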
@@ -5,7 +5,7 @@ Feature: llama.cpp server
 Background: Server startup
 Given a server listening on localhost:8080
 And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
-And a model file ggml-model-f16.gguf
+And a model file bert-bge-small.gguf
Could you please explain why this change is now necessary?
Sorry I missed this! I think there was another test case that was implicitly downloading a different URL to ggml-model-f16.gguf, causing a collision.
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf)
* args: main & server now call gpt_params_handle_model_default
* args: define DEFAULT_MODEL_PATH + update cli docs
* curl: check url of previous download (.json metadata w/ url, etag & lastModified)
* args: fix update to quantize-stats.cpp
* curl: support legacy .etag / .lastModified companion files
* curl: rm legacy .etag file support
* curl: reuse regex across headers callback calls
* curl: unique_ptr to manage lifecycle of curl & outfile
* curl: nit: no need for multiline regex flag
* curl: update failed test (model file collision) + gitignore *.gguf.json
Fixes #6887
* `--model` is now inferred as `models/$filename`, with the filename from `--model-url` / `-mu` or `--hf-file` / `-hff` if set (it still defaults to `models/7B/ggml-model-f16.gguf` otherwise). Downloading different URLs will no longer overwrite previous downloads. (This defaulting rule is sketched further below.)
* URL model downloads now write a `.json` companion metadata file (instead of the previous separate `.etag` & `.lastModified` files). This also contains the URL itself, which is useful to remember the exact origin of models & prevents accidental overwrites of files.

Note: This is a breaking change w.r.t. already downloaded models, as the `.etag` and `.lastModified` files are now obsolete. If you're used to typing the following:

Then you can avoid a re-download by migrating your `.etag` & `.lastModified` files to a `.json` file using a simple Python snippet.

Show `migrate_etag.py` & its usage
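To make the `--model` inference rule above concrete, here's a minimal sketch; the struct and helper names are illustrative, and the actual `gpt_params_handle_model_default` may differ (e.g. in the precedence between `--hf-file` and `--model-url`):

```cpp
#include <string>

#define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"

// Illustrative subset of the parameters involved (not the real gpt_params).
struct model_params_sketch {
    std::string model;      // --model / -m
    std::string model_url;  // --model-url / -mu
    std::string hf_file;    // --hf-file / -hff
};

// If --model was not given, derive it from --hf-file or --model-url,
// otherwise fall back to the legacy default path.
static void handle_model_default(model_params_sketch & params) {
    if (!params.model.empty()) {
        return; // an explicit --model always wins
    }
    std::string filename;
    if (!params.hf_file.empty()) {
        filename = params.hf_file.substr(params.hf_file.find_last_of('/') + 1);
    } else if (!params.model_url.empty()) {
        std::string path = params.model_url.substr(0, params.model_url.find('?')); // drop any query string
        filename = path.substr(path.find_last_of('/') + 1);
    }
    params.model = filename.empty() ? DEFAULT_MODEL_PATH : "models/" + filename;
}
```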
Smaller changes:

* `--hf-file` to `--model` on `server` (as was done on `main`)

TODO: