
Improve usability of --model-url & related flags #6930

Merged: 13 commits into ggerganov:master on Apr 29, 2024

Conversation

ochafik (Collaborator) commented Apr 26, 2024

Fixes #6887

  • --model is now inferred as models/$filename, with the filename taken from --model-url / -mu or --hf-file / -hff if set (it still defaults to models/7B/ggml-model-f16.gguf otherwise). Downloading different URLs will no longer overwrite previous downloads.

  • URL model downloads now write a .json companion metadata file (instead of the previous separate .etag & .lastModified files). This file also contains the URL itself, which is useful for remembering the exact origin of a model and prevents accidental overwrites.

    Note: This is a breaking change w.r.t. already downloaded models, as .etag and .lastModified files are now obsolete. If you're used to typing the following:

    ./main -mu <some-url> -m <some-model-file>

    then you can avoid a re-download by migrating your .etag & .lastModified files to a .json file with a simple Python snippet:

    `migrate_etag.py` & its usage:

    import json
    import os
    import sys

    # Convert legacy <model>.etag / <model>.lastModified companion files
    # into the new <model>.json metadata file.
    for model in sys.argv[1:]:
        if os.path.exists(f'{model}.etag') and not os.path.exists(f'{model}.json'):
            with open(f'{model}.etag') as f:
                etag = f.read()
            with open(f'{model}.lastModified') as f:
                last_modified = f.read()
            with open(f'{model}.json', 'w') as f:
                json.dump(dict(etag=etag, lastModified=last_modified), f, indent=2)
            print(f'Created {model}.json')
    python migrate_etag.py models/7B/ggml-model-f16.gguf
    cat models/7B/ggml-model-f16.gguf.json
    # {
    #   "etag": "\"40d7e29dab8ea579f8b8087bc9370c8a-359\"",
    #   "lastModified": "Fri, 19 Apr 2024 02:34:23 GMT"
    # }
    rm models/7B/ggml-model-f16.gguf.{etag,lastModified}
  • Smaller changes:

    • Log about etag / modified time changes that cause re-downloads
    • Enable the defaulting of --hf-file to --model on server (as was done on main)
    • Mitigate risk of buffer overflows in headers handling
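As a rough sketch of the --model inference described in the first bullet above (the helper name is hypothetical, not the actual common.cpp code):

```cpp
#include <string>

// Hypothetical sketch: derive the default --model path from the last path
// component of --model-url / --hf-file, as described above. The function
// name is illustrative; the real logic lives in common/common.cpp.
static std::string model_path_from_url(const std::string & url) {
    const size_t slash = url.find_last_of('/');
    const std::string filename = slash == std::string::npos ? url : url.substr(slash + 1);
    return "models/" + filename;
}
```

With this, two different URLs resolve to two different files under models/, so downloads no longer collide.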
make clean && make -j LLAMA_CURL=1 main server

./main -p Test -n 100 -mu https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
# ...

./main -p Test -n 100 -hfr NousResearch/Meta-Llama-3-8B-Instruct-GGUF -hff Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
# ...

ls models/
# Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
# Meta-Llama-3-8B-Instruct-Q4_K_M.gguf.json
# Phi-3-mini-4k-instruct-q4.gguf
# Phi-3-mini-4k-instruct-q4.gguf.json
# ...

cat models/Phi-3-mini-4k-instruct-q4.gguf.json
# {
#     "url": "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf",
#     "etag": "\"b83ce18f1e735d825aa3402db6dae311-145\"",
#     "lastModified": "Thu, 25 Apr 2024 21:26:15 GMT"
# }

TODO:

  • Provide simple bash / python snippet to migrate existing .etag / .lastModified files to JSON (or be backwards compatible)

phymbert (Collaborator) commented:

Great; please mind that it will force people to download the files again from the remote URL, which is kind of a breaking change.

ochafik (Collaborator, Author) commented Apr 26, 2024

> Great; please mind that it will force people to download the files again from the remote URL, which is kind of a breaking change.

@phymbert ah, I forgot, indeed! I initially planned on being backwards compatible (I felt it had low long-term usefulness, but I'm happy to add that code back; it's just a few lines). Instead, I thought it would be easier to provide a code snippet for people to create the JSON file out of the .etag & .lastModified files. Added this as a TODO before I undraft this PR.

(Also, technically, even without a migration snippet people can just pass -m models/... to use their already downloaded model(s), but I agree it's an unpleasant surprise.)

Review thread on common/common.cpp (outdated, resolved):
n_items - strlen(last_modified_prefix) - 2); // Remove CRLF
std::string header(buffer, n_items);
std::smatch match;
if (std::regex_match(header, match, std::regex("([^:]+): (.*)\r\n", std::regex_constants::multiline))) {
Collaborator:
Will the regex be compiled for each header? And do we really need a regex to parse HTTP headers?

ochafik (Collaborator, Author):

> Do we really need a regex to parse HTTP headers?

Agreed, std::regex may seem like overkill, but it's simpler & safer than C string manipulation.

In the previous code, for instance, I just realized there's at least one buffer overflow bug (cc @ggerganov FYI): this strncpy will write beyond the stack-allocated etag's LLAMA_CURL_MAX_HEADER_LENGTH (= 256) bytes and into the stack-allocated etag_path if the ETag header value is longer than 256 bytes, possibly giving HuggingFace (or anyone else you download from) write access to the system. A fix would be to change the last argument of that strncpy to MIN(sizeof(etag) - 1, n_items - strlen(etag_prefix) - 2), but it would make the code a bit harder to read & maintain.
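A minimal sketch of such a bounded copy (the buffer size matches the constant mentioned above; the function name is illustrative, not the llama.cpp code):

```cpp
#include <algorithm>
#include <cstring>

constexpr size_t LLAMA_CURL_MAX_HEADER_LENGTH = 256; // size discussed above

// Copy at most dst_size - 1 bytes and always NUL-terminate, so an oversized
// header value is truncated instead of overflowing into adjacent stack buffers.
static void copy_header_value(char * dst, size_t dst_size, const char * src, size_t src_len) {
    const size_t n = std::min(dst_size - 1, src_len);
    memcpy(dst, src, n);
    dst[n] = '\0';
}
```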

> Will the regex be compiled for each header?

Good point! I had opted to be slightly wasteful in CPU cycles here, since lifecycle management of regexes is trickier in the C callback context (e.g. they can't be allocated outside and passed in through a lambda capture), and the easiest alternatives (static allocation inside the callback, or globally) were a bit wasteful in memory, as these regexes aren't useful afterwards. Let's go for the potentially shorter startup time? (Now using local static allocations.)
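A sketch of that local-static approach (assuming a simplified parse function; the real callback also tracks which specific headers it saw):

```cpp
#include <regex>
#include <string>

// The regex is a function-local static: compiled once on first call,
// then reused on every subsequent header-callback invocation.
static bool parse_http_header(const std::string & header, std::string & key, std::string & value) {
    static const std::regex header_regex("([^:]+): (.*)\r\n");
    std::smatch match;
    if (!std::regex_match(header, match, header_regex)) {
        return false;
    }
    key   = match[1];
    value = match[2];
    return true;
}
```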

Collaborator:

You could always pass a pointer to the compiled regex in the userdata; however, this is not performance-sensitive code at all, and it's preferable to keep the code simple and easier to maintain.
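That userdata pattern could look roughly like this (struct and names are illustrative; the callback has the same shape as a libcurl CURLOPT_HEADERFUNCTION callback, exercised here without libcurl):

```cpp
#include <regex>
#include <string>

// State compiled once by the caller and threaded through the callback's
// user-data pointer (as CURLOPT_HEADERDATA would pass it).
struct header_ctx {
    std::regex pattern{"([^:]+): (.*)\r\n"};
    std::string etag;
};

// Same signature shape as a libcurl header callback.
static size_t header_callback(char * buffer, size_t size, size_t nitems, void * userdata) {
    auto * ctx = static_cast<header_ctx *>(userdata);
    std::string header(buffer, size * nitems);
    std::smatch match;
    if (std::regex_match(header, match, ctx->pattern) && match[1] == "ETag") {
        ctx->etag = match[2];
    }
    return size * nitems; // tell curl the whole header was consumed
}
```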

Another review thread on common/common.cpp (outdated, resolved).
github-actions bot (Contributor) commented Apr 26, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 440 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=10678.97ms p(95)=29221.78ms fails=, finish reason: stop=382 truncated=58
  • Prompt processing (pp): avg=113.15tk/s p(95)=512.71tk/s
  • Token generation (tg): avg=24.41tk/s p(95)=37.58tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=model-args commit=5598a6a87d45159baf7b842b99bf14812f2233ec

[Benchmark charts omitted: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, llamacpp:requests_processing (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 440 iterations)]

@ochafik ochafik marked this pull request as ready for review April 27, 2024 15:51
@ochafik ochafik merged commit 8843a98 into ggerganov:master Apr 29, 2024
63 checks passed
@@ -5,7 +5,7 @@ Feature: llama.cpp server
Background: Server startup
Given a server listening on localhost:8080
And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
-    And a model file ggml-model-f16.gguf
+    And a model file bert-bge-small.gguf
Collaborator:

Could you please explain why this change is now necessary?

ochafik (Collaborator, Author):

Sorry I missed this! There was another test case that was implicitly downloading a different URL to ggml-model-f16.gguf, causing a collision.

nopperl pushed a commit to nopperl/llama.cpp that referenced this pull request May 5, 2024
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf)

* args: main & server now call gpt_params_handle_model_default

* args: define DEFAULT_MODEL_PATH + update cli docs

* curl: check url of previous download (.json metadata w/ url, etag & lastModified)

* args: fix update to quantize-stats.cpp

* curl: support legacy .etag / .lastModified companion files

* curl: rm legacy .etag file support

* curl: reuse regex across headers callback calls

* curl: unique_ptr to manage lifecycle of curl & outfile

* curl: nit: no need for multiline regex flag

* curl: update failed test (model file collision) + gitignore *.gguf.json

Successfully merging this pull request may close issue #6887: "-mu without -m is... tricky"

3 participants