Releases: ggml-org/llama.cpp

b4745

20 Feb 09:09
0d55958
run : add --chat-template-file (#11961)

Relates to: https://github.com/ggml-org/llama.cpp/issues/11178

Added the --chat-template-file CLI option to llama-run. If specified, the
file is read and its content is passed to common_chat_templates_from_model
to override the model's chat template.

Signed-off-by: Michael Engel <[email protected]>
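
The override behavior described above can be sketched in a few lines. This is an illustrative Python sketch, not llama-run code: the function name `resolve_chat_template` and its parameters are hypothetical, and it only assumes that a supplied file's content replaces the template embedded in the model.

```python
from pathlib import Path


def resolve_chat_template(model_template, template_file=None):
    """Return the chat template to use (hypothetical helper).

    If template_file is given, its content overrides the template
    that ships with the model; otherwise the model's own template is kept.
    """
    if template_file is None:
        return model_template
    # Read the whole file as text; its content becomes the override.
    return Path(template_file).read_text(encoding="utf-8")
```

Reading the file once up front keeps the override a plain string, so the rest of the pipeline does not need to know whether the template came from the model or from disk.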

b4743

19 Feb 12:23
d07c621
common : add llama.vim preset for Qwen2.5 Coder (#11945)

This commit adds a preset for llama.vim to use the default Qwen 2.5
Coder models.

The motivation for this change is to make it easier to start a server
suitable for use with the llama.vim plugin. For example, the server
can be started with a command like the following:
```console
$ llama-server --fim-qwen-1.5b-default
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10932

b4742

19 Feb 12:14
abd4d0b
speculative : update default params (#11954)

* speculative : update default params

* speculative : do not discard the last drafted token

b4739

18 Feb 18:46
63e489c
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#1…

b4738

18 Feb 14:00
63ac128
server : add TEI API format for /rerank endpoint (#11942)

* server : add TEI API format for /rerank endpoint

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* fix

* also gitignore examples/server/*.gz.hpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
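
As a rough illustration of the TEI-style format for /rerank, the sketch below builds a request body and orders the returned scores. It assumes the TEI request shape (a `query` string plus a `texts` list) and a response that is a JSON list of objects with `index` and `score` fields; the helper names are hypothetical, and no network call is made.

```python
import json


def build_tei_rerank_request(query, texts):
    """Build a TEI-style rerank request body (assumed shape: query + texts)."""
    if not texts:
        raise ValueError("texts must not be empty")
    return json.dumps({"query": query, "texts": list(texts)})


def rank_texts(texts, response_body):
    """Pair each text with its score, highest score first.

    Assumes the response is a JSON list of {"index": int, "score": float}.
    """
    results = json.loads(response_body)
    ordered = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(texts[r["index"]], r["score"]) for r in ordered]
```

The body produced by `build_tei_rerank_request` would be sent as the POST payload to the server's /rerank endpoint; `rank_texts` then maps the returned indices back to the original texts.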

b4735

17 Feb 13:49
73e2ed3
CUDA: use async data loading for FlashAttention (#11894)

* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa <[email protected]>

b4734

17 Feb 11:54
f7b1116
update release requirements (#11897)

b4733

17 Feb 10:53
c4d29ba
server : fix divide-by-zero in metrics reporting (#11915)
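
The release note does not show the fix itself. The general pattern for this class of bug is to guard the denominator before computing a rate, as in the sketch below. The function name and the choice to report 0.0 when no time has elapsed are assumptions for illustration, not the server's actual code.

```python
def tokens_per_second(n_tokens, elapsed_ms):
    """Return throughput in tokens per second (hypothetical helper).

    Reports 0.0 when no time has been recorded yet, instead of
    dividing by zero.
    """
    if elapsed_ms <= 0:
        return 0.0
    return n_tokens * 1000.0 / elapsed_ms
```

A metrics endpoint can be queried before any request has been processed, which is exactly when counters such as elapsed time are still zero.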

b4732

17 Feb 07:26
2eea03d
vulkan: implement several ops relevant for ggml_opt (#11769)

* vulkan: support memset_tensor

* vulkan: support GGML_OP_SUM

* vulkan: implement GGML_OP_ARGMAX

* vulkan: implement GGML_OP_SUB

* vulkan: implement GGML_OP_COUNT_EQUAL

* vulkan: implement GGML_OP_OPT_STEP_ADAMW

* vulkan: fix check_results RWKV_WKV6 crash and memory leaks

* vulkan: implement GGML_OP_REPEAT_BACK

* tests: remove invalid test-backend-ops REPEAT_BACK tests

* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
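
For readers unfamiliar with the ops named above, the following plain-Python sketch shows the reference semantics of three of them as generally understood: SUM reduces all elements to one value, ARGMAX returns one index per row, and COUNT_EQUAL counts positions where two inputs match. The function names are hypothetical and the code is unrelated to the Vulkan shaders themselves.

```python
def op_sum(rows):
    """Sum every element of a 2-D list into a single scalar."""
    return sum(sum(row) for row in rows)


def op_argmax(rows):
    """Return, for each row, the index of its largest element.

    On ties the first maximal index is returned.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def op_count_equal(a, b):
    """Count positions at which two equal-length sequences hold equal values."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y)
```

Combining `op_argmax` on model outputs with `op_count_equal` against labels gives the number of correct predictions, which is one reason such ops are useful for training and evaluation.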

b4731

16 Feb 17:39
0f2bbe6
server : bump httplib to 0.19.0 (#11908)