
Releases: ggerganov/llama.cpp

b4354

18 Dec 11:17
0e70ba6
server : add "tokens" output (#10853)

* server : add "tokens" output

ggml-ci

* server : update readme

ggml-ci

* server : return token ids only if requested

ggml-ci

* tests : improve "tokens" type check

Co-authored-by: Xuan Son Nguyen <[email protected]>

* server : remove "tokens" from the OAI endpoint

ggml-ci

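The b4354 release makes the server include token ids in `/completion` responses only when the client opts in. A minimal sketch of such a request payload, assuming the opt-in field is a boolean named `return_tokens` (check the server README for the authoritative parameter name):

```python
import json

# Sketch of a /completion request that asks the llama.cpp server to include
# token ids in its response. The "return_tokens" field name is an assumption
# based on the release notes; verify it against the server README.
payload = {
    "prompt": "Hello, world",
    "n_predict": 8,
    "return_tokens": True,  # opt in: token ids are only returned if requested
}

body = json.dumps(payload)
print(body)
```

With the flag omitted or set to `false`, the response should match the pre-b4354 shape, which keeps the change backward compatible. Note that b4354 also removes the `"tokens"` field from the OpenAI-compatible endpoint, so this opt-in applies to the native `/completion` endpoint only.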

b4353

18 Dec 11:17
4682887
server : (embeddings) using same format for "input" and "content" (#1…

b4351

18 Dec 01:15
4da69d1
Revert "llama : add Falcon3 support (#10864)" (#10876)

This reverts commit 382bc7f2e8ffd0b89f23e840d097e21f301197ba.

b4350

17 Dec 23:17
d62b532
Use model->gguf_kv for loading the template instead of using the C AP…

b4349

17 Dec 21:24
081b29b
tests: add tests for GGUF (#10830)

b4348

17 Dec 20:27
5437d4a
sync : ggml

b4343

17 Dec 20:19
0006f5a
ggml : update ggml_backend_cpu_device_supports_op (#10867)

* ggml : fix cpy op for IQ-quants to use reference impl

ggml-ci

* ggml : disable tests involving i-matrix quantization

* ggml : update ggml_backend_cpu_device_supports_op

ggml-ci

b4342

17 Dec 18:54
05c3a44
server : fill usage info in embeddings and rerank responses (#10852)

* server : fill usage info in embeddings response

* server : fill usage info in reranking response
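The b4342 change fills in token-usage accounting on embeddings and rerank responses. A sketch of the usage block a client might now expect on an embeddings response, assuming it mirrors the OpenAI-style `usage` object (field names are an assumption drawn from that convention, not from this changelog):

```python
import json

# Hypothetical shape of an embeddings response after b4342: the "usage"
# object carries token accounting alongside the embedding data. Field names
# follow the OpenAI embeddings API convention and are assumed, not confirmed
# by the release notes.
sample_response = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2]}],
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}

# A client can read token accounting directly from the response:
usage = sample_response["usage"]
print(json.dumps(usage))
```

Before this change, clients had no per-request token counts for embeddings or rerank calls; billing or quota logic had to estimate them separately.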

b4341

17 Dec 16:31
382bc7f
llama : add Falcon3 support (#10864)

b4338

17 Dec 06:30
7b1ec53
vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10…