Releases · ggerganov/llama.cpp
b4354
server : add "tokens" output (#10853) * server : add "tokens" output ggml-ci * server : update readme ggml-ci * server : return tokens ids only if requested ggml-ci * tests : improve "tokens" type check Co-authored-by: Xuan Son Nguyen <[email protected]> * server : remove "tokens" from the OAI endpoint ggml-ci --------- Co-authored-by: Xuan Son Nguyen <[email protected]>
b4353
server : (embeddings) using same format for "input" and "content" (#1…
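As I read this change, the native /embeddings endpoint's "content" field accepts the same shapes as the OAI-compatible "input" (e.g. a single string or a list of strings). A hedged sketch, assuming a local server on port 8080; URLs and payloads are illustrative:

```python
# Hedged sketch: after this change the native /embeddings endpoint's
# "content" field should accept the same shapes as the OAI-compatible
# "input" (a single string or a list of strings). Server URL, port, and
# payloads are assumptions for illustration.
import json
import urllib.request

def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Native endpoint: a batch of inputs passed via "content"
native = post_json("http://localhost:8080/embeddings",
                   {"content": ["first text", "second text"]})

# OAI-compatible endpoint: the same batch passed via "input"
oai = post_json("http://localhost:8080/v1/embeddings",
                {"input": ["first text", "second text"]})
```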
b4351
Revert "llama : add Falcon3 support (#10864)" (#10876) This reverts commit 382bc7f2e8ffd0b89f23e840d097e21f301197ba.
b4350
Use model->gguf_kv for loading the template instead of using the C API…
b4349
tests: add tests for GGUF (#10830)
b4348
sync : ggml
b4343
ggml : update ggml_backend_cpu_device_supports_op (#10867)
* ggml : fix cpy op for IQ-quants to use reference impl
* ggml : disable tests involving i-matrix quantization
* ggml : update ggml_backend_cpu_device_supports_op
b4342
server : fill usage info in embeddings and rerank responses (#10852)
* server : fill usage info in embeddings response
* server : fill usage info in reranking response
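Embedding (and rerank) responses now report token accounting. A minimal sketch, assuming the usage object follows the OAI convention of prompt_tokens/total_tokens and a local server on port 8080:

```python
# Minimal sketch: read the token accounting that embedding responses now
# carry. Assumptions: local server on port 8080, and a "usage" object with
# OAI-style "prompt_tokens"/"total_tokens" fields.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/embeddings",
    data=json.dumps({"input": "count my tokens"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

usage = body.get("usage", {})
print("prompt_tokens:", usage.get("prompt_tokens"))
print("total_tokens:", usage.get("total_tokens"))
```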
b4341
llama : add Falcon3 support (#10864)
b4338
vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10…