
Releases: ggerganov/llama.cpp

b4354

18 Dec 11:17
0e70ba6
server : add "tokens" output (#10853)

* server : add "tokens" output

ggml-ci

* server : update readme

ggml-ci

* server : return token ids only if requested

ggml-ci

* tests : improve "tokens" type check

Co-authored-by: Xuan Son Nguyen <[email protected]>

* server : remove "tokens" from the OAI endpoint

ggml-ci

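The b4354 release makes the server include token ids in `/completion` responses only when the client opts in. A minimal sketch of such a request payload, assuming the opt-in field is a boolean named `return_tokens` (check the server README for the authoritative parameter name):

```python
import json

# Sketch of a /completion request that asks the llama.cpp server to include
# token ids in its response. The "return_tokens" field name is an assumption
# based on the release notes; verify it against the server README.
payload = {
    "prompt": "Hello, world",
    "n_predict": 8,
    "return_tokens": True,  # opt in: token ids are only returned if requested
}

body = json.dumps(payload)
print(body)
```

With the flag omitted or set to `false`, the response should match the pre-b4354 shape, which keeps the change backward compatible. Note that b4354 also removes the `"tokens"` field from the OpenAI-compatible endpoint, so this opt-in applies to the native `/completion` endpoint only.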

b4353

18 Dec 11:17
4682887
server : (embeddings) using same format for "input" and "content" (#1…

b4351

18 Dec 01:15
4da69d1
Revert "llama : add Falcon3 support (#10864)" (#10876)

This reverts commit 382bc7f2e8ffd0b89f23e840d097e21f301197ba.

b4350

17 Dec 23:17
d62b532
Use model->gguf_kv for loading the template instead of using the C AP…

b4349

17 Dec 21:24
081b29b
tests: add tests for GGUF (#10830)

b4348

17 Dec 20:27
5437d4a
sync : ggml

b4343

17 Dec 20:19
0006f5a
ggml : update ggml_backend_cpu_device_supports_op (#10867)

* ggml : fix cpy op for IQ-quants to use reference impl

ggml-ci

* ggml : disable tests involving i-matrix quantization

* ggml : update ggml_backend_cpu_device_supports_op

ggml-ci

b4342

17 Dec 18:54
05c3a44
server : fill usage info in embeddings and rerank responses (#10852)

* server : fill usage info in embeddings response

* server : fill usage info in reranking response
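The b4342 change fills in token-usage accounting on embeddings and rerank responses. A sketch of the usage block a client might now expect on an embeddings response, assuming it mirrors the OpenAI-style `usage` object (field names are an assumption drawn from that convention, not from this changelog):

```python
import json

# Hypothetical shape of an embeddings response after b4342: the "usage"
# object carries token accounting alongside the embedding data. Field names
# follow the OpenAI embeddings API convention and are assumed, not confirmed
# by the release notes.
sample_response = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2]}],
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}

# A client can read token accounting directly from the response:
usage = sample_response["usage"]
print(json.dumps(usage))
```

Before this change, clients had no per-request token counts for embeddings or rerank calls; billing or quota logic had to estimate them separately.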

b4341

17 Dec 16:31
382bc7f
llama : add Falcon3 support (#10864)

b4338

17 Dec 06:30
7b1ec53
vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10…