Releases · fairydreaming/llama.cpp
b4747
ggml-cpu: Add CPU backend support for KleidiAI library (#11390)

* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environment variable GGML_KLEIDIAI_SME (see the sketch after this list)
* Add support for multithreaded LHS conversion
* Switch kernel selection order to dotprod and i8mm
* Address review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to the source file
* Update CMake for the SME build and add alignment for SME
* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
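The GGML_KLEIDIAI_SME variable named above gates the SME kernel path at run time. A minimal usage sketch, assuming placeholder binary and model names and assuming 0/1 disable/enable semantics (the variable name comes from this note; the semantics do not):

```sh
# Sketch: toggle the KleidiAI SME kernel path via the environment.
# GGML_KLEIDIAI_SME is from the release note; the 0/1 semantics,
# binary name, and model path are assumptions for illustration.
GGML_KLEIDIAI_SME=1 ./llama-cli -m model.gguf -p "Hello" -n 32
```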
b4655
server : (webui) migrate project to ReactJS with TypeScript (#11688)

* init version
* fix auto scroll
* bring back copy button
* bring back thought process
* add lint and format check on CI
* remove lang from html tag
* allow multiple generations at the same time
* combine lint and format
* fix unused var
* improve MarkdownDisplay
* fix more LaTeX
* fix code blocks not being selectable while generating
b4608
ci: simplify cmake build commands (#11548)
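For context, the simplified two-step out-of-source CMake flow used by llama.cpp looks like the following sketch (the exact CI commands touched by #11548 are in the linked PR, not reproduced here):

```sh
# Sketch: the standard out-of-source CMake build for llama.cpp.
# The CI change in #11548 simplifies its build commands along these lines.
cmake -B build
cmake --build build --config Release
```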
b4543
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
b4528
`minja`: sync at https://github.com/google/minja/commit/0f5f7f2b3770e…
b4524
Add Jinja template support (#11016)

* Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcbfbd4e71229736640322b31c7f9
* Add --jinja and --chat-template-file flags (usage sketch after this list)
* Add missing `<optional>` include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try to work around the MSVC++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use the minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept a minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for Windows tests
* Forward-declare minja::chat_template to avoid an eager json dependency
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Remove unused `<optional>` include
* Add missing `<optional>` include to server.cpp
* Disable Jinja test that has a cryptic Windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22)
* Apply suggestions from code review (co-authored by Xuan Son Nguyen and Georgi Gerganov)
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (a null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos/bos tokens when the Jinja template references them
* Rename to common_chat_template[s]
* Reinstate assert on chat_templates.template_default
* Update minja to https://github.com/google/minja/commit/b8437df626ac6cd0ce3b333b3c74ed1129c19f25
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* Remove unused `<optional>` header

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
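The two flags introduced here can be combined when launching the server. A minimal sketch, assuming placeholder paths for the model and the Jinja template file:

```sh
# Sketch: enable minja-based Jinja templating with the flags from #11016.
# model.gguf and my_template.jinja are placeholder paths.
./llama-server -m model.gguf --jinja --chat-template-file my_template.jinja
```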
b4410
common : disable KV cache shifting automatically for unsupported mode…
b4405
readme : add llama-swap to infrastructure section (#11032)

* list llama-swap under tools in README
* readme: add llama-swap to Infrastructure
b4323
fix: graceful shutdown for Docker images (#10815)
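Graceful shutdown here means the containerized server responds to SIGTERM, which `docker stop` sends before escalating to SIGKILL. A hedged sketch of the behavior (the image reference is a placeholder, not taken from this release note):

```sh
# Sketch: with SIGTERM handled, `docker stop` ends the server cleanly
# instead of waiting out the timeout and sending SIGKILL.
# The image name below is a placeholder.
docker run -d --name llama ghcr.io/ggml-org/llama.cpp:server -m /models/model.gguf
docker stop llama
```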
b4255
vulkan: optimize and reenable split_k (#10637)

Use vector loads when possible in mul_mat_split_k_reduce. Use split_k when there aren't enough workgroups to fill the shaders.