Releases · ggerganov/llama.cpp

17 Dec 16:31

382bc7f

b4341

llama : add Falcon3 support (#10864)

17 Dec 06:30

github-actions

b4338

7b1ec53

vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10…

16 Dec 21:41

github-actions

b4337

160bc03

rwkv6: add wkv6 support for Vulkan backend (#10829)

* rwkv_wkv6 vulkan shader

* RWKV_WKV6 Vulkan op tests passed

Signed-off-by: Molly Sophia

* Apply code format changes

Signed-off-by: Molly Sophia

* add [[unroll]] and remove unnecessary conditions

* add uma support

* fix errors in EditorConfig Checker

---------

Signed-off-by: Molly Sophia
Co-authored-by: Molly Sophia

15 Dec 17:41

github-actions

b4333

a097415

llama : add Deepseek MoE v1 & GigaChat models (#10827)

* Add deepseek v1 arch & gigachat template

* improve template code

* add readme

* delete comments

* remove comment

* fix format

* lint llama.cpp

* fix order of deepseek and deepseek2, move gigachat template to the end of func

* fix order of deepseek and deepseek2 in constants; mark shared experts as required by the deepseek arch

* remove comments

* move deepseek above deepseek2

* change placement of gigachat chat template

15 Dec 12:51

github-actions

b4331

5478bbc

server: (UI) add syntax highlighting and latex math rendering (#10808)

* add code highlighting and math formatting

* code cleanup

* build public/index.html

* rebuild public/index.html

* fixed coding style

* fixed coding style

* style fixes

* highlight: smaller bundle size, fix light & dark theme

* remove katex

* add bundle size check

* add more languages

* add php

* reuse some langs

* use gzip

* Revert "remove katex"

This reverts commit c0e5046accd10be3f83018cffdc29a652849fc61.

* use better maintained @vscode/markdown-it-katex

* fix non-deterministic gzip output

* ability to add a demo conversation for dev

* fix latex rendering

* add comment

* latex codeblock as code

---------

Co-authored-by: Xuan Son Nguyen
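
The bundle size check and the gzip determinism fix listed above can be illustrated with a minimal sketch. This is not the PR's build tooling: the function names and the size limit are hypothetical, and Python is used only for illustration. Pinning the timestamp stored in the gzip header is one common way to make compressed output reproducible.

```python
import gzip

# Hypothetical limit chosen for this sketch; not taken from the PR.
MAX_BUNDLE_BYTES = 1_000_000


def compress_deterministic(data: bytes) -> bytes:
    # gzip stores a modification time in its header; fixing it to 0
    # keeps the output byte-identical across repeated builds.
    return gzip.compress(data, compresslevel=9, mtime=0)


def check_bundle_size(compressed: bytes, limit: int = MAX_BUNDLE_BYTES) -> None:
    # Fail loudly when the compressed bundle grows past the limit.
    if len(compressed) > limit:
        raise ValueError(f"bundle is {len(compressed)} bytes, limit is {limit}")


# Stand-in for a built index.html (assumed content).
html = b"<html><body>demo</body></html>" * 100
bundle = compress_deterministic(html)
check_bundle_size(bundle)
```

Compressing the same input twice with a fixed `mtime` gives identical bytes, so a committed bundle changes only when its source changes.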

14 Dec 23:11

github-actions

b4329

89d604f

server: Fix `has_new_line` in JSON response (#10818)

* Update server JSON response.

* Add unit test to check `has_new_line` JSON response

* Remove `has_new_line` unit test changes.

* Address code review comment: type check for `has_new_line` in unit test
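
A type check of the kind mentioned in the last item might look like the sketch below. Only the field name `has_new_line` comes from these notes; the helper name, the sample payload, and its `content` field are assumptions made for illustration and are not the project's actual unit test.

```python
import json


def check_has_new_line(body: str) -> bool:
    # Parse the response body and require a real JSON boolean,
    # rejecting a missing field, numbers, and strings.
    data = json.loads(body)
    value = data.get("has_new_line")
    if not isinstance(value, bool):
        raise TypeError(f"has_new_line must be a bool, got {type(value).__name__}")
    return value


# Hypothetical response payload for this sketch.
sample = '{"content": "first line\\nsecond line", "has_new_line": true}'
flag = check_has_new_line(sample)
```

Checking the type as well as the value catches a response that serializes the flag as `1` or `"true"` instead of a boolean.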

14 Dec 13:32

github-actions

b4327

ba1cb19

llama : add Qwen2VL support + multimodal RoPE (#10361)

* Bare-bones Qwen2VL LLM converter

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow, gguf convert script bug fix

* correcting vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`

* fix rope op mode switching, outdated func args

* update `llama_hparams`

* update to keep up with upstream changes

* resolve linter, test errors

* add makefile entry, update special image padding token

* add mrope unit test, fix few compiler warnings

* rename `mrope` related function, params

* minor updates on debug util, bug fixes

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov

* fix trailing whitespace

* store `llama_hparams.rope_sections` with fixed size array

* update position id tensor size check in GGML_OP_ROPE

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends

* remove old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov
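
As a rough illustration of the idea behind multimodal RoPE and `rope_sections`: the rotary pairs of an attention head are divided into sections, and each section takes its position from a different component of a multi-part position id. The sketch below assumes contiguous sections and the usual RoPE frequency schedule. The function name, the example section sizes, and the position layout are assumptions for illustration; they are not taken from the PR or from the ggml kernels.

```python
import math
from typing import List, Sequence


def mrope_angles(position: Sequence[int], sections: Sequence[int],
                 base: float = 10000.0) -> List[float]:
    # position: one position value per section (for example temporal,
    #           height, width); sections: rotary pairs in each section.
    if len(position) != len(sections):
        raise ValueError("need exactly one position component per section")
    n_pairs = sum(sections)
    angles = []
    pair = 0
    for axis, count in enumerate(sections):
        for _ in range(count):
            # Same frequency schedule as ordinary RoPE; only the position
            # component that multiplies it depends on the section.
            inv_freq = base ** (-pair / n_pairs)
            angles.append(position[axis] * inv_freq)
            pair += 1
    return angles


# A text token repeats one position in every component, which reduces
# to ordinary RoPE angles.
text_angles = mrope_angles((5, 5, 5), [2, 3, 3])
# An image patch can carry distinct components (assumed layout).
patch_angles = mrope_angles((0, 2, 3), [2, 3, 3])
```

When all position components are equal the result matches standard one-dimensional RoPE, so under this scheme text-only input needs no special handling.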

13 Dec 23:11

github-actions

b4326

56eea07

Removes spurious \r in output that causes logging in journalctl to tr…

13 Dec 21:03

github-actions

b4325

a76c56f

Introducing experimental OpenCL backend with support for Qualcomm Adr…

13 Dec 19:19

github-actions

b4324

c27ac67

Opt class for positional argument handling (#10508)

Added support for the positional arguments `model` and `prompt`. Added
functionality to download models via strings like:

  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://granite-code:8b
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf

Signed-off-by: Eric Curtin
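
The model strings listed above differ mainly in their prefix. A minimal sketch of splitting such a string into a source and a remainder follows. It is not `llama-run`'s implementation: the function name and the source labels are hypothetical, and the sketch deliberately does not decide how an unprefixed string such as `llama3` or `some-file2.gguf` is resolved, because the notes do not say.

```python
from typing import Tuple

# Prefixes taken from the examples above; the labels are this sketch's own.
SCHEMES = (
    ("huggingface://", "huggingface"),
    ("hf://", "huggingface"),
    ("ollama://", "ollama"),
    ("https://", "https"),
    ("file://", "file"),
)


def classify_model_spec(spec: str) -> Tuple[str, str]:
    # Return (source, remainder) for the first matching prefix.
    for prefix, source in SCHEMES:
        if spec.startswith(prefix):
            return source, spec[len(prefix):]
    # No recognized prefix: leave the resolution to the caller.
    return "unprefixed", spec


source, rest = classify_model_spec("ollama://granite-code:8b")
```

Keeping the prefixes in one table makes it easy to see that `hf://` and `huggingface://` are treated as aliases in this sketch.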
