Releases · ggerganov/llama.cpp
b4288
llama : use cmake for swift build (#10525)

* llama : use cmake for swift build
* swift : <> -> ""
* ci : remove make
* ci : disable ios build
* Revert "swift : <> -> """

  This reverts commit d39ffd9556482b77d4ea5b118b453fc1c097a31d.

* ci : try fix ios build
* ci : cont
* ci : cont

Co-authored-by: Georgi Gerganov <[email protected]>
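The `<>` -> `""` change (later reverted) concerns C/C++ include lookup: quoted includes search the including file's own directory before the compiler's search paths, while angle brackets consult only the search paths. A minimal illustration:

```cpp
// Angle brackets: resolved only against the compiler's include paths
// (-I flags and system directories).
#include <llama.h>

// Quotes: the directory of the including file is searched first, then
// the same paths as <>. Handy when headers ship next to the sources.
#include "llama.h"
```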
b4287
vulkan: compile a test shader in cmake to check for coopmat2 support …
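The check compiles a small test shader at configure time. For intuition, the same capability can be probed at run time from the device's extension list; a sketch, assuming the standard Vulkan extension names (VK_KHR_cooperative_matrix for the base feature, VK_NV_cooperative_matrix2 for coopmat2) — this is not llama.cpp's actual detection code:

```cpp
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

// Probe a physical device for a named extension.
static bool has_extension(VkPhysicalDevice dev, const char * name) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, exts.data());
    for (const auto & e : exts) {
        if (strcmp(e.extensionName, name) == 0) return true;
    }
    return false;
}

// usage:
// bool coopmat  = has_extension(dev, "VK_KHR_cooperative_matrix");
// bool coopmat2 = has_extension(dev, "VK_NV_cooperative_matrix2");
```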
b4285
server : (refactor) no more json in server_task input (#10691)

* server : (refactor) no more json in server_task input
* add test for slots endpoint
* add tests for /props and /slots
* remove task inf_type
* fix CI by adding safe_json_to_str
* add "model_path" to /props
* update readme
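A usage sketch for one of the touched endpoints, written with cpp-httplib (the HTTP library the server example builds on); it assumes a server instance listening on localhost:8080:

```cpp
#include "httplib.h"   // cpp-httplib, vendored by the server example
#include <cstdio>

int main() {
    httplib::Client cli("localhost", 8080);

    // GET /props returns server metadata; per this release it also
    // includes "model_path".
    if (auto res = cli.Get("/props")) {
        printf("status: %d\nbody: %s\n", res->status, res->body.c_str());
    }
    return 0;
}
```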
b4284
ggml : disable iq4_nl interleave size 8 (#10709) ggml-ci
b4283
server : various fixes (#10704)

* server : various fixes ggml-ci
* server : show current seed in slot_params ggml-ci
* fix /slots endpoint
* Update examples/server/server.cpp
* server : reflect endpoint response changes in the readme ggml-ci

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
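A sketch of reading the now-exposed seed back from /slots, using cpp-httplib and nlohmann::json (both used by the server example); the exact response field names are assumptions based on the commit message:

```cpp
#include "httplib.h"
#include <nlohmann/json.hpp>
#include <cstdio>

using json = nlohmann::json;

int main() {
    httplib::Client cli("localhost", 8080);

    // /slots reports one object per slot; per this release the current
    // seed is reflected in each slot's parameters. The "params"/"seed"
    // paths below are assumed, not verified against the readme.
    if (auto res = cli.Get("/slots"); res && res->status == 200) {
        json slots = json::parse(res->body);
        for (const auto & slot : slots) {
            if (slot.contains("params") && slot["params"].contains("seed")) {
                printf("slot seed: %lld\n",
                       slot["params"]["seed"].get<long long>());
            }
        }
    }
    return 0;
}
```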
b4282
ggml : refactor online repacking (#10446)

* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend:
  - clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack
  - extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"
* more generic use of extra buffer:
  - generalise extra_supports_op
  - new API for "cpu-accel": amx, aarch64
* clang-format
* clean Q4_0_N_M ref; enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
* add debug logs on repacks

Co-authored-by: Georgi Gerganov <[email protected]>
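For intuition: online repacking rearranges weight data at load time into a layout the SIMD kernels can stream, instead of baking pre-repacked tensor types (the Q4_0_N_M variants) into the model file. A toy sketch of the idea on plain floats, interleaving groups of 4 rows; ggml's real repack operates on quantized blocks and uses its own layouts and API:

```cpp
#include <cstddef>
#include <vector>

// Toy sketch of runtime repacking: interleave groups of 4 rows so that
// a SIMD kernel can load one element from each of 4 rows contiguously.
// Illustrative only; not ggml's actual layout.
std::vector<float> repack_4rows(const std::vector<float> & src,
                                size_t rows, size_t cols) {
    std::vector<float> dst(rows * cols);
    for (size_t r = 0; r < rows; r += 4) {
        for (size_t c = 0; c < cols; ++c) {
            for (size_t i = 0; i < 4 && r + i < rows; ++i) {
                // within each 4-row group, column c contributes 4
                // consecutive values, one from each row r..r+3
                dst[r * cols + c * 4 + i] = src[(r + i) * cols + c];
            }
        }
    }
    return dst;
}
```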
b4281
server : fix free of spec context and batch (#10651) ggml-ci
b4280
Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing …
b4279
metal : Extend how Llama.cpp locates metal resources (#10676)

* metal : Extend how Llama.cpp locates metal resources (#10675)
  - It also searches for the resource file in the directory where the current binary is located.
  - It resolves symbolic links.

  Rationale: when this dependency is plugged into a Bazel build and run in the context of Bazel (e.g. testing):
  - the execution directory is often very different from where the files are located, and there is no direct control over this (Bazel sandboxing);
  - the Bazel sandbox often uses symbolic links to make files available.

  With this patch, the resource file can be added to the target, and tests can be built and run in the context of Bazel.
* Update ggml/src/ggml-metal/ggml-metal.m

Co-authored-by: Georgi Gerganov <[email protected]>
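A sketch of the lookup strategy the patch describes, macOS-specific via _NSGetExecutablePath; the resource file name is an assumption (ggml ships its Metal shaders as a .metal source or a compiled .metallib):

```cpp
#include <mach-o/dyld.h>  // _NSGetExecutablePath (macOS)
#include <climits>
#include <cstdlib>
#include <string>

// Find the directory of the running binary, resolving symlinks (Bazel
// sandboxes rely on them), and point at a resource file next to it.
static std::string resource_path_next_to_binary(const char * filename) {
    char buf[PATH_MAX];
    uint32_t size = sizeof(buf);
    if (_NSGetExecutablePath(buf, &size) != 0) {
        return {};  // buffer too small
    }
    char resolved[PATH_MAX];
    if (realpath(buf, resolved) == nullptr) {  // resolve symlinks
        return {};
    }
    std::string dir(resolved);
    dir.erase(dir.find_last_of('/'));          // strip the binary name
    return dir + "/" + filename;
}

// usage (file name assumed): resource_path_next_to_binary("default.metallib")
```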
b4276
common : bring back --no-warmup to server (#10686)