Releases · agray3/llama.cpp

06 Mar 07:27

57b6abf

b4836 Latest

android : fix KV cache log message condition (#12212)

Assets 26

cudart-llama-bin-win-cu11.7-x64.zip      303 MB    2025-03-06T07:27:46Z
cudart-llama-bin-win-cu12.4-x64.zip      373 MB    2025-03-06T07:27:55Z
llama-b4836-bin-macos-arm64.zip          23.8 MB   2025-03-06T07:28:05Z
llama-b4836-bin-macos-x64.zip            25.4 MB   2025-03-06T07:28:06Z
llama-b4836-bin-ubuntu-arm64.zip         26 MB     2025-03-06T07:28:07Z
llama-b4836-bin-ubuntu-vulkan-x64.zip    32 MB     2025-03-06T07:28:09Z
llama-b4836-bin-ubuntu-x64.zip           27.6 MB   2025-03-06T07:28:10Z
llama-b4836-bin-win-avx-x64.zip          16.6 MB   2025-03-06T07:28:11Z
llama-b4836-bin-win-avx2-x64.zip         16.6 MB   2025-03-06T07:28:12Z
llama-b4836-bin-win-avx512-x64.zip       16.6 MB   2025-03-06T07:28:14Z
Source code (zip)                                  2025-03-06T06:22:49Z
Source code (tar.gz)                               2025-03-06T06:22:49Z

05 Mar 14:13

github-actions

b4829

074c4fd

ci : add fetch-depth to xcframework upload (#12195)

This commit adds the fetch-depth: 0 option to the checkout action in the
build.yml workflow file (0 means that the complete history is fetched).
When the option is not specified, the default value is 1, which fetches
only the latest commit.

This is necessary to ensure that `git rev-list --count HEAD` counts the
total number of commits in the history. Currently, because the default is
being used, the name of the xcframework artifact is always
llama-b1-xcframework.
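
The effect described in this commit message can be sketched in a few lines. This is a minimal illustration, not the workflow's actual code: it assumes the build number is taken from `git rev-list --count HEAD` and that the artifact name follows the `llama-b<N>-xcframework` pattern quoted above. The helper names `commit_count` and `artifact_name` are hypothetical.

```python
import subprocess


def commit_count(repo_dir: str = ".") -> int:
    """Return the number of commits reachable from HEAD in repo_dir."""
    out = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(out.stdout.strip())


def artifact_name(count: int) -> str:
    """Build the artifact name from a commit count (assumed naming pattern)."""
    return f"llama-b{count}-xcframework"


# A checkout with fetch-depth 1 contains exactly one commit, so the count is 1.
shallow_name = artifact_name(1)

# With the full history fetched, the count is the real number of commits
# (4829 is used here only as an example value).
full_name = artifact_name(4829)
```

In a full clone, `artifact_name(commit_count())` would give a name that changes with every new commit, which is presumably what the artifact naming relies on.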

Assets 26

14 Nov 11:33

github-actions

b4078

2a82891

speculative : fix out-of-bounds access (#10289)

Assets 22

07 Nov 12:59

github-actions

b4041

2319126

fix q4_0_8_8 format for corrupted tokens issue (#10198)

Co-authored-by: EC2 Default User <[email protected]>

Assets 22

23 Oct 07:18

github-actions

b3963

873279b

flake.lock: Update

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/5633bcff0c6162b9e4b5f1264264611e950c8ec7?narHash=sha256-9UTxR8eukdg%2BXZeHgxW5hQA9fIKHsKCdOIUycTryeVw%3D' (2024-10-09)
  → 'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c%2BcHUJwA%3D' (2024-10-18)

Assets 22

11 Oct 09:24

github-actions

b3906

7eee341

common : use common_ prefix for common library functions (#9805)

* common : use common_ prefix for common library functions

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Assets 22

09 Oct 16:45

github-actions

b3901

e702206

perplexity : fix integer overflow (#9783)

* perplexity : fix integer overflow

ggml-ci

* perplexity : keep n_vocab as int and make appropriate casts

ggml-ci
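
As a rough illustration of the class of bug named in this release (not the actual patch, which these notes do not show): when an offset such as a position in a flat buffer is computed as the product of two 32-bit integers, the product can exceed the 32-bit range even though each factor fits. The sketch below emulates fixed-width arithmetic with `ctypes`; the vocabulary size, the token index, and the helper names are hypothetical values chosen for illustration.

```python
import ctypes

INT32_MAX = 2**31 - 1


def offset_int32(token_index: int, n_vocab: int) -> int:
    """Emulate token_index * n_vocab stored in a 32-bit signed integer."""
    return ctypes.c_int32(token_index * n_vocab).value


def offset_wide(token_index: int, n_vocab: int) -> int:
    """Emulate widening to 64 bits before multiplying, so nothing wraps."""
    return ctypes.c_int64(token_index * n_vocab).value


# Hypothetical values: each fits comfortably in 32 bits on its own.
n_vocab = 128_256
token_index = 20_000

wrapped = offset_int32(token_index, n_vocab)  # wraps around to a negative value
correct = offset_wide(token_index, n_vocab)   # the intended offset
```

Keeping the count itself as a 32-bit `int` and casting at the point of multiplication, as the second commit title suggests, avoids the wraparound without changing the variable's type elsewhere.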

Assets 22

17 Sep 07:57

github-actions

b3774

0d2ec43

llama : support IBM Granite architecture (#9412)

* feat(gguf-py): Add Granite model and params to gguf-py

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* feat(convert_hf_to_gguf): Add registration and param setup for Granite

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* feat(llama.cpp): Add config parsing for Granite multiplier params

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* feat(llama.cpp): First pass at full port of granite deviations from llama

Something is still not working right since the results are mostly terrible,
but on occasion it's producing relevant results at this point, so
_something_ is working.

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* fix(llama.cpp): Determine granite language 3b instruct by vocab size

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* fix(convert_hf_to_gguf): Use LlamaModel as base for GraniteModel

The defaults in LlamaModel are needed for Granite as well

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* fix(llama.cpp): Switch Granite param names to use _scale for consistency

Other scalar multipliers are called *_scale, so this provides a more
consistent naming convention.

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* fix(convert_hf_to_gguf/gguf-py): _multiplier -> _scale

The transformers names with _multiplier will now be converted to the _scale
equivalent during conversion.

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

* fix(llama.cpp): Use separate switch clause for granite in llm_load_hparams

Branch: GraniteLM

Signed-off-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: Gabe Goodhart <[email protected]>

Assets 19

12 Aug 17:02

github-actions

b3577

0fd93cd

llama : model-based max number of graph nodes calculation (#8970)

* llama : model-based max number of graph nodes calculation

* Update src/llama.cpp

---------

Co-authored-by: slaren <[email protected]>

Assets 20

08 Aug 14:38

github-actions

b3549

afd27f0

scripts : sync cann files (#0)

Assets 20
