
master b1794 #70

Merged: 64 commits into Nexesenex:master on Jan 8, 2024

Conversation

Nexesenex (Owner)

No description provided.

ggerganov and others added 30 commits December 30, 2023 23:24
* clip : refactor + bug fixes

ggml-ci

* server : add log message
* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (#4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

ggml-ci
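
A hedged aside on the `metal : use uint64_t for strides` item above: a standalone C++ sketch (not the actual ggml Metal code) showing why 32-bit strides stop being enough once tensors grow large, since total byte offsets can exceed `UINT32_MAX`.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical kernel-argument struct: element counts and byte strides per dimension.
struct kernel_args {
    uint64_t ne0, ne1;   // elements along dim 0 and dim 1
    uint64_t nb0, nb1;   // byte strides along dim 0 and dim 1
};

int main() {
    kernel_args a{70000, 70000, 2, 0};     // 2-byte (fp16) elements
    a.nb1 = a.ne0 * a.nb0;                 // bytes per row
    const uint64_t total = a.nb1 * a.ne1;  // total bytes addressed by the kernel
    printf("row stride = %llu bytes, total = %llu bytes, exceeds 32-bit range: %s\n",
           (unsigned long long) a.nb1, (unsigned long long) total,
           total > UINT32_MAX ? "yes" : "no");
    return 0;
}
```
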
* update: awq support llama-7b model

* update: change order

* update: benchmark results for llama2-7b

* update: mistral 7b v1 benchmark

* update: support 4 models

* fix: Readme

* update: ready for PR

* update: readme

* fix: readme

* update: change order import

* black

* format code

* update: work for both mpt and awqmpt

* update: readme

* Rename to llm_build_ffn_mpt_awq

* Formatted other files

* Fixed params count

* fix: remove code

* update: more detail for mpt

* fix: readme

* fix: readme

* update: change folder architecture

* fix: common.cpp

* fix: readme

* fix: remove ggml_repeat

* update: cicd

* update: cicd

* update: remove use_awq arg

* update: readme

* llama : adapt plamo to new ffn

ggml-ci

* fix: update torch version

---------

Co-authored-by: Trần Đức Nam <[email protected]>
Co-authored-by: Le Hoang Anh <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
* Changes to server to allow metadata override

* documentation

* flake.nix: expose full scope in legacyPackages

* flake.nix: rocm not yet supported on aarch64, so hide the output

* flake.nix: expose checks

* workflows: nix-ci: init; build flake outputs

* workflows: nix-ci: add a job for eval

* workflows: weekly `nix flake update`

* workflows: nix-flakestry: drop tag filters

...and add a job for flakehub.com

* workflows: nix-ci: add a qemu job for jetsons

* flake.nix: suggest the binary caches

* flake.lock: update

to a commit recently cached by nixpkgs-cuda-ci

---------

Co-authored-by: John <[email protected]>
Co-authored-by: Someone Serge <[email protected]>
* Add n_key_dim and n_value_dim

Some models use values that are not derived from `n_embd`.
Also remove `n_embd_head` and `n_embd_gqa` because it is not clear
which "head" is referred to (key or value).

Fix issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`

* Rebase

* Rename variables

* Fix llm_build_kqv to be more generic wrt n_embd_head_k

* Update default values for n_embd_head_k and n_embd_head_v

Co-authored-by: Georgi Gerganov <[email protected]>

* Fix llm_load_tensors: the asserts were not backcompat

---------

Co-authored-by: Georgi Gerganov <[email protected]>
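
As a hedged illustration of the reasoning in the `Add n_key_dim and n_value_dim` commit above, here is a minimal standalone sketch (field names follow the commit's `n_embd_head_k` / `n_embd_head_v` naming, but this is not the llama.cpp code): when the per-head key and value sizes differ, the K and V caches have to be sized from separate dimensions instead of a single `n_embd_head`.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical hyperparameters for a model whose K and V head sizes differ.
struct hparams {
    uint32_t n_head_kv;      // number of key/value heads (GQA)
    uint32_t n_embd_head_k;  // per-head key dimension
    uint32_t n_embd_head_v;  // per-head value dimension
};

int main() {
    hparams hp{8, 128, 96};  // illustrative values only
    const uint32_t n_embd_k_gqa = hp.n_embd_head_k * hp.n_head_kv; // K entries per token
    const uint32_t n_embd_v_gqa = hp.n_embd_head_v * hp.n_head_kv; // V entries per token
    printf("K cache per token: %u, V cache per token: %u\n",
           (unsigned) n_embd_k_gqa, (unsigned) n_embd_v_gqa);
    return 0;
}
```
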
* replaced all API-facing `int`s with `int32_t`

* formatting and missed `int` in `llama_token_to_piece`
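
A small hedged sketch of the `int32_t` point above (hypothetical function, not the real llama.h API): fixed-width types make the public interface independent of the platform's `int` width.

```cpp
#include <climits>
#include <cstdint>
#include <cstdio>

// Hypothetical API function: returns int32_t instead of plain int, so the
// declared width is identical on every platform.
int32_t my_token_count() { return 32000; }

int main() {
    printf("token count: %d (plain int is %zu bits on this platform)\n",
           my_token_count(), sizeof(int) * CHAR_BIT);
    return 0;
}
```
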
* server: add token counts to stats

* server: generate hpp

---------

Co-authored-by: phiharri <[email protected]>
* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (#4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

* metal : optimizing ggml_mul_mat_id (wip)

* metal : minor fix

* metal : opt mul_mm_id
* add more int ops

* ggml_compute_forward_dup_bytes

* add tests

* PR comments

* tests : minor indentations

---------

Co-authored-by: Georgi Gerganov <[email protected]>
ggml-ci
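
A hedged sketch of the idea behind the `ggml_compute_forward_dup_bytes` item above (standalone code, not ggml's implementation): when source and destination use the same element size and rows are contiguous, duplication can be a per-row `memcpy` instead of an element-by-element conversion loop.

```cpp
#include <cstdio>
#include <cstring>

// Copy nrows rows of row_bytes each, honoring the byte strides of src and dst.
void dup_rows_bytes(const char * src, char * dst, int nrows, size_t row_bytes,
                    size_t src_stride, size_t dst_stride) {
    for (int r = 0; r < nrows; ++r) {
        std::memcpy(dst + r * dst_stride, src + r * src_stride, row_bytes);
    }
}

int main() {
    float src[2][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}};
    float dst[2][4] = {};
    dup_rows_bytes((const char *) src, (char *) dst, 2, sizeof(src[0]),
                   sizeof(src[0]), sizeof(dst[0]));
    printf("%g %g\n", dst[0][0], dst[1][3]); // expect 1 and 8
    return 0;
}
```
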
* updates the package.swift to use ggml as dependency

* changes the ggml package url src to ggerganov
azarovalex and others added 13 commits January 7, 2024 10:20
* examples : add passkey test

* passkey : better prints

* passkey : select pass key pos from CLI

* passkey : simplify n_past logic

* make : add passkey target

* passkey : add "self-extend"-like context extension (#4810)

* llama : "self-extend"-like context extension

* passkey : add comment

* passkey : add readme
* examples : add passkey test

* passkey : better prints

* passkey : select pass key pos from CLI

* passkey : simplify n_past logic

* llama : "self-extend"-like context extension

* passkey : add comment

* main : add Self-Extend support

* llama : add comment about llama_kv_cache_seq_div
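
A much-simplified, hedged sketch of the position-grouping idea behind the "self-extend"-like extension above (the helper below is hypothetical and standalone; per the commits, llama.cpp implements this by operating on KV-cache positions, e.g. via `llama_kv_cache_seq_div`): the first positions keep their exact values, while positions beyond that are divided by a group factor so a long sequence maps into a shorter effective position range that stays within the trained context.

```cpp
#include <cstdio>

// The first n_keep positions stay exact; later positions are compressed by
// group_size, shrinking the effective position range.
int self_extend_pos(int pos, int n_keep, int group_size) {
    if (pos < n_keep) return pos;
    return n_keep + (pos - n_keep) / group_size;
}

int main() {
    const int n_keep = 512, group_size = 4;
    const int test_pos[] = {100, 600, 4096, 16384};
    for (int pos : test_pos) {
        printf("pos %5d -> %5d\n", pos, self_extend_pos(pos, n_keep, group_size));
    }
    return 0;
}
```
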
* iq2_xxs: basics

* iq2_xxs: scalar and AVX2 dot products

Needed to change Q8_K to have quants in the -127...127 range,
else the IQ2_XXS AVX implementation becomes very awkward.
The alternative would have been to use Q8_0 instead. Perhaps
I'll change it later; for now this is what we have.

* iq2_xxs: ARM_NEON dot product

Somehow strangely slow (112 ms/token).

* iq2_xxs: WIP Metal

Dequantization works; something is still wrong with the
dot product.

* iq2_xxs: Metal dot product now works

We have
PP-512 = 475 t/s
TG-128 = 47.3 t/s

Not the greatest performance, but not complete garbage either.

* iq2_xxs: slightly faster dot product

TG-128 is now 48.4 t/s

* iq2_xxs: slightly faster dot product

TG-128 is now 50.9 t/s

* iq2_xxs: even faster Metal dot product

TG-128 is now 54.1 t/s.

Strangely enough, putting the signs lookup table
into shared memory has a bigger impact than the
grid values being in shared memory.

* iq2_xxs: dequantize CUDA kernel - fix conflict with master

* iq2_xxs: quantized CUDA dot product (MMVQ)

We get TG-128 = 153.1 t/s

* iq2_xxs: slightly faster CUDA dot product

TG-128 is now at 155.1 t/s.

* iq2_xxs: add to llama ftype enum

* iq2_xxs: fix MoE on Metal

* Fix missing MMQ ops when on hipBLAS

I had put the ggml_supports_mmq call at the wrong place.

* Fix bug in qequantize_row_iq2_xxs

The 0.25f factor was missing.
Great detective work by @ggerganov!

* Fixing tests

* PR suggestion

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
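
A hedged illustration of the Q8_K range note in the commit above (a standalone symmetric 8-bit quantizer, not ggml's block-wise Q8_K): clamping quants to [-127, 127] keeps the representation symmetric, which the commit says the IQ2_XXS AVX2 dot product relies on.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Quantize to int8 with a single scale, keeping values in the symmetric
// range [-127, 127] (the asymmetric -128 case is never produced).
void quantize_q8_symmetric(const std::vector<float> & x, std::vector<int8_t> & q, float & scale) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    scale = amax > 0.0f ? amax / 127.0f : 1.0f;
    q.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        q[i] = (int8_t) std::clamp((int) std::lround(x[i] / scale), -127, 127);
    }
}

int main() {
    std::vector<float> x = {0.8f, -1.3f, 0.05f, 2.0f};
    std::vector<int8_t> q;
    float d = 0.0f;
    quantize_q8_symmetric(x, q, d);
    for (size_t i = 0; i < q.size(); ++i) printf("%d ", q[i]);
    printf("(scale %.4f)\n", d);
    return 0;
}
```
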
Nexesenex merged commit e4705be into Nexesenex:master on Jan 8, 2024
4 checks passed
Nexesenex added a commit that referenced this pull request Dec 15, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 15, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 15, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 15, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 15, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 15, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 15, 2024
This reverts commit 2637d2deebed514b45f39df95c88cd9b8f783324.
Nexesenex added a commit that referenced this pull request Dec 19, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 20, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 21, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 21, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex pushed a commit that referenced this pull request Dec 22, 2024
* Adding fused y*unary(x) op

* Fused y*unary(x) op: CUDA

* Fused y*unary(x) op: dedicated CPU implementation for silu and gelu

* Fused y*unary(x) op: Metal

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
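
A hedged sketch of the fused `y*unary(x)` op from the commit above, for the SiLU case (standalone code, not the actual CPU/CUDA/Metal kernels): fusing the activation with the element-wise multiply avoids materializing the intermediate `unary(x)` tensor.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Compute out[i] = y[i] * silu(x[i]) in a single pass over the data.
void fused_silu_mul(const std::vector<float> & x, const std::vector<float> & y,
                    std::vector<float> & out) {
    out.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        const float s = x[i] / (1.0f + std::exp(-x[i])); // silu(x) = x * sigmoid(x)
        out[i] = y[i] * s;                               // fused multiply
    }
}

int main() {
    std::vector<float> x = {1.0f, -2.0f, 0.5f}, y = {0.5f, 1.0f, 2.0f}, out;
    fused_silu_mul(x, y, out);
    for (float v : out) printf("%.4f ", v);
    printf("\n");
    return 0;
}
```
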
Nexesenex added a commit that referenced this pull request Dec 23, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 23, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 24, 2024
Credit : Iwan Kawrakow @ikawrakow
Nexesenex added a commit that referenced this pull request Dec 24, 2024
Credit : Iwan Kawrakow @ikawrakow