forked from LostRuins/koboldcpp
Cuda iq opt 3 #196

Merged: Nexesenex merged 31 commits into Nexesenex:MMVQ_refactot from JohannesGaessler:cuda-iq-opt-3 on Jun 30, 2024.
Conversation
Nexesenex (Owner) commented on Jun 30, 2024:

- I have read the contributing guidelines
- Self-reported review complexity:
  - Low
  - Medium
  - High
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci
* move public backend headers to the public include directory (ggerganov#8122)
  * move public backend headers to the public include directory
  * nix test
  * spm : fix metal header
  Co-authored-by: Georgi Gerganov <[email protected]>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]

Co-authored-by: slaren <[email protected]>
* clip : suppress unused variable warnings
  This commit suppresses unused variable warnings for the variable `e` in the catch blocks. The motivation for this change is to suppress the warnings that are generated on Windows when using the MSVC compiler. The warnings are not displayed when using GCC, because GCC marks all catch parameters as used.
  Signed-off-by: Daniel Bevenius <[email protected]>
* squash! clip : suppress unused variable warnings
  Remove `e` (`/*e*/`) instead of using GGML_UNUSED.
  Signed-off-by: Daniel Bevenius <[email protected]>
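The technique is simply to leave the catch parameter unnamed; a minimal sketch (the `parse_int_or_default` helper is made up for illustration and is not the actual clip.cpp code):

```cpp
#include <exception>
#include <string>

// Hypothetical helper, used only to illustrate the unnamed-catch-parameter pattern.
static int parse_int_or_default(const std::string & s, int fallback) {
    try {
        return std::stoi(s);
    } catch (const std::exception & /*e*/) {
        // The parameter name is commented out, so MSVC no longer emits an
        // unused-variable warning and no GGML_UNUSED(e) macro call is needed.
        return fallback;
    }
}

int main() {
    return parse_int_or_default("not-a-number", 0);
}
```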
…nov#8145)
- The path seems to be wrong for the common.h header file in llama-android.cpp. Fixing the path so the Android build doesn't fail with the error "There is no file common/common.h".
* account for space prefix character
* use find instead
Co-authored-by: kustaaya <[email protected]>
* Add Qwen2MoE 57B-A14B
* Add Qwen2MoE 57B-A14B
* Delete examples/llama.android/llama/CMakeLists.txt (ggerganov#8145 (comment))
  This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead.
* Update CMakeLists.txt
  Pick local llama.cpp files instead of fetching content from git.
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow
* refactored `llama_control_vector_load_one()`
* allow multiple directions for the same layer in the same file
* llama_control_vector_load_one() and llama_control_vector_load() now break on error
* removed unnecessary ggml_free() call
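The break-on-error idea above follows a common cleanup pattern; here is a minimal, generic sketch (the `Entry` struct, the text file format, and the `load_all` helper are invented for illustration and are not the actual control-vector loading code):

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-ins for this sketch only.
struct Entry { int layer; float direction; };

static bool load_all(const char * path, std::vector<Entry> & out) {
    std::FILE * f = std::fopen(path, "r");
    if (!f) {
        return false;
    }
    bool ok = true;
    Entry e;
    while (std::fscanf(f, "%d %f", &e.layer, &e.direction) == 2) {
        out.push_back(e);  // the result is allowed to grow; multiple directions per layer are kept
    }
    if (!std::feof(f)) {
        ok = false;        // malformed input: stop reading instead of pushing past the error
    }
    std::fclose(f);        // single cleanup point on both success and failure paths, so nothing leaks
    return ok;
}

int main() {
    std::vector<Entry> entries;
    const bool ok = load_all("directions.txt", entries);  // hypothetical input file
    std::printf("loaded %zu entries, ok=%d\n", entries.size(), ok);
    return ok ? 0 : 1;
}
```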
Flake lock file updates:
* Updated input 'nixpkgs':
  'github:NixOS/nixpkgs/e9ee548d90ff586a6471b4ae80ae9cfcbceb3420?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13)
  → 'github:NixOS/nixpkgs/d603719ec6e294f034936c0d0dc06f689d91b6c3?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Philip Taron <[email protected]>
* add chatml fallback for cpp `llama_chat_apply_template`
* remove redundant code
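The fallback amounts to: try the model's own chat template first and, if it is not recognized, format the conversation as ChatML rather than failing. A minimal sketch under assumed names (`chat_message`, `apply_template`, and `format_chat` are invented here and do not reproduce the real llama.cpp API):

```cpp
#include <string>
#include <vector>

// Hypothetical message type for this sketch only.
struct chat_message { std::string role; std::string content; };

// ChatML formatting, used both when requested and as the fallback.
static std::string format_chatml(const std::vector<chat_message> & msgs) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
    }
    out += "<|im_start|>assistant\n";
    return out;
}

// Returns false when the template string is not recognized.
static bool apply_template(const std::string & tmpl, const std::vector<chat_message> & msgs, std::string & out) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        out = format_chatml(msgs);
        return true;
    }
    return false;  // unknown template family
}

static std::string format_chat(const std::string & model_template, const std::vector<chat_message> & msgs) {
    std::string out;
    if (apply_template(model_template, msgs, out)) {
        return out;
    }
    return format_chatml(msgs);  // fallback: unknown template, use ChatML instead of erroring out
}

int main() {
    const std::vector<chat_message> msgs = { {"user", "Hello"} };
    // An unrecognized template string triggers the ChatML fallback.
    return format_chat("{{ some unknown jinja template }}", msgs).empty() ? 1 : 0;
}
```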
* cmake : fix deprecated option names not working
* remove LLAMA_OPENMP
* CI: fix release build (Ubuntu)
  PR ggerganov#8006 changes defaults to build shared libs. However, CI for releases expects static builds.
* CI: fix release build (Mac)

Co-authored-by: loonerin <[email protected]>
…perties (ggerganov#8132)
* json: update grammars/README
* mention broken prefixItems
* add mention to llama-gbnf-validator
* json: explicit type: object for nested items object in cli example
* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
  Co-authored-by: slaren <[email protected]>
* Add model type names
* Add control vector
* Fix model type identification

Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: slaren <[email protected]>
…rn escapes (ggerganov#8180)
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
* add --spm-infill option
* support --spm-infill
* support --spm-infill
…emplate_internal` (ggerganov#8172)
* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch
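The `tmp_contains` idea boils down to classifying the raw Jinja template string by substring matching; a minimal sketch (the enum, function names, and marker strings below are illustrative rather than copies of the actual detection code):

```cpp
#include <cassert>
#include <string>

enum class chat_template_kind { CHATML, DEEPSEEK2, MINICPM, UNKNOWN };

// Classify a raw Jinja chat template by looking for characteristic substrings.
static chat_template_kind detect_template(const std::string & tmpl) {
    const auto tmpl_contains = [&](const char * needle) {
        return tmpl.find(needle) != std::string::npos;
    };
    if (tmpl_contains("<|im_start|>")) {
        return chat_template_kind::CHATML;
    }
    if (tmpl_contains("User: ") && tmpl_contains("Assistant: ")) {
        return chat_template_kind::DEEPSEEK2;  // illustrative markers, not the real ones
    }
    if (tmpl_contains("<用户>")) {
        return chat_template_kind::MINICPM;    // illustrative marker, not necessarily the real one
    }
    return chat_template_kind::UNKNOWN;
}

int main() {
    assert(detect_template("{% ... %}<|im_start|>user") == chat_template_kind::CHATML);
    assert(detect_template("something else entirely")   == chat_template_kind::UNKNOWN);
    return 0;
}
```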
github-actions bot added the following labels on Jun 30, 2024: documentation (Improvements or additions to documentation), Nvidia GPU, testing, examples, python, server, ggml, devops, SYCL, Vulkan, build, android, Kompute, script, Apple Metal, nix.