b3284 #204

Nexesenex · 2024-07-02T17:45:17Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

…tor to Gemma2 (#8197)

* Add attention and final logit softcapping.
* fix
* Add custom add_ functions
* Disable flash attention for Gemma2
* Update src/llama.cpp Co-authored-by: slaren <[email protected]>
* Add default value for attention and final logit softcap value
* Add custom kq scaling from Gemma2Attention
* Remove custom pre attention scaling and use computed value instead.

---------

Co-authored-by: slaren <[email protected]>
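For readers unfamiliar with the term, logit softcapping squashes values into the open interval (-cap, cap) with a scaled tanh. The snippet below is a minimal sketch of that idea only, assuming the common formulation `cap * tanh(x / cap)`; the function name `softcap` and the cap value `30.0` are illustrative and are not taken from this commit.

```python
import math


def softcap(logits, cap):
    """Squash each logit into (-cap, cap) with a scaled tanh (illustrative helper)."""
    return [cap * math.tanh(x / cap) for x in logits]


# Large magnitudes are pulled toward +/-cap; small values are nearly unchanged.
capped = softcap([-100.0, 0.0, 5.0, 100.0], cap=30.0)
```

Because tanh is close to the identity near zero, small logits pass through almost untouched while outliers are bounded.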

* gemma2: add sliding window mask
* fix data_swa uninitialized
* better naming
* add co-author Co-authored-by: Arlo Phoenix <[email protected]>
* replace list with single tensor
* update
* llama : minor styling
* convert : add sanity check for query_pre_attn_scalar
* fix small typo in README

---------

Co-authored-by: Arlo Phoenix <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
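As a rough illustration of what a sliding window attention mask expresses, the sketch below builds a causal mask in which each position may attend only to itself and a fixed number of preceding positions. It is a generic example, not the code from this commit: the helper name `sliding_window_mask` is hypothetical, and the exact boundary convention used in llama.cpp may differ.

```python
def sliding_window_mask(n_tokens: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        # Causal (j <= i) and within the last `window` positions (i - j < window).
        [j <= i and i - j < window for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


mask = sliding_window_mask(n_tokens=4, window=2)
```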

* convert-hf : print output file name when completed

This commit adds the output file name to the log message when the conversion is completed. The motivation for this change is that when the `--outfile` option is not specified it might not be obvious where the output file is written. With this change the output of running the script will be something like the following:

```console
INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf.
```

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! convert-hf : print output file name when completed

Updates the output to support printing the directory if the output is split into multiple files. Also the output file name is now retrieved from the model_instance object.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! convert-hf : print output file name when completed

Use parent attribute of Path object and string interpolation.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! convert-hf : print output file name when completed

Use os.sep instead of hardcoding the path separator.

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>
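The behaviour described in these commit messages can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual script: the helper `export_message` and its `is_split` flag are hypothetical names, and only the use of `Path.parent`, string interpolation, and `os.sep` comes from the messages above.

```python
import os
from pathlib import Path


def export_message(fname_out: Path, is_split: bool) -> str:
    """Build the completion message (hypothetical helper, not the script's code)."""
    if is_split:
        # Split output: report the containing directory, ending with the
        # platform's path separator instead of a hardcoded "/".
        return f"Model successfully exported to {fname_out.parent}{os.sep}"
    # Single-file output: report the file itself.
    return f"Model successfully exported to {fname_out}"


single = export_message(Path("models") / "gemma-2-9b-it.gguf", is_split=False)
split = export_message(Path("models") / "gemma-2-9b-it.gguf", is_split=True)
```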

ochafik and others added 25 commits June 28, 2024 09:26

json: restore default additionalProperties to false, fix some pattern escapes (#8180)

139cc62

* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md

cmake : allow user to override default options (#8178)

b851b3f

Add SPM infill support (#8016)

38373cf

* add --spm-infill option
* support --spm-infill
* support --spm-infill

Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_template_internal` (#8172)

26a39bb

* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch

json: attempt to skip slow tests when running under emulator (#8189)

8748d8a

fix code typo in llama-cli (#8198)

72272b8

Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)

9ef0780

* preserve new line llama_chat_format_single
* disable chat template if in-prefix/suffix is set
* remove redundant change

flake.lock: Update (#8218)

d0a7145

[SYCL] Update SYCL-Rope op and Refactor (#8157)

197fe6c

* align with rope.cu and move sycl-op to a single file

Document BERT support. (#8205)

694c59c

* Update README.md document BERT support
* Update README.md

nix : remove OpenCL remnants (#8235)

257f8e4

* nix : remove OpenCL remnants
* minor : remove parentheses

nix : enable curl (#8043)

3840b6f

Co-authored-by: Georgi Gerganov <[email protected]>

readme : update tool list (#8209)

0ddeff1

* Added gppm to Tool list in README
* Update README.md

---------

Co-authored-by: Georgi Gerganov <[email protected]>

readme: add Paddler to the list of projects (#8239)

dae57a1

CUDA: refactor and optimize IQ MMVQ (#8215)

cb5fad4

* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix

Fix gemma2 tokenizer convert (#8244)

5fac350

* fix gemma2 tokenizer convert
* remove scores
* improve code, fix new line issue

[SYCL] Fix the sub group size of Intel (#8106)

d08c20e

* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size

[SYCL] Fix win build conflict of math library (#8230)

a9f3b10

* fix win build conflict of math library
* fix the condition: !(win32 & SYCL)
* revert warp_size=16

cuda : update supports_op for matrix multiplication (#8245)

0e0590a

Add JAIS model(s) (#8118)

9689673

* Add `JAIS` model(s)
* cleanup
* address review comments
* remove hack
* un-hardcode max-alibi-bias
* minor tweaks

---------

Co-authored-by: fmz <[email protected]>

Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)

07a3fc0

Adding step to clean target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257)

3e2618b

Nexesenex merged commit 8385690 into Nexesenex:marstream Jul 2, 2024
12 of 15 checks passed

github-actions bot added Nvidia GPU testing examples python labels Jul 2, 2024

github-actions bot added server ggml devops SYCL Vulkan build script Apple Metal nix labels Jul 2, 2024
