Compilade/fix mpt pretok #231

Nexesenex · 2024-07-11T00:56:54Z

No description provided.


This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly, including HTML tags and consecutive spaces, but it unfortunately requires model re-conversion. The HF tokenizer for Gemma seems to behave oddly: it prefers the 16-space token over longer space tokens, whereas the SentencePiece tokenizer does not. (The implementation in llama.cpp behaves the same as SentencePiece.)

* llama : fix wrong pre-tokenization of byte tokens


* Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from #7809 and migrate to the new filenames.
* Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.


* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type


#8404)
* Modify the deprecation-warning 'main' binary to build every time, instead of only when a legacy binary is present. This is to help users of tutorials and other instruction sets know what to do when the 'main' binary is missing and they are trying to follow instructions.
* Adjusting 'server' name-deprecation binary to build all the time, similar to the 'main' legacy name binary.


compilade and others added 28 commits June 30, 2024 14:34

llama : fix mpt and olmo pre-tokenizer

db2ffd5

Merge branch 'master' into compilade/fix-mpt-pretok

ac0f33c

llama : pre-tokenize non-special user-defined tokens first

d5d30b2

Merge branch 'master' into compilade/fix-mpt-pretok

6b961e3

llama : fix detection of control-like user-defined tokens

56df1fc

convert_hf : identify which user-defined tokens are control tokens

6e351e0

Only used in _set_vocab_gpt2() for now.

llama : fix Viking pre-tokenizer regex

31a1b0e

The order was previously wrong, which caused errors in some tests.

llama : fix command-r detokenization

d6fe269

convert_hf : reduce usages of the UNKNOWN token type

d4df785

llama : add UNKNOWN tokens in the special tokens cache

98edea6

sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)

5b0b8d8

* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend
* Reduced verbosity of comment

make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392)

a03e8dd

Update README.md to fix broken link to docs (#8399)

fd560fe

Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'

Server: Enable setting default sampling parameters via command-line (#…

a59f8fd

…8402)
* Load server sampling parameters from the server context by default.
* Wordsmithing comment

py : fix extra space in convert_hf_to_gguf.py (#8407)

8f0fad4

py : fix converter for internlm2 (#8321)

e4dd31f

* update internlm2
* remove unused file
* fix lint

llama : add assert about missing llama_encode() call (#8400)

a8be1e6

Co-authored-by: Stanisław Szymczyk <[email protected]>

msvc : silence codecvt c++17 deprecation warnings (#8395)

7a80710

llama : C++20 compatibility for u8 strings (#8408)

cc61948

gguf-py rel pipeline (#8410)

83321c6

* Upd gguf-py/readme
* Bump patch version for release

ggml : move sgemm sources to llamafile subfolder (#8394)

6b2a849

ggml-ci

[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)

f4444d9

Merge branch 'master' into compilade/fix-mpt-pretok

afa6119

convert_hf : reduce usages of UNKNOWN for InternLM2

1caa20f

This makes the changes from #8321 more consistent with the other changes made here.

Nexesenex merged commit f950d48 into Nexesenex:mptolmo Jul 11, 2024
5 of 8 checks passed

github-actions bot added the documentation Improvements or additions to documentation label Jul 11, 2024

github-actions bot added testing examples python server ggml SYCL build labels Jul 11, 2024
