b2849 #117

Nexesenex · 2024-05-11T08:24:40Z

No description provided.


The llama.cpp grammar parser had a bug where omitting the closing quotation mark of a string caused the parser to crash. Anyone running a server on a public endpoint is advised to upgrade. To reproduce the bug, run:

    ./llamafile -m foo.gguf -p bar --grammar 'root::="'

Credit for discovering and reporting this issue goes to Eclypsium Security Researcher Richard Johnson <[email protected]>.
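To illustrate the class of bug described above, here is a minimal sketch of a bounds-checked string-literal reader. It is not the actual llama.cpp grammar parser or the actual fix; `parse_quoted_literal` is a hypothetical helper, and the escape handling is deliberately simplified. The point is only that every read is checked against the end of the input, so an unterminated literal such as `root::="` produces an error instead of a crash.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT llama.cpp's parser): reads a double-quoted literal
// that starts at src[pos] and returns its contents without the quotes.
// Throws std::runtime_error rather than reading past the end of the input.
std::string parse_quoted_literal(const std::string & src, std::size_t pos) {
    if (pos >= src.size() || src[pos] != '"') {
        throw std::runtime_error("expected opening quote");
    }
    std::string out;
    for (std::size_t i = pos + 1; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '"') {
            return out; // closing quote found
        }
        if (c == '\\') {
            // An escape needs one more character; stop if the input ends here.
            if (i + 1 >= src.size()) {
                break;
            }
            out.push_back(src[++i]); // simplified: keep the escaped char verbatim
            continue;
        }
        out.push_back(c);
    }
    // The loop ran out of input before seeing a closing quote.
    throw std::runtime_error("unterminated string literal");
}
```

With this shape, the reproduction input from the report reaches the final `throw`, and the caller can report a grammar error to the user.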


* ggml : full ALiBi support
* ggml : update ggml_soft_max_ext() CUDA, SYCL
* ggml : ggml_flash_attn_ext() support ALiBi (CPU)
* ggml : ggml_flash_attn_ext() support ALiBi (Metal)
* ggml : fix warning
* ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci
* ggml : fix assert message
* vulkan : add dev notes
* ggml : require mask when using ALiBi ggml-ci
* convert : fix convert for refact models

* feat: first things to do
* feat: create tensors for Jina architecture
* fix: use other tensors
* feat: embedding gets results
* fix: fix usage of ALIBI
* fix: clean prints
* fix: do some cleanup unused vars
* fix: revert changes to Makefile and CMakeLists
* fix: revert some changes
* fix: fix small detail
* fix: fix convert formatting
* fix: fix linting and editor
* feat: set proper vocab settings
* fix: JinaBertForMaskedLM registration
* feat: support q_normalization and k_normalization in Jina arch
* feat: handle gpt2 tokenizer with Jina architecture
* feat: example comments in embedding
* feat: rename Jina Bert to Jina Bert V2
* fix: add some changes as per review
* feat: proper KQ_pos for Jina embeddings
* feat: add capacity to load models ES and DE for Spanish
* llama : fix pre-tokenizers
* ggml : full ALiBi support
* ggml : update ggml_soft_max_ext() CUDA, SYCL
* ggml : ggml_flash_attn_ext() support ALiBi (CPU)
* ggml : ggml_flash_attn_ext() support ALiBi (Metal)
* ggml : fix warning
* ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci
* minor : clean-up
* embedding : add warning about missing SEP

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* fix: llama-3 ignore_merges
* test: add test for llama-3 bpe ignore_merges
* fix: set ignore_merges only for llama-3
* fix: test-tokenizer-1-bpe --ignore-merges detection
* fix: copy to fix fallthrough
* fix: change ignore_merges to bool
* fix: add ignore merges tests to cmake
* llama : alternative merge ignore logic

---------

Co-authored-by: Haoxiang Fei <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>


hanishkvc and others added 12 commits May 10, 2024 20:21

Main+: optionally allow special tokens from user in interactive mode (#…

f89fe27

…7097) @hanishkvc added a new `--interactive-specials` flag which would allow for inserting special tokens from user side into the embedding stream.

llama : use n_vocab to differentiate between mistral 7B and llama3 8B (…

25c6e82

…#7200)

convert : print "ignore_merges" field

8c66024

metal : fix flash attention kernel requirements (#7169)

18e4376

* metal : fix flash attention kernel requirements ggml-ci
* metal : fix ggml_metal_supports_op ggml-ci

llama-bench : add pp+tg test type (#7199)

e849648

server: fix reported top tokens for temperature 0 (#7203)

5ae3426

server : free llama_batch on exit (#7212)

9886313

* [server] Cleanup a memory leak on exit

  There are a couple of memory leaks on exit of the server. This one hides others. After cleaning it up, you can see leaks on slots, but that is another patch to be sent after this.

* make tab into spaces

convert : skip unaccessible HF repos (#7210)

3292733

Nexesenex merged commit 9999720 into Nexesenex:downstream May 11, 2024
3 checks passed
