b2843 #110

Merged
Merged 6 commits into Nexesenex:downstream on May 10, 2024

Conversation

Nexesenex
Owner

No description provided.

hanishkvc and others added 6 commits on May 10, 2024 at 20:21
…7097)

@hanishkvc added a new `--interactive-specials` flag, which allows special tokens to be inserted from the user side into the embedding stream.
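
As a rough illustration of what the flag controls (not the PR's code): in llama.cpp's C API, special-token handling during tokenization is gated by the `parse_special` argument of `llama_tokenize`, so the user-input path might look like the sketch below. The helper name and buffer sizing are assumptions, and the `llama_tokenize` signature is the one from around this era of the API.

    #include "llama.h"

    #include <string>
    #include <vector>

    // Sketch: thread an --interactive-specials style flag through to
    // llama_tokenize()'s parse_special argument (helper name is illustrative).
    std::vector<llama_token> tokenize_user_input(const llama_model * model,
                                                 const std::string & text,
                                                 bool interactive_specials) {
        // Worst case is roughly one token per byte of input.
        std::vector<llama_token> tokens(text.size() + 8);
        const int32_t n = llama_tokenize(
            model, text.c_str(), (int32_t) text.size(),
            tokens.data(), (int32_t) tokens.size(),
            /*add_special=*/false,
            // true: "<|...|>"-style sequences in the text map to special
            // token ids; false: they are tokenized as literal text.
            /*parse_special=*/interactive_specials);
        tokens.resize(n < 0 ? 0 : (size_t) n);
        return tokens;
    }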
The llama.cpp grammar parser had a bug where a missing closing quotation
mark on a string would cause the parser to crash. Anyone running a server
on a public endpoint is advised to upgrade. To reproduce this bug:

    ./llamafile -m foo.gguf -p bar --grammar 'root::="'

Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <[email protected]>.
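
To make the failure mode concrete, here is a minimal, self-contained C++ sketch (not llama.cpp's actual parser) of the defensive check this class of fix needs: reaching end of input before the closing quote must become a parse error rather than a read past the buffer.

    // Illustrative only: a string-literal scanner that rejects an
    // unterminated string instead of scanning past the end of the input.
    #include <cstdio>
    #include <stdexcept>
    #include <string>

    // Parses a double-quoted string; `pos` must point at the opening quote.
    // Returns the position just past the closing quote.
    size_t parse_quoted(const std::string & src, size_t pos, std::string & out) {
        ++pos; // skip opening quote
        while (pos < src.size() && src[pos] != '"') {
            out += src[pos++];
        }
        if (pos >= src.size()) {
            // The crucial check: unterminated string -> error, not a crash.
            throw std::runtime_error("expecting closing quote at end of input");
        }
        return pos + 1; // skip closing quote
    }

    int main() {
        std::string out;
        try {
            parse_quoted("root::=\"", 7, out); // the grammar from the repro above
        } catch (const std::exception & e) {
            std::fprintf(stderr, "grammar parse error: %s\n", e.what());
        }
    }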
* metal : fix flash attention kernel requirements

ggml-ci

* metal : fix ggml_metal_supports_op

ggml-ci
Nexesenex merged commit 1462aa7 into Nexesenex:downstream on May 10, 2024
4 of 7 checks passed
Nexesenex pushed a commit that referenced this pull request Dec 22, 2024
I had forgotten that build_bitnet() does not use the standard
llm_build_ffn function, so the fused mul-silu didn't get used
for Bitnet when I added it to llm_build_ffn.

This gives us another ~1% speedup for TG-128.

Co-authored-by: Iwan Kawrakow <[email protected]>
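
As background on why fusing helps (a conceptual sketch, not the actual kernel from this commit): SiLU(x) = x * sigmoid(x), and the gated FFN computes mul(SiLU(gate), up) element-wise. Doing both in one loop avoids materializing the intermediate SiLU tensor and saves a full pass over memory, which is consistent with the small token-generation gain the commit reports.

    #include <cmath>
    #include <cstddef>

    // Two passes: SiLU(g) is written to a temporary, then re-read for the mul.
    void mul_silu_unfused(const float * g, const float * u, float * tmp,
                          float * out, size_t n) {
        for (size_t i = 0; i < n; ++i) tmp[i] = g[i] / (1.0f + std::exp(-g[i]));
        for (size_t i = 0; i < n; ++i) out[i] = tmp[i] * u[i];
    }

    // One pass: no temporary tensor, roughly half the memory traffic here.
    void mul_silu_fused(const float * g, const float * u, float * out, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            out[i] = (g[i] / (1.0f + std::exp(-g[i]))) * u[i];
        }
    }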