b2843 #110

Merged
Merged 6 commits into Nexesenex:downstream on May 10, 2024

Conversation

Nexesenex
Owner

No description provided.

hanishkvc and others added 6 commits on May 10, 2024 at 20:21
…7097)

@hanishkvc added a new `--interactive-specials` flag, which allows special tokens to be inserted from the user side into the embedding stream.
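
As a rough illustration of what the flag controls (not the PR's code): in llama.cpp's C API, special-token handling during tokenization is gated by the `parse_special` argument of `llama_tokenize`, so the user-input path might look like the sketch below. The helper name and buffer sizing are assumptions, and the `llama_tokenize` signature is the one from around this era of the API.

    #include "llama.h"

    #include <string>
    #include <vector>

    // Sketch: thread an --interactive-specials style flag through to
    // llama_tokenize()'s parse_special argument (helper name is illustrative).
    std::vector<llama_token> tokenize_user_input(const llama_model * model,
                                                 const std::string & text,
                                                 bool interactive_specials) {
        // Worst case is roughly one token per byte of input.
        std::vector<llama_token> tokens(text.size() + 8);
        const int32_t n = llama_tokenize(
            model, text.c_str(), (int32_t) text.size(),
            tokens.data(), (int32_t) tokens.size(),
            /*add_special=*/false,
            // true: "<|...|>"-style sequences in the text map to special
            // token ids; false: they are tokenized as literal text.
            /*parse_special=*/interactive_specials);
        tokens.resize(n < 0 ? 0 : (size_t) n);
        return tokens;
    }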
The llama.cpp grammar parser had a bug where a missing closing quotation
mark on a string would cause the parser to crash. Anyone running a server
on a public endpoint is advised to upgrade. To reproduce this bug:

    ./llamafile -m foo.gguf -p bar --grammar 'root::="'

Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <[email protected]>.
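
To make the failure mode concrete, here is a minimal, self-contained C++ sketch (not llama.cpp's actual parser) of the defensive check this class of fix needs: reaching end of input before the closing quote must become a parse error rather than a read past the buffer.

    // Illustrative only: a string-literal scanner that rejects an
    // unterminated string instead of scanning past the end of the input.
    #include <cstdio>
    #include <stdexcept>
    #include <string>

    // Parses a double-quoted string; `pos` must point at the opening quote.
    // Returns the position just past the closing quote.
    size_t parse_quoted(const std::string & src, size_t pos, std::string & out) {
        ++pos; // skip opening quote
        while (pos < src.size() && src[pos] != '"') {
            out += src[pos++];
        }
        if (pos >= src.size()) {
            // The crucial check: unterminated string -> error, not a crash.
            throw std::runtime_error("expecting closing quote at end of input");
        }
        return pos + 1; // skip closing quote
    }

    int main() {
        std::string out;
        try {
            parse_quoted("root::=\"", 7, out); // the grammar from the repro above
        } catch (const std::exception & e) {
            std::fprintf(stderr, "grammar parse error: %s\n", e.what());
        }
    }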
* metal : fix flash attention kernel requirements

ggml-ci

* metal : fix ggml_metal_supports_op

ggml-ci
Nexesenex merged commit 1462aa7 into Nexesenex:downstream on May 10, 2024
4 of 7 checks passed
Nexesenex pushed a commit that referenced this pull request Dec 22, 2024
I had forgotten that build_bitnet() does not use the standard
llm_build_ffn function, so the fused mul-silu didn't get used
for Bitnet when I added it to llm_build_ffn.

This gives us another ~1% speedup for TG-128.

Co-authored-by: Iwan Kawrakow <[email protected]>
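
As background on why fusing helps (a conceptual sketch, not the actual kernel from this commit): SiLU(x) = x * sigmoid(x), and the gated FFN computes mul(SiLU(gate), up) element-wise. Doing both in one loop avoids materializing the intermediate SiLU tensor and saves a full pass over memory, which is consistent with the small token-generation gain the commit reports.

    #include <cmath>
    #include <cstddef>

    // Two passes: SiLU(g) is written to a temporary, then re-read for the mul.
    void mul_silu_unfused(const float * g, const float * u, float * tmp,
                          float * out, size_t n) {
        for (size_t i = 0; i < n; ++i) tmp[i] = g[i] / (1.0f + std::exp(-g[i]));
        for (size_t i = 0; i < n; ++i) out[i] = tmp[i] * u[i];
    }

    // One pass: no temporary tensor, roughly half the memory traffic here.
    void mul_silu_fused(const float * g, const float * u, float * out, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            out[i] = (g[i] / (1.0f + std::exp(-g[i]))) * u[i];
        }
    }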