llama:use F32 precision in GLM4 attention and no FA #9130

piDack · 2024-08-22T08:56:55Z

There have been a few reports of GLM4 models generating "GGGG" when FA (flash attention) is disabled. This should fix it.
I used the following command line, and after the change there was no "GGG" in the output:

#!/bin/bash
../build/bin/llama-cli -m "/mnt/edge/yhl/model/glm4-9b-chat-Q4_K_M.gguf" -c 8192 -ngl 41 -np 2 -p "hello how are you?"
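The before/after check above can be scripted. This is a hypothetical helper, not part of the PR. It runs the same prompt once on llama-cli's default attention path (flash attention was off by default in builds from this period) and once with `-fa`, then flags any output containing a long run of `G`. The helper names, the run-length threshold of 8 and the `-n 128` token cap are assumptions. The flash-attention flag spelling has changed across llama.cpp versions, so check `llama-cli --help` first.

```bash
#!/bin/bash
# Hypothetical helper (not from PR #9130): compare GLM4 output with and
# without flash attention and flag long runs of "G".
# Assumptions: llama-cli is at ../build/bin/llama-cli (as in the command above)
# and accepts -fa to enable flash attention; newer builds may spell it differently.

LLAMA_CLI="${LLAMA_CLI:-../build/bin/llama-cli}"

# Succeeds if stdin contains at least $1 consecutive "G" characters
# (default 8, an arbitrary threshold chosen for this sketch).
has_g_run() {
  local min_len="${1:-8}"
  grep -Eq "G{${min_len},}"
}

# Runs the prompt from the PR once; extra arguments (e.g. -fa) are appended.
# -n 128 caps generation so the run terminates; stdin is closed and
# stderr logs are discarded, so only the echoed prompt and generation remain.
run_glm4() {
  local model="$1"
  shift
  "$LLAMA_CLI" -m "$model" -c 8192 -ngl 41 -np 2 -n 128 \
    -p "hello how are you?" "$@" </dev/null 2>/dev/null
}

# Prints one verdict line per configuration; returns 1 if llama-cli is missing.
compare_fa() {
  local model="$1" out
  if [[ ! -x "$LLAMA_CLI" ]]; then
    echo "llama-cli not found or not executable: $LLAMA_CLI" >&2
    return 1
  fi

  # Run 1: default attention path (no -fa).
  out="$(run_glm4 "$model")"
  if has_g_run <<< "$out"; then echo "default: repeated G detected"; else echo "default: ok"; fi

  # Run 2: flash attention enabled.
  out="$(run_glm4 "$model" -fa)"
  if has_g_run <<< "$out"; then echo "-fa: repeated G detected"; else echo "-fa: ok"; fi
}
```

Usage, after sourcing the script: `compare_fa /mnt/edge/yhl/model/glm4-9b-chat-Q4_K_M.gguf`. Matching on a run of `G` rather than a fixed string keeps the check independent of how long the degenerate output gets.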

[Screenshots: output before and after the change]

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

ggerganov

I guess we should switch to F32 precision by default for the KQ multiplication
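A worked example of one plausible way F16 precision in the KQ multiplication can produce NaNs, which can then surface as degenerate output such as "GGGG". This is an illustration, not a diagnosis from the PR. The numbers are hypothetical: it assumes a head dimension of $d = 128$, per-element products $q_i k_i$ averaging 512, and that the $1/\sqrt{d}$ scale is applied only after the product is formed.

$$
\text{largest finite F16 value} = (2 - 2^{-10}) \cdot 2^{15} = 65504
$$

$$
s = \mathbf{q}\cdot\mathbf{k} = \sum_{i=1}^{d} q_i k_i \approx 128 \cdot 512 = 65536 > 65504 \;\Rightarrow\; s_{\text{F16}} = +\infty
$$

$$
\operatorname{softmax}(\mathbf{s})_j = \frac{e^{s_j - m}}{\sum_k e^{s_k - m}}, \qquad m = \max_k s_k = +\infty
$$

For the overflowed entry $s_j - m = \infty - \infty = \text{NaN}$; for the finite entries $e^{-\infty} = 0$. The NaN in the denominator makes every weight in that row NaN. In F32 the same sum, 65536, is far below the F32 maximum of roughly $3.4 \times 10^{38}$, so no overflow occurs.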

piDack · 2024-08-23T09:40:21Z

> I guess we should switch to F32 precision by default for the KQ multiplication

Good idea.

ThiloteE · 2024-09-16T11:13:45Z

I guess this could be classified as a follow-up to #8031

fix glm GGG err

8a6ba03

piDack changed the title ~~fix glm GGG err~~ Fix glm4 GGG err Aug 22, 2024

piDack changed the title ~~Fix glm4 GGG err~~ llama:use F32 precision in GLM4 attention and no FA Aug 22, 2024

ggerganov approved these changes Aug 23, 2024

ggerganov merged commit a07c32e into ggerganov:master Aug 23, 2024
52 checks passed

cosmic-snow mentioned this pull request Aug 25, 2024

[Request] Add LongWriter model(s) nomic-ai/gpt4all#2883

Open

wszgrcy mentioned this pull request Sep 2, 2024

Running the glm4-9b model, after chatting for a while it occasionally replies "GGGGGGG" ollama/ollama#6250

Closed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024

llama : use F32 precision in GLM4 attention and no FA (ggerganov#9130)

5ff6c1e

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024

llama : use F32 precision in GLM4 attention and no FA (ggerganov#9130)

c68b570
