Releases: jploski/llama.cpp

b3620

24 Aug 22:50
e11bd85
CPU/CUDA: Gemma 2 FlashAttention support (#8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check

b3064

01 Jun 18:38
2ac95c9
SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, S…