llama : work around F16 DMMV buffer overflow by increasing padding
Upstream issue: ggerganov#8798
cebtenzzre committed Jul 31, 2024
1 parent 2a4898a commit 7ea0fed
Showing 1 changed file with 1 addition and 2 deletions.

src/llama.cpp
@@ -3464,8 +3464,7 @@ static void llama_kv_cache_defrag(struct llama_kv_cache & cache) {
 }
 
 static uint32_t llama_kv_cache_get_padding(const struct llama_cparams & cparams) {
-    // the FA kernels require padding to avoid extra runtime boundary checks
-    return cparams.flash_attn ? 256u : 32u;
+    return 256u;
 }
 
 //
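For context, a minimal sketch (not part of this diff) of how the padding value is consumed: in llama.cpp the effective context length is rounded up to a multiple of this padding via ggml's GGML_PAD macro, so unconditionally returning 256u over-allocates the KV cache enough that the F16 DMMV kernel's overflowing reads land in the padded slack rather than past the end of the buffer. The helper below mirrors GGML_PAD's rounding; the concrete numbers are illustrative assumptions, not taken from the commit.

// Sketch only: mirrors GGML_PAD(x, n) = ((x + n - 1) / n) * n from ggml.h.
#include <cstdint>
#include <cstdio>

// round x up to the next multiple of n
static uint32_t pad_to_multiple(uint32_t x, uint32_t n) {
    return (x + n - 1) / n * n;
}

int main() {
    const uint32_t n_ctx_requested = 4000;  // hypothetical user-requested context
    const uint32_t padding         = 256u;  // value returned after this commit
    // With the old non-FA padding of 32u this would stay at 4000 (already a
    // multiple of 32); with 256u it rounds up to 4096, leaving 96 cells of slack.
    std::printf("padded n_ctx = %u\n", pad_to_multiple(n_ctx_requested, padding));
    return 0;
}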
