[Bug]: KV Cache exploded #91

rakkit · 2024-12-14T06:16:46Z

Describe the bug

When using softmax attention, or any other attention with window_size=None, the KV cache update falls into this branch. That branch concatenates all historical sequence states with the new states (attn_state[0] and attn_state[1]), so the KV cache grows exponentially instead of by one token per step.
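The following minimal pure-Python sketch shows how a concatenation like this can make the cache grow exponentially rather than linearly. It assumes, without confirmation from the thread, that attn_state already holds the layer's full key/value history. Under that assumption, concatenating it onto the cached history duplicates everything already stored. The names buggy_update, fixed_update, and simulate are hypothetical, and lists of token indices stand in for the key/value tensors along the sequence dimension.

```python
from typing import List, Tuple

# Hypothetical stand-in for a key or value tensor: one entry per cached token.
State = List[int]
Cache = Tuple[State, State]


def buggy_update(cache: Cache, attn_state: Tuple[State, State]) -> Cache:
    # Assumed buggy branch: attn_state already contains the full history,
    # yet it is concatenated onto the cached history again.
    k_cache, v_cache = cache
    return k_cache + attn_state[0], v_cache + attn_state[1]


def fixed_update(cache: Cache, attn_state: Tuple[State, State]) -> Cache:
    # One possible fix under the same assumption: store the full states
    # as the new cache instead of concatenating them onto the old one.
    return list(attn_state[0]), list(attn_state[1])


def simulate(update, steps: int) -> List[int]:
    """Return the cached key length after each decoding step."""
    cache: Cache = ([], [])
    lengths: List[int] = []
    for t in range(steps):
        # The layer attends over the cached states plus one new token,
        # and (by assumption) hands the full sequence back as attn_state.
        k_full = cache[0] + [t]
        v_full = cache[1] + [t]
        cache = update(cache, (k_full, v_full))
        lengths.append(len(cache[0]))
    return lengths


print(simulate(buggy_update, 5))  # [1, 3, 7, 15, 31]
print(simulate(fixed_update, 5))  # [1, 2, 3, 4, 5]
```

Under this assumption, the cached length after step t is 2^(t+1) - 1 instead of t + 1, which matches the "exploding" behavior described above.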

Steps to reproduce the bug

Run inference with an attention layer configured with window_size=None.

Expected behavior

The KV cache exploded in size.

Environment info


yzhangcs · 2024-12-16T17:19:39Z

@rakkit Hello, could you provide a minimal script to reproduce this? I didn't encounter the error you reported when setting window_size=None.

rakkit added the "bug" (Something isn't working) label on Dec 14, 2024.
