
[Bug]: KV Cache exploded #91

Open
rakkit opened this issue Dec 14, 2024 · 1 comment
Labels
bug Something isn't working

Comments

rakkit commented Dec 14, 2024

Describe the bug

In the case of softmax attention, or any other attention with window_size=None, the KV cache update falls into this branch. That logic concatenates all historical sequence states with the incoming states (attn_state[0] and attn_state[1]), so the history is re-appended on every decoding step and the KV cache grows exponentially.
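
To make the failure mode concrete, here is a minimal sketch (toy shapes, not the library's actual code) of the suspected pattern: if the states handed back to the cache already contain the full history, concatenating them onto the stored states re-appends that history every step and the cache roughly doubles per token:

```python
import torch

head_dim = 8
cache_k = torch.zeros(1, 0, head_dim)  # [batch, seq_len, head_dim], starts empty

for step in range(6):
    new_k = torch.randn(1, 1, head_dim)          # keys for the new token only
    full_k = torch.cat([cache_k, new_k], dim=1)  # what attention actually consumes
    # Suspected buggy branch: the cache concatenates its old contents with the
    # full historical states, so the entire history is appended a second time.
    cache_k = torch.cat([cache_k, full_k], dim=1)
    print(f"step {step}: cache length = {cache_k.shape[1]}")

# Prints 1, 3, 7, 15, 31, 63: the cache roughly doubles every step.
```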

Steps to reproduce the bug

Run inference with an attention layer configured with window_size=None.

Expected behavior

The KV cache should stay proportional to the generated sequence length; instead it explodes during inference.
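
For contrast, a correct update appends only the new token's states, keeping growth linear in the number of generated tokens (again a toy sketch, not the library's code):

```python
import torch

cache_k = torch.zeros(1, 0, 8)
for step in range(6):
    new_k = torch.randn(1, 1, 8)                 # states for the new token only
    cache_k = torch.cat([cache_k, new_k], dim=1)
    assert cache_k.shape[1] == step + 1          # exactly one entry per token
```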

Environment info

rakkit added the bug label Dec 14, 2024
@yzhangcs
Member

@rakkit Hello, could you provide a minimal script for reproduction? I didn't encounter the errors you reported when setting window_size=None.
