[Usage]: KV Cache Evict Method #6

Open
1 task done
alpemreacar opened this issue Nov 18, 2024 · 3 comments

Comments

@alpemreacar

alpemreacar commented Nov 18, 2024

Hi,

Thank you again for the great repo.

I would like to dig deeper into it. I am looking at the place where the actual eviction happens after sorting, and I would like to understand whether the code discards the same number of KVs per layer and head.

As far as I understand, sorting and eviction happen at L441. More specifically, the eviction happens in count_block_evictions, which uses C code. Am I following this correctly?

If so, I wonder whether this amounts to uniform_evict, since the number of KVs in each layer/head would be the same. If not, could you point me to how to do it? Is the uniform_evict path working?

@IsaacRe
Owner

IsaacRe commented Nov 20, 2024

Hello,

There are 4 main parts of the eviction process:

  1. Select which KVs to evict
  2. Determine how to reorder KVs in physical cache so that all evicted KVs reside in blocks with no non-evicted KVs (so that those blocks can be safely freed)
  3. Free blocks containing evicted KVs so that they can be re-allocated during the next scheduling iteration (this only modifies the BlockManager, physical cache remains unchanged)
  4. Move KVs in physical cache following the reordering from step 2
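To make the four steps concrete, here is a minimal single-head sketch in plain Python. It is not the repo's implementation: `BLOCK_SIZE`, `select_evictions`, `plan_moves`, `free_blocks`, and `execute_moves` are hypothetical names, the "lowest score is evicted first" rule is an assumption, and the reordering is simplified to packing surviving KVs at the front so that trailing blocks hold only evicted KVs.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical block size for this toy example


def select_evictions(scores: List[float], num_evict: int) -> List[int]:
    """Step 1: choose the slots to evict (assumed here: lowest scores first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:num_evict])


def plan_moves(num_slots: int, evicted: List[int]) -> List[Tuple[int, int]]:
    """Step 2: plan (src, dst) moves that pack surviving KVs at the front."""
    evicted_set = set(evicted)
    survivors = [i for i in range(num_slots) if i not in evicted_set]
    # Each survivor goes to the next free front slot; no-op moves are skipped.
    return [(src, dst) for dst, src in enumerate(survivors) if src != dst]


def free_blocks(num_slots: int, num_evicted: int) -> List[int]:
    """Step 3: blocks past the last surviving KV can be released.

    Only bookkeeping changes here; the cache contents are untouched.
    """
    remaining = num_slots - num_evicted
    first_free = -(-remaining // BLOCK_SIZE)   # ceiling division
    total_blocks = -(-num_slots // BLOCK_SIZE)
    return list(range(first_free, total_blocks))


def execute_moves(cache: List[str], moves: List[Tuple[int, int]]) -> None:
    """Step 4: apply the planned moves in place.

    Moves are ordered by increasing dst and always satisfy dst < src,
    so no source slot is overwritten before it has been read.
    """
    for src, dst in moves:
        cache[dst] = cache[src]


cache = [f"kv{i}" for i in range(8)]
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]

evicted = select_evictions(scores, num_evict=4)
moves = plan_moves(len(cache), evicted)
freed = free_blocks(len(cache), len(evicted))
execute_moves(cache, moves)
```

With these toy inputs, the four survivors end up in block 0 and block 1 is the one that gets freed. Note that step 3 runs before step 4, mirroring the ordering described above: freeing is pure bookkeeping, and the physical move happens later.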

CompressionScheduler._schedule_compression handles steps 1-3 above, with CompressionMetrics.schedule_evictions handling step 1.

Step 4 is handled by CacheEngine.execute_cache_moves, which is called in the main engine loop here.

Before determining which reordering to apply in execute_cache_moves, we need to sort KVs along each head by their logical index, with any non-evicted KVs coming last. count_block_evictions is used to set the logical index of any non-evicted KVs to an arbitrarily large value (MAX_INT in the code) so that after sorting by logical index, the KVs follow this ordering.
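The sentinel trick can be illustrated with a small plain-Python sketch. This is only an illustration under assumptions, not the actual kernel: `sentinel_sort` is a hypothetical helper, and the `MAX_INT` value below is a stand-in for whatever constant the code uses.

```python
from typing import List, Tuple

MAX_INT = 2**31 - 1  # stand-in sentinel; the value in the real code may differ


def sentinel_sort(
    logical_idx: List[int], evicted: List[bool]
) -> Tuple[List[int], List[int]]:
    """Return (keys, order) for one head.

    Non-evicted KVs get MAX_INT as their sort key, so sorting by key puts
    evicted KVs first (in logical-index order) and non-evicted KVs last.
    """
    keys = [idx if ev else MAX_INT for idx, ev in zip(logical_idx, evicted)]
    # sorted() is stable, so non-evicted KVs keep their relative order.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return keys, order


keys, order = sentinel_sort(
    logical_idx=[5, 2, 7, 0],
    evicted=[True, False, True, False],
)
```

Here `order` lists the positions of the evicted KVs (logical indices 5 and 7) first, followed by the two non-evicted ones.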

@IsaacRe
Owner

IsaacRe commented Nov 20, 2024

And you're correct that the purpose of uniform_evict is to fix eviction rates across all layers/heads, though this was really only used for debugging purposes and is not currently supported.
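For readers wondering what "fixing eviction rates across all layers/heads" means in practice, here is a toy comparison. Everything in it is an assumption for illustration: `uniform_counts` and `global_counts` are hypothetical helpers, and the global lowest-score ranking is just one possible non-uniform policy, not necessarily what the repo does.

```python
from typing import Dict, List, Tuple

HeadKey = Tuple[int, int]            # (layer, head)
Scores = Dict[HeadKey, List[float]]  # per-KV scores for each head


def uniform_counts(scores: Scores, evict_rate: float) -> Dict[HeadKey, int]:
    """Evict the same fraction of KVs from every layer/head."""
    return {key: int(len(s) * evict_rate) for key, s in scores.items()}


def global_counts(scores: Scores, evict_rate: float) -> Dict[HeadKey, int]:
    """Evict the lowest-scoring KVs overall; per-head counts may differ."""
    flat = sorted((score, key) for key, s in scores.items() for score in s)
    total = int(len(flat) * evict_rate)
    counts = {key: 0 for key in scores}
    for _, key in flat[:total]:
        counts[key] += 1
    return counts


scores = {
    (0, 0): [0.1, 0.2, 0.3, 0.9],
    (0, 1): [0.5, 0.6, 0.7, 0.8],
}
uniform = uniform_counts(scores, evict_rate=0.5)
non_uniform = global_counts(scores, evict_rate=0.5)
```

In this example the uniform policy evicts two KVs from each head, while the global ranking evicts three from head (0, 0) and one from head (0, 1), even though the total is the same.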

@alpemreacar
Author

Thank you so much for the explanation!
