[Usage]: KV Cache Evict Method #6

alpemreacar · 2024-11-18T22:05:15Z

Hi,

Thank you again for the great repo.

I would like to dig deeper into it. I am looking at the place where the actual eviction happens after sorting, and I would like to understand whether the code discards the same number of KVs per layer and head.

As far as I understand, sorting and eviction happen at L441. More specifically, the eviction happens in count_block_evictions, which calls into C code. Am I following this correctly?

If so, I wonder whether this amounts to uniform_evict, since the number of KVs in each layer/head is the same. If not, could you point me to how to do that? Is the uniform_evict path working?

IsaacRe · 2024-11-20T18:15:52Z

Hello,

There are 4 main parts of the eviction process:

1. Select which KVs to evict.
2. Determine how to reorder KVs in physical cache so that all evicted KVs reside in blocks with no non-evicted KVs (so that those blocks can be safely freed).
3. Free blocks containing evicted KVs so that they can be re-allocated during the next scheduling iteration (this only modifies the BlockManager; physical cache remains unchanged).
4. Move KVs in physical cache following the reordering from step 2.

CompressionScheduler._schedule_compression handles steps 1-3 above, with CompressionMetrics.schedule_evictions handling step 1.

Step 4 is handled by CacheEngine.execute_cache_moves, which is called in the main engine loop here.
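
To picture steps 2-4, here is a toy single-head sketch. It is not the repo's implementation: `plan_cache_moves`, its arguments, and the hole-filling strategy are hypothetical, and it assumes a head's KVs sit in contiguous slots 0..num_kvs-1 split into fixed-size blocks. Surviving KVs from the tail are moved into evicted slots near the front, so the trailing blocks end up with no surviving KVs and can be freed.

```python
import math

def plan_cache_moves(num_kvs, evicted, block_size):
    """Hypothetical helper: plan (src, dst) slot moves for one head.

    After the moves, all surviving KVs occupy slots [0, kept), so every
    block from index ceil(kept / block_size) onward can be freed.
    """
    evicted = set(evicted)
    kept = num_kvs - len(evicted)
    # Evicted slots inside the surviving prefix are holes to fill.
    holes = [s for s in range(kept) if s in evicted]
    # Surviving KVs beyond the prefix must move into those holes.
    sources = [s for s in range(kept, num_kvs) if s not in evicted]
    moves = list(zip(sources, holes))
    blocks_before = math.ceil(num_kvs / block_size)
    blocks_after = math.ceil(kept / block_size)
    freed_blocks = list(range(blocks_after, blocks_before))
    return moves, freed_blocks

moves, freed = plan_cache_moves(num_kvs=8, evicted=[1, 2, 5, 6], block_size=4)
print(moves)  # [(4, 1), (7, 2)]
print(freed)  # [1]
```

Note that this toy version does not preserve the logical order of the surviving KVs; it only shows why a reordering is needed before whole blocks can be released.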

Before determining which reordering to apply in execute_cache_moves, we need to sort KVs along each head by their logical index, with any non-evicted KVs coming last. count_block_evictions is used to set the logical index of any non-evicted KVs to an arbitrarily large value (MAX_INT in the code) so that after sorting by logical index, the KVs follow this ordering.
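
The sentinel trick described above can be sketched in plain Python for a single head. This is only an illustration: `order_by_logical_index`, the boolean mask, and the concrete value of `MAX_INT` are assumptions, not the actual kernel interface.

```python
MAX_INT = 2**31 - 1  # assumed sentinel value; the real constant may differ

def order_by_logical_index(logical_indices, evicted_mask):
    """Hypothetical helper: sort one head's KVs by logical index.

    Non-evicted KVs get the sentinel key so they land at the end.
    Returns the sort keys and the resulting ordering of positions.
    """
    keys = [idx if is_evicted else MAX_INT
            for idx, is_evicted in zip(logical_indices, evicted_mask)]
    order = sorted(range(len(keys)), key=keys.__getitem__)
    return keys, order

keys, order = order_by_logical_index([7, 2, 9, 4], [True, False, True, True])
print(order)  # [3, 0, 2, 1]
```

Position 1 is the only non-evicted KV, so it sorts last regardless of its original logical index.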

IsaacRe · 2024-11-20T18:17:14Z

And you're correct that the purpose of uniform_evict is to fix eviction rates across all layers/heads, though this was really only used for debugging purposes and is not currently supported.
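
For readers comparing the two behaviours, here is a small sketch of the difference between a fixed per-head eviction count and a global selection. Both functions are hypothetical and are not taken from the repo; the sketch also assumes that a lower score marks a KV as a better eviction candidate, which the thread does not state.

```python
def evict_global(scores, total):
    """Pick the `total` lowest-scoring KVs across all heads (counts may differ per head)."""
    ranked = sorted((s, h, i) for h, row in enumerate(scores) for i, s in enumerate(row))
    chosen = [[] for _ in scores]
    for _, head, pos in ranked[:total]:
        chosen[head].append(pos)
    return [sorted(c) for c in chosen]

def evict_uniform(scores, per_head):
    """Pick the `per_head` lowest-scoring KVs in every head (same count per head)."""
    return [sorted(sorted(range(len(row)), key=row.__getitem__)[:per_head])
            for row in scores]

# scores[head][position], made-up values for two heads
scores = [[0.1, 0.2, 0.9],
          [0.8, 0.7, 0.6]]
print(evict_global(scores, total=2))      # [[0, 1], []]
print(evict_uniform(scores, per_head=1))  # [[0], [2]]
```

With the global selection both evictions fall on head 0, while the uniform variant takes one KV from each head.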

alpemreacar · 2024-11-21T15:54:31Z

Thank you so much for the explanation!
