Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache: new Cache format in decoder-only models #31421
Cache: new Cache format in decoder-only models #31421
Changes from 7 commits
183cd66
4578bca
9505ca4
2ab28f3
5fe4e9e
09413c3
3c27604
350acc5
c0adf10
c18b177
582f289
3141a71
33d54b4
dd05e6b
cb878d5
0588791
a27b47c
1abcf30
00ed88c
6c3b3aa
fd5eeab
e233f29
356d578
c906670
08d9e6f
56c05b2
8510810
cebb55d
8fd9dd1
4b9ced1
aea219b
4991863
ec306a2
cf793b7
c92409c
c2b97e4
35b60de
d2fca9a
0933350
42349d4
45c3a1b
5f22616
f5af6a2
21b45c5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Large diffs are not rendered by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget
(if appropriate, on this and other models)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, forgot about
_supports_quantized_cache
flag. But I am not sure about the "static_cache" flag. Will it imply that models are fullgraph compilable, cause we didn't test itin any case I'll make a following PR to check fullgraph compile and if it works as-is add the tests in each modeling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
We should add a mixin test for fullgraph compilation when
_supports_static_cache = True
Moreover, after this PR, I think
_supports_static_cache
and_supports_cache_class
will mean the same thing -- another property to check and rectify (= remove_supports_static_cache
) in a follow-up PRThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will test and add a mixin test (one lightweight and one slow maybe) after this PR is merged. For now I added the flags and tested via running generation tests
Btw, GIT will be an exception which supports cache class but not static cache as it has some special attn mask preparation steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW we could also just add the class to the set
_support_quantized_cache={""}
in cache_utils, we don't pollute this here, and we can directly get all classes that support quantized / static etc.-> better to auto build the doc, better in general to serparate cache stuff from modeling specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is super ugly IMO.
I would really love for us to just no add this code that ended up being copy pasted everywhere because we were lazy to update when porting new models.
Let's use cache positions. And let's already deprecate the casting and legacy cache etc, to make sure we only do this for one revision!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I wanted to add
cache_position
as part of StaticCache support but it can be added in this PR also. Wondering about deprecation cycle for Cache class, afaik there were no deprecation warning about that before so we would still have to keep the ugly hack and add a warning message?I'm pro of totally getting rid of old cache, if it doesn't break BC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see that Llama on main had its preparation modified already assuming that the inputs is always Cache object. Oke, then it makes sense to get rid of legacy_cache in forward also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it was not done before but this PR IMO is a good time to do it! So keep this code, but add a warning saying, we are automatically converting your cache from tuple to dynamic cache class (SHould only be triggered outside generate because generate should already pass a cache class!)
Yep it makes sense but let's not be too brutal in case some people still use it! We give them one release until we totally remove it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oke, added cache position and deprecation warnings to all models from this PR. I'll add the same deprecation warning to all models that already support cache class in another PR. This one is ready for review, tests are passing on my end!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These kinds of changes are from fix copies, and are not related at all to the PR. But let's leave it here as it's anyway related to code-consistency in the library