🚨All attention refactor🚨 #35235
Conversation
src/transformers/modeling_utils.py (outdated review thread), on the line:
class GradientCheckpointLayer(torch.nn.Module):
This should help with kwargs as well
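For illustration, a minimal sketch of the idea, assuming PyTorch's non-reentrant checkpointing. This is not the actual GradientCheckpointLayer implementation from the PR, just a hypothetical wrapper showing how keyword arguments can be forwarded through a checkpointed call:

```python
# Hypothetical sketch only, not the transformers implementation: a wrapper layer
# that routes forward() through torch.utils.checkpoint while still passing
# keyword arguments (e.g. attention_mask, position_ids) down to the wrapped layer.
import torch
from torch.utils.checkpoint import checkpoint


class CheckpointedLayer(torch.nn.Module):
    def __init__(self, layer: torch.nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, *args, **kwargs):
        if self.training and torch.is_grad_enabled():
            # use_reentrant=False lets kwargs flow through to the wrapped layer.
            return checkpoint(self.layer, *args, use_reentrant=False, **kwargs)
        return self.layer(*args, **kwargs)
```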
BTW, maybe it would be safer to check
Thanks a lot @Cyrilvallez, I did not notice that. Indeed, your change fixes the failure.
Thanks for the suggestion. In the original unit test, this is just a sanity check; the proper testing comes further down. I just removed that part for clarity.
See huggingface/transformers#35235 (comment) for context. There has been a refactor in transformers that resulted in the rotary embedding of Mistral (and probably others) moving to the model level. This led to a device map used in one of the tests becoming incorrect. This PR fixes the device map. Note that this fix doesn't really have anything to do with prefix tuning; the error occurred even before prefix tuning was used.
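To make the device-map issue concrete, here is a hedged sketch (layer count and device assignments are made up for illustration) of a manual device_map for a Mistral-style model after the refactor, where the rotary embedding now has to be placed at the model level:

```python
# Illustrative only: after the refactor, "model.rotary_emb" is a top-level module
# rather than part of each attention layer, so an explicit device_map must place it.
device_map = {
    "model.embed_tokens": 0,
    "model.rotary_emb": 0,   # moved to the model level by the refactor
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": 1,
}
# e.g. AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)
```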
Yeah sorry @BenjaminBossan, it's just that we don't use that attribute, so not sure we are gonna add it back! The PR is breaking, as the name indicates! But I hope it was not too much trouble!
Thanks for letting me know; it should be an easy fix on the PEFT side.
The changes in huggingface/transformers#35235 resulted in a couple of adaption prompt tests failing. This PR fixes these failures while maintaining compatibility with older transformers versions. Required changes:
- hidden_size attribute removed from the model, now config.hidden_size
- num_heads attribute removed from the model, now config.num_attention_heads
- forward now returns 2 outputs instead of 3; rewritten to be agnostic towards the number of outputs
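A hedged sketch of the kind of compatibility handling described above (helper names are made up; this is not the actual PEFT code):

```python
# Illustrative helpers for staying compatible with both old and new transformers.
def get_hidden_size(model, config):
    # hidden_size used to live on the model; newer versions only expose it on the config.
    return getattr(model, "hidden_size", config.hidden_size)


def unpack_attn_outputs(outputs):
    # Older attention forward() returned (attn_output, attn_weights, past_key_value);
    # newer versions return (attn_output, attn_weights). Index positionally so the
    # same code works with either number of outputs.
    attn_output = outputs[0]
    attn_weights = outputs[1] if len(outputs) > 1 else None
    return attn_output, attn_weights
```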
query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
AttributeError: 'MistralAttention' object has no attribute 'num_heads'

How can I fix this?
Hey! You should try to use the latest release of
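Separately, a hedged sketch of how the failing line can be made version-agnostic (the helper below is hypothetical, not a transformers API): the head count now lives on the config as num_attention_heads rather than on the attention module.

```python
def get_num_heads(attn_module, config):
    # num_heads was removed from the attention module in the refactor;
    # fall back to the config for newer transformers versions.
    if hasattr(attn_module, "num_heads"):
        return attn_module.num_heads
    return config.num_attention_heads

# e.g. in the failing line:
# num_heads = get_num_heads(self, self.config)
# query_states = query_states.view(bsz, q_len, num_heads, self.head_dim).transpose(1, 2)
```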
Is this by any chance related to
Is there any documentation on how to migrate from the previous version to this one, e.g. the variable definitions and alias changes?
Have you tested the performance on several benchmarks? I've found that the LongBench score on Llama-3 varies a lot between transformers v4.47 and v4.36. Is it stable in this version?
Hey! Everything stays the same in terms of user experience/benchmark scores. If you used to hack into the different Layer classes, however, it may have changed a bit. In that case you can simply go and check out the modeling code (which was presumably already necessary if you hacked into it in the first place!)
The breaking change in transformers is huggingface/transformers#35235. Need to make changes to unpin the nv-a6000 workflow.
My friends use a […] Please let me know if there are any hidden obstacles in the Cache implementation for GPT2. Which tests should I run or add?
cc @gante for that question!
I've chatted with @poedator offline -- I couldn't think of any obstacle in particular, and suggested a) to ensure we leave a deprecation warning regarding the old cache format and b) to use
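As a rough sketch of point a) above (assuming the Cache utilities that already exist in transformers; this is not the actual GPT-2 port):

```python
import warnings

from transformers.cache_utils import Cache, DynamicCache


def normalize_past_key_values(past_key_values):
    # Accept both the legacy tuple-of-tuples cache and the new Cache object,
    # warning about the deprecated format as suggested above.
    if past_key_values is not None and not isinstance(past_key_values, Cache):
        warnings.warn(
            "Passing a tuple of past key values is deprecated; use a Cache class instead.",
            FutureWarning,
        )
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
    return past_key_values
```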
It looks like the test at transformers/tests/test_modeling_common.py, line 4641 (at 5fa3534), is affected. Please fix or suspend the test.
Indeed, gimme a min!
What does this PR do?
Todo in this PR: