
🚨All attention refactor🚨 #35235

Merged: 99 commits, Dec 18, 2024

Conversation

@ArthurZucker (Collaborator) commented Dec 12, 2024

What does this PR do?

Todo in this PR:

  • Cohere
  • Chameleon
  • DBRX
  • Gemma
  • Gemma2
  • GLM (modular, so nothing to do I think)
  • gpt_neox and GPT2
  • Granite
  • Jamba
  • JetMoe
  • Mimi
  • Mistral
  • Mixtral
  • Mllama
  • Moshi
  • Nemotron
  • OPT
  • Phi
  • Phi3
  • PhiMoe
  • Qwen2
  • Qwen2Moe
  • Qwen2VL
  • StableLM
  • StarCoder2 -> modular, normally OK
  • Idefics1,2,3
  • Olmo
  • Olmo2
  • Siglip
  • Whisper

@ArthurZucker force-pushed the all-attention-refactor branch from 0dc9253 to d1aa9ce on December 12, 2024 at 13:49
@ArthurZucker (Collaborator, Author) commented on the new class GradientCheckpointLayer(torch.nn.Module):

This should help with kwargs as well
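
A minimal sketch of that idea (not the code from this PR; names and details are assumed): a wrapper module can route its call through non-reentrant torch checkpointing, which forwards keyword arguments to the wrapped layer.

import torch
from torch.utils.checkpoint import checkpoint


class GradientCheckpointLayer(torch.nn.Module):
    # Hypothetical sketch: wrap a decoder layer so that gradient checkpointing
    # also forwards keyword arguments (attention_mask, position_ids, ...).
    def __init__(self, layer: torch.nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, *args, **kwargs):
        if self.training and torch.is_grad_enabled():
            # the non-reentrant variant of checkpoint accepts **kwargs
            return checkpoint(self.layer, *args, use_reentrant=False, **kwargs)
        return self.layer(*args, **kwargs)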

@Cyrilvallez force-pushed the all-attention-refactor branch from 8b56823 to ecd814b on December 16, 2024 at 11:28
@Cyrilvallez (Member)

BTW, maybe it would be safer to check {p.device for p in model.parameters()} | {p.device for p in model.buffers()}?

@BenjaminBossan (Member)

Thanks a lot @Cyrilvallez, I did not notice that. Indeed, your change fixes the failure.

BTW, maybe it would be safer to check {p.device for p in model.parameters()} | {p.device for p in model.buffers()}?

Thanks for the suggestion. In the original unit test this is just a sanity check; the proper testing comes further down. I removed that part for clarity.
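
For reference, the suggested sanity check as a minimal sketch (model stands for whatever model the test loads); it covers buffers such as rotary-embedding inv_freq tensors, not only parameters:

devices = {p.device for p in model.parameters()} | {b.device for b in model.buffers()}
assert len(devices) == 1, f"model is spread across devices: {devices}"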

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Jan 7, 2025
See
huggingface/transformers#35235 (comment)
for context.

There has been a refactor in transformers that moved the rotary
embedding of Mistral (and probably other models) to the model level.
This caused the device map used in one of the tests to become incorrect.
This PR fixes the device map.

Note that this fix doesn't really have anything to do with prefix
tuning; the error occurred even before prefix tuning was used.
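
Purely as an illustration of the fix described above (module names assumed for a Mistral-style model after this refactor), a hand-written device map now has to place the model-level rotary embedding explicitly:

from transformers import AutoModelForCausalLM

device_map = {
    "model.embed_tokens": 0,
    "model.rotary_emb": 0,  # the rotary embedding now lives on the model, not inside each attention layer
    "model.layers": 0,
    "model.norm": 0,
    "lm_head": 1,
}
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map=device_map)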
@ArthurZucker (Collaborator, Author)

Yeah, sorry @BenjaminBossan, it's just that we don't use that attribute, so I'm not sure we're gonna add it back! The PR is breaking, as the name indicates! But I hope it was not too much trouble!

@BenjaminBossan (Member)

Thanks for letting me know, it should be an easy fix on the PEFT side.

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Jan 8, 2025
The changes in huggingface/transformers#35235
caused a couple of adaption prompt tests to fail. This PR fixes
these failures while maintaining compatibility with older transformers
versions.

Required changes:

- hidden_size attribute removed from the model, now config.hidden_size
- num_heads attribute removed from the model, now config.num_attention_heads
- forward now returns 2 outputs instead of 3; rewritten to be agnostic
  to the number of outputs (a compatibility shim of this kind is sketched below)
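
A hedged sketch of what such a compatibility shim can look like (helper names are placeholders, not the actual PEFT code):

def get_attention_geometry(attn_module, config):
    # Older transformers exposed hidden_size / num_heads on the attention
    # module itself; after this refactor they only live on the config.
    hidden_size = getattr(attn_module, "hidden_size", config.hidden_size)
    num_heads = getattr(attn_module, "num_heads", config.num_attention_heads)
    head_dim = getattr(attn_module, "head_dim", hidden_size // num_heads)
    return hidden_size, num_heads, head_dim


def first_output(outputs):
    # New attention forwards return (attn_output, attn_weights); older ones
    # also returned past_key_value. Only rely on the first element.
    return outputs[0] if isinstance(outputs, tuple) else outputs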
githubnemo pushed a commit to huggingface/peft that referenced this pull request Jan 10, 2025
BenjaminBossan added a commit to huggingface/peft that referenced this pull request Jan 10, 2025
@foreverpiano

query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)

raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'MistralAttention' object has no attribute 'num_heads'

How can I fix this?

@ArthurZucker (Collaborator, Author)

Hey! You should try using the latest release of transformers! query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2) is what's used now.
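
For anyone hitting the same AttributeError, a minimal sketch of the new pattern (the helper name here is illustrative, not part of the library): shapes are derived from head_dim alone, and the head count is read from the config.

import torch

def split_heads(q_proj_out: torch.Tensor, head_dim: int) -> torch.Tensor:
    # After the refactor the attention modules build the shape from head_dim
    # instead of a stored num_heads attribute:
    #   hidden_shape = (*input_shape, -1, self.head_dim)
    input_shape = q_proj_out.shape[:-1]
    hidden_shape = (*input_shape, -1, head_dim)
    return q_proj_out.view(hidden_shape).transpose(1, 2)

# external code that needs the head count should now read it from the config:
# num_heads = model.config.num_attention_heads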

@ArthurZucker (Collaborator, Author)

Is this by any chance related to AWQ or another package?

@foreverpiano commented Jan 13, 2025

Is there any doc on how to migrate from the previous version to this one, e.g. the variable definitions and alias changes?

@foreverpiano commented Jan 13, 2025

Have you tested performance on several benchmarks? I know the LongBench score on transformers v4.47 vs v4.36 varies a lot for Llama-3. Is it stable in this version?
I suggest adding some simple, small dataset tests.

@Cyrilvallez (Member)

Hey! Everything stays the same in terms of user experience and benchmark scores. If you used to hack into the different layer classes, however, things may have changed a bit. In that case you can simply go and check out the modeling code (which you presumably did to hack into it in the first place!)
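
If the hack was about customizing attention itself, one option after this refactor is the attention-function registry. A rough sketch, assuming the ALL_ATTENTION_FUNCTIONS registry and the (module, query, key, value, attention_mask, **kwargs) calling convention introduced here; double-check against the current modeling code before relying on it:

import torch
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS
from transformers.models.llama.modeling_llama import eager_attention_forward


def logged_attention(module, query, key, value, attention_mask, **kwargs):
    # wrap the stock eager implementation instead of patching a Layer class
    print(f"{module.__class__.__name__}: query shape {tuple(query.shape)}")
    return eager_attention_forward(module, query, key, value, attention_mask, **kwargs)


ALL_ATTENTION_FUNCTIONS["logged_eager"] = logged_attention

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")  # any Llama-architecture checkpoint
model.config._attn_implementation = "logged_eager"  # looked up at forward time
out = model(torch.ones(1, 4, dtype=torch.long))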

loadams added a commit to microsoft/DeepSpeed that referenced this pull request Jan 13, 2025
The breaking change in transformers is
huggingface/transformers#35235. Changes are needed
to unpin the nv-a6000 workflow.
@poedator (Contributor) commented Jan 14, 2025

My friends use a GPT2Model in production and want to compile it with StaticCache. With the maintainers' blessing, I would like to create a PR adding DynamicCache / StaticCache support to GPT2Model.
I am quite familiar with the Cache class; I have already coded some of it and made DynamicCache work.

Please let me know if there are any hidden obstacles in a Cache implementation for GPT2. Which tests should I run or add?
@ArthurZucker
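
To make the target concrete, a sketch of the usage this would enable, assuming GPT2 gains the same Cache support as the decoder-only models refactored here (it does not have it yet at this point in the thread):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The attention refactor", return_tensors="pt")

# StaticCache pre-allocates the key/value tensors to a fixed length, which is
# what makes the decoding step friendly to torch.compile
out = model.generate(**inputs, max_new_tokens=20, cache_implementation="static")
print(tok.decode(out[0], skip_special_tokens=True))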

@Rocketknight1 (Member)

cc @gante on that question!

@gante (Member) commented Jan 15, 2025

I've chatted with @poedator offline -- I couldn't think of any obstacle in particular, and suggested a) ensuring we leave a deprecation warning regarding the old cache format and b) using RUN_SLOW=1 py.test tests/models/gpt2/test_modeling_gpt2.py as a correctness check (gpt2 is fairly well tested, especially wrt text generation).
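
For point a), a rough sketch of the usual deprecation pattern (the helper name and wording are placeholders): keep accepting the old tuple-of-tuples format, warn, and convert it with DynamicCache.from_legacy_cache.

import warnings
from transformers import DynamicCache

def _normalize_cache(past_key_values):
    # Old format: a tuple of (key, value) tuples per layer; new format: a Cache object.
    if isinstance(past_key_values, tuple):
        warnings.warn(
            "Passing a tuple of tensors as past_key_values is deprecated; "
            "pass a Cache instance (e.g. DynamicCache) instead.",
            FutureWarning,
        )
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
    return past_key_values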

@poedator (Contributor)

It looks like test_flash_attn_2_from_config is broken. It expects the attention layer to have FlashAttention in its class name:

if "FlashAttention" in module.__class__.__name__:

but after this refactoring the attention classes are named differently. Please fix or suspend the test.
@ArthurZucker
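
One possible fix, sketched under the assumption that the test only needs to confirm that Flash Attention 2 was actually dispatched: check the configured attention implementation instead of class names.

# instead of: if "FlashAttention" in module.__class__.__name__: ...
# where model is the model instantiated by the test:
assert model.config._attn_implementation == "flash_attention_2"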

@ArthurZucker (Collaborator, Author)

indeed gimme a min!
