Cleaner attention interfaces #35342

Cyrilvallez · 2024-12-19T13:16:26Z

What does this PR do?

Improve the attention interfaces with clarifying comments and a dtype fix.

HuggingFaceDocBuilderDev · 2024-12-19T13:44:02Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Cool, thanks for updating, and thanks @vasqu for the review.

ArthurZucker · 2024-12-19T15:59:07Z

src/transformers/integrations/sdpa_attention.py

+    # SDPA with memory-efficient backend is bugged with non-contiguous inputs and custom attn_mask for some torch versions
+    # Reference: https://github.com/pytorch/pytorch/issues/112577.
+    if _needs_contiguous_inputs:
+        query = query.contiguous()
+        key = key.contiguous()
+        value = value.contiguous()


we could also just always apply contiguous!

Agreed, calling `contiguous()` doesn't affect performance much either way, so the clarified comment suffices imo.
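For reference, the "always apply contiguous" alternative discussed in this thread could look roughly like the sketch below. It is only an illustration, not the code merged in this PR: the helper name `sdpa_with_contiguous_inputs` and the tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def sdpa_with_contiguous_inputs(query, key, value, attn_mask=None):
    """Hypothetical helper: always make q/k/v contiguous before calling SDPA."""
    # .contiguous() returns the tensor itself when it is already contiguous,
    # so a copy only happens when the memory layout actually requires one.
    query = query.contiguous()
    key = key.contiguous()
    value = value.contiguous()
    return F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)


# Illustrative shapes: (batch, heads, seq_len, head_dim).
# The transpose leaves the tensors non-contiguous, as often happens after
# reshaping projection outputs into separate heads.
batch, seq_len, heads, head_dim = 2, 5, 4, 8
q = torch.randn(batch, seq_len, heads, head_dim).transpose(1, 2)
k = torch.randn(batch, seq_len, heads, head_dim).transpose(1, 2)
v = torch.randn(batch, seq_len, heads, head_dim).transpose(1, 2)
mask = torch.ones(batch, 1, seq_len, seq_len, dtype=torch.bool)

out = sdpa_with_contiguous_inputs(q, k, v, attn_mask=mask)
```

The trade-off is a possible extra copy on every call in exchange for dropping the version-dependent flag.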

vasqu

Just as a heads-up: it might be nice to document which attention type supports which feature. For example, sdpa and fa cannot output attn weights or use a head mask, while flex attn cannot use dropout.

Which brings me to ask whether these cases have been considered (for example, dropout not being supported by flex attn).
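One possible way to record this, sketched here purely as an illustration: a small lookup table holding only the limitations named in this comment. The names `KNOWN_UNSUPPORTED` and `find_unsupported`, the feature labels, and the backend keys are hypothetical and not part of the library.

```python
# Hypothetical table: only the limitations mentioned in the comment above
# are encoded. A missing entry does NOT imply that a feature is supported.
KNOWN_UNSUPPORTED = {
    "sdpa": {"output_attentions", "head_mask"},
    "flash_attention_2": {"output_attentions", "head_mask"},
    "flex_attention": {"dropout"},
}


def find_unsupported(attn_implementation, requested_features):
    """Return the requested features the backend is known not to support."""
    unsupported = KNOWN_UNSUPPORTED.get(attn_implementation, set())
    return sorted(unsupported & set(requested_features))


print(find_unsupported("flex_attention", ["dropout", "head_mask"]))  # ['dropout']
print(find_unsupported("sdpa", ["dropout"]))  # []
```

A table like this could back both the documentation and an early warning when a requested feature would otherwise be silently ignored.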

src/transformers/integrations/flash_attention.py

vasqu · 2024-12-19T16:58:58Z

ArthurZucker · 2024-12-20T11:08:44Z

LGTM

cleaner attention interfaces

f74a08e

Cyrilvallez mentioned this pull request Dec 19, 2024

🚨All attention refactor🚨 #35235

Merged

30 tasks

Cyrilvallez requested a review from ArthurZucker December 19, 2024 13:36

correctly set the _attn_implementation when adding other functions to it

6e5aac8

ArthurZucker approved these changes Dec 19, 2024


vasqu reviewed Dec 19, 2024


Cyrilvallez added 3 commits December 20, 2024 11:03

update

581d4e8

Update modeling_utils.py

42ccb53

CIs

392b71c

Cyrilvallez merged commit 0d51d65 into main Dec 20, 2024
26 checks passed

Cyrilvallez deleted the attention-interface branch December 20, 2024 11:09

poedator mentioned this pull request Dec 21, 2024

is_causal arg appears twice in FAttention call from GPT2Attention.forward() #35380

Closed

4 tasks

NanoCode012 mentioned this pull request Jan 22, 2025

fix(FA): QKV not being casted to target_dtype for FA with dpo lora #35834

Merged

5 tasks
