fix(FA): QKV not being casted to target_dtype for FA with dpo lora #35834
Conversation
input_dtype = query.dtype
if input_dtype == torch.float32:
Suggested change:
  input_dtype = query.dtype
- if input_dtype == torch.float32:
+ if any([module.dtype == torch.float32 for module in [query, key, value]]):
might it be better to check all the modules? or is there a performance penalty in doing this?
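For illustration, here is a minimal, self-contained sketch of the "check all three states" variant suggested above (hypothetical helper name, not the actual transformers code):

```python
import torch

def needs_downcast(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor) -> bool:
    # Reviewer's suggested check: trigger the downcast if *any* of the three
    # states was upcast to fp32 (e.g. by a PEFT/LoRA layer), not just the query.
    # The dtype comparisons are scalar attribute checks, so inspecting all
    # three tensors adds no meaningful overhead.
    return any(t.dtype == torch.float32 for t in (query, key, value))

q = torch.randn(1, 8, 16, 64, dtype=torch.float32)
k = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
v = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
print(needs_downcast(q, k, v))  # True: the query was upcast to fp32
```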
might not be a huge loss, but does not feel great
super sorry if we broke this, waiting for some feedback but sounds good
Hey @NanoCode012! Super sorry for the delay! LGTM, thanks for fixing! Indeed, it was not matching what we are doing in flash_attention.py 🤗. And until now we were always only checking the query, so it should be fine as is; I also don't like checking them all that much! But happy to revisit if this happens to be an issue in the future!
What does this PR do?
In DPO LoRA, the QKV states aren't being cast to `target_dtype`, raising `RuntimeError: FlashAttention only support fp16 and bf16 data type`. This issue was introduced by the refactor in #35342.
The prior code checked whether `query.dtype == torch.float32` and, if so, always cast to fp16. The current code checks `query.dtype == torch.float32`, and then `fa_peft_integration_check` checks `value.dtype == torch.float32` before casting to `target_dtype`. From my debugging, since `value.dtype` is bf16, the cast does not occur. This PR fixes the check to use the same `query.dtype` check.
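As an illustration of the fixed behavior, here is a minimal, self-contained sketch (hypothetical helper, not the exact `fa_peft_integration_check` implementation) that keys the cast on `query.dtype` rather than `value.dtype`:

```python
import torch

def cast_qkv_for_flash_attention(query, key, value, target_dtype=torch.float16):
    # Sketch of the fixed check: decide from query.dtype (as before the
    # refactor), so fp32 states produced by PEFT/LoRA upcasting are cast
    # back down before FlashAttention, which only supports fp16/bf16.
    if query.dtype == torch.float32:
        query = query.to(target_dtype)
        key = key.to(target_dtype)
        value = value.to(target_dtype)
    return query, key, value

# With an fp32 query but a bf16 value (the DPO LoRA case described above),
# the value-based check would skip the cast; the query-based check does not.
q = torch.randn(1, 8, 16, 64, dtype=torch.float32)
k = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
v = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
q, k, v = cast_qkv_for_flash_attention(q, k, v)
print(q.dtype, k.dtype, v.dtype)  # all torch.float16
```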
How has this been tested?
I ran Axolotl CI, found the related commit to cause the error, tracked down this issue, and verified the fix solves it.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@Cyrilvallez