[Qwen2_5_VL] Fix dtype mismatch in FA2 by removing forced float() cast in rotary embeddings #35966

Open · wants to merge 1 commit into main
Conversation

sockeye44

What does this PR do?

This PR fixes a FlashAttention assertion error in the Qwen2_5_VL vision blocks that occurs when the query tensor is forcibly cast to float32 while the cos/sin embedding tensors remain in lower precision (bf16 or fp16). The mismatch triggers the following error during training runs on mixed (text-only + text-image) batches:

AssertionError: Input and cos/sin must have the same dtype, got torch.float32 and torch.bfloat16
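For illustration, here is a minimal sketch in plain PyTorch (not the actual FlashAttention kernel) of the condition that trips the assert: the query is force-cast to float32 while the cos/sin tables stay in bf16. Shapes are made up for the example; running it raises the same message.

import torch

# bf16 activations and bf16 rotary tables, as in a model loaded with
# torch_dtype=torch.bfloat16 (shapes are illustrative only)
q = torch.randn(256, 16, 80, dtype=torch.bfloat16)
cos = torch.randn(256, 40, dtype=torch.bfloat16)
sin = torch.randn(256, 40, dtype=torch.bfloat16)

q_ = q.float()  # the forced cast this PR removes

# FlashAttention's apply_rotary_emb performs an equivalent dtype check; this
# assert fails, reproducing the error text above
assert q_.dtype == cos.dtype == sin.dtype, (
    f"Input and cos/sin must have the same dtype, got {q_.dtype} and {cos.dtype}"
)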

Changes:

  • Removed the forced .float() cast in apply_rotary_pos_emb_flashatt so that the query tensor and the cos/sin tensors share the same dtype (a sketch of the surrounding helper follows the diff):
- tensor_ = tensor.float()
+ tensor_ = tensor
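For context, a sketch of the surrounding helper with the change applied, paraphrased from modeling_qwen2_5_vl.py (the exact code may differ slightly from the revision this PR is based on):

import torch
from flash_attn.layers.rotary import apply_rotary_emb

def apply_rotary_pos_emb_flashatt(tensor: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # Keep the input in its original dtype so it matches cos/sin, which are
    # derived from `freqs` and stay in the model's compute dtype (bf16/fp16)
    tensor_ = tensor  # previously: tensor.float()
    cos = freqs.cos()
    sin = freqs.sin()
    return apply_rotary_emb(tensor_, cos, sin).type_as(tensor)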

Why:

  • FlashAttention requires all inputs to apply_rotary_emb to have identical dtypes
  • Removing the forced cast ensures that the Qwen2_5_VL vision path works correctly in mixed or lower-precision settings, preventing upstream assertion failures and allowing memory-efficient training in bf16 or fp16
  • Verified the fix with seeded runs in mixed-precision (bf16 and fp16) settings, on both text-only and mixed text-image batches, to confirm correct behavior (a hypothetical verification sketch follows this list)
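Below is a hypothetical verification sketch along the lines described above (not the exact script used for this PR): a single bf16 forward/backward pass with flash_attention_2 on a text+image batch, which previously hit the assertion in the vision blocks. The checkpoint name, message contents, and dummy image are placeholders.

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # placeholder checkpoint
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
)
processor = AutoProcessor.from_pretrained(model_id)

# Dummy text+image batch; a text-only batch can be built by dropping the image entry
image = Image.new("RGB", (224, 224), color="white")
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe the image."}]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Training-style forward/backward in bf16; before this fix the vision rotary
# path raised "Input and cos/sin must have the same dtype ..."
labels = inputs["input_ids"].clone()
out = model(**inputs, labels=labels)
out.loss.backward()
print("loss:", out.loss.item())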

@sockeye44 (Author)

@amyeroberts Hi, could you please review this PR?

@zucchini-nlp (Member) left a comment


Hey @sockeye44! Thanks for opening a PR. Can you share a small reproducer? I can't reproduce it with the following. I believe the cos/sin are usually in full precision when doing RoPE:

import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
# `self.messages` and `image` come from the reviewer's test setup
text = processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
