Fix bug in apply_rotary_pos_emb_flashatt: in Qwen2-5-VL #36046

DeepWaved · 2025-02-05T11:32:15Z

What does this PR do?

This PR fixes an AssertionError that occurs when Flash Attention 2 is used together with a mixed-precision training framework such as DeepSpeed:
"Input and cos/sin must have the same dtype, got torch.float32 and torch.float16."
The root cause is that DeepSpeed implicitly converts inv_freq (L:118) to float16. The RoPE computation for SDPA and the default case, which lives in apply_rotary_pos_emb_vision, converts to float32, but apply_rotary_pos_emb_flashatt did not. This PR adds the same conversion to apply_rotary_pos_emb_flashatt so that dtype handling is consistent across the attention implementations.
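
The mismatch and the upcast described above can be sketched in plain PyTorch. This is a minimal illustration, not the code from this PR: `strict_rotary` is a hypothetical stand-in for a fused rotary kernel that rejects mixed dtypes, and `apply_rotary_upcast` is an assumed name for a wrapper that casts everything to float32 before applying the rotation. The sketch upcasts before taking cos/sin, which is one possible ordering.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Rotate the two halves of the last dimension: (x1, x2) -> (-x2, x1)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def strict_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a fused kernel that refuses mixed dtypes."""
    assert x.dtype == cos.dtype == sin.dtype, (
        f"Input and cos/sin must have the same dtype, got {x.dtype} and {cos.dtype}"
    )
    return x * cos + rotate_half(x) * sin


def apply_rotary_upcast(tensor: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Apply rotary embeddings in float32, then cast back to the input dtype.

    tensor: (seq_len, num_heads, head_dim); freqs: (seq_len, head_dim // 2).
    freqs may arrive as float16 if a framework has downcast inv_freq.
    """
    freqs32 = freqs.float()  # undo any implicit float16 conversion
    cos = torch.cat((freqs32.cos(), freqs32.cos()), dim=-1).unsqueeze(1)
    sin = torch.cat((freqs32.sin(), freqs32.sin()), dim=-1).unsqueeze(1)
    # All three operands are float32 here, so the dtype check passes.
    return strict_rotary(tensor.float(), cos, sin).type_as(tensor)
```

Casting the result back with `type_as` keeps the caller's dtype unchanged, so the rest of the attention path sees the same tensor type as before.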

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Test

import deepspeed
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info


def train():
    MODEL_PATH = "Qwen2-5-VL-3B-Instruct"

    # Load the model with Flash Attention 2, the code path that hits the dtype assertion.
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_PATH,
        torch_dtype="auto",
        device_map="cuda:0",
        attn_implementation="flash_attention_2",
    )
    processor = AutoProcessor.from_pretrained(MODEL_PATH)

    # A single image-plus-text request is enough to exercise the vision tower.
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                    "max_pixels": 128 * 128,
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to("cuda")

    # Wrap the model with DeepSpeed; ds.json is the fp16 config shown after this script.
    model_engine, _optimizer, _, _ = deepspeed.initialize(
        args=None,
        model=model,
        model_parameters=model.parameters(),
        config_params="ds.json",
    )
    model_engine.train()
    # Without the fix, this forward pass raises the AssertionError quoted above.
    outputs = model_engine(**inputs)


if __name__ == "__main__":
    train()

{
    "fp16": {
        "enabled": true
    },
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-5,
            "betas": [
                0.9,
                0.98
            ],
            "eps": 1e-8
        }
    },
    "zero_optimization": {
        "stage": 2
    },
    "train_batch_size": 1,
    "train_micro_batch_size_per_gpu": 1
}

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@zucchini-nlp

zucchini-nlp

Hey! Thanks for the PR. I believe it is a duplicate of #35966, but that one went stale, so let's merge this PR.

Thanks for the clear explanation, I agree with the fix. Can you run make fix-copies to make CI green?

zucchini-nlp · 2025-02-05T15:13:38Z

Hehe, I think the PR got closed because make fix-copies deleted the changes. For models like qwen2-5-vl, the changes have to be made in the modular_xxx.py files, and fix-copies will then automatically propagate them to the model files :)

Feel free to re-open it.

DeepWaved · 2025-02-06T02:24:35Z

Thank you for your guidance. I will try to recommit. I have a question: after I modified modular_qwen2_5_vl.py and ran make fix-copies, modeling_qwen2_5_vl.py was updated as expected, but unintended changes were also introduced in other files, such as modeling_zamba2.py. How can I avoid this? Thanks a lot.

zucchini-nlp · 2025-02-06T09:02:36Z

@DeepWaved that's weird. With the latest main branch, no other files should usually be reformatted. Can you try to rebase?

UPDATE: oh, I just found out that a few files get reformatted even on main, so yes, it's okay.

Fix bug in apply_rotary_pos_emb_flashatt: in Qwen2-5-VL

ba3f3bf

zucchini-nlp approved these changes Feb 5, 2025

DeepWaved and others added 2 commits February 5, 2025 20:29

fix-copies of ba3f3bf

8b138f0

Merge branch 'huggingface:main' into qwen2-5-vl-rope-dtype-mismatch

9ec41a5

DeepWaved closed this Feb 5, 2025

DeepWaved deleted the qwen2-5-vl-rope-dtype-mismatch branch February 5, 2025 12:52

DeepWaved mentioned this pull request Feb 6, 2025

Fix bug in apply_rotary_pos_emb_flashatt: in Qwen2-5-VL #36065

Merged

5 tasks
