
Fix bug in apply_rotary_pos_emb_flashatt in Qwen2-5-VL #36065

Merged

Conversation

DeepWaved (Contributor):

What does this PR do?

This PR is the same as #36046; it has been resubmitted because of an incorrect operation on the previous one.

This PR addresses an issue that arises when Flash Attention 2 is used together with mixed-precision training frameworks such as DeepSpeed, leading to an AssertionError:
"Input and cos/sin must have the same dtype, got torch.float32 and torch.float16."
The root cause is that DeepSpeed implicitly converts inv_freq (L:118) to float16, while the rope computation in both the SDPA and the default (eager) paths, specifically in the apply_rotary_pos_emb_vision function, casts to float32. To prevent this mismatch, the same conversion has been added to apply_rotary_pos_emb_flashatt, ensuring consistent dtype handling and resolving the issue.
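
For reference, the gist of the change is to apply the same float32 conversion inside apply_rotary_pos_emb_flashatt. A minimal sketch (see the PR diff for the exact code; apply_rotary_emb is the flash-attn rotary kernel the function already calls):

from flash_attn.layers.rotary import apply_rotary_emb

def apply_rotary_pos_emb_flashatt(q, k, cos, sin):
    cos = cos.chunk(2, dim=-1)[0].contiguous()
    sin = sin.chunk(2, dim=-1)[0].contiguous()
    # Upcast to float32 before the rotary kernel, mirroring
    # apply_rotary_pos_emb_vision, then restore the original dtype.
    q_embed = apply_rotary_emb(q.float(), cos.float(), sin.float()).type_as(q)
    k_embed = apply_rotary_emb(k.float(), cos.float(), sin.float()).type_as(k)
    return q_embed, k_embed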

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Test
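The following script reproduces the AssertionError under DeepSpeed fp16 training: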

import deepspeed
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

def train():
    MODEL_PATH = "Qwen2-5-VL-3B-Instruct"
    
    # Load the model with Flash Attention 2; torch_dtype="auto" picks the
    # checkpoint dtype.
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_PATH,
        torch_dtype="auto",
        device_map="cuda:0",
        attn_implementation="flash_attention_2",
    )
    processor = AutoProcessor.from_pretrained(MODEL_PATH)

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                    "max_pixels":128*128,
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to("cuda")

    # With fp16 enabled, DeepSpeed implicitly casts module buffers such
    # as inv_freq to float16, which is what triggers the dtype assertion
    # in the flash-attn rope path.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        args=None,
        model=model,
        model_parameters=model.parameters(),
        config_params="ds.json",
    )
    model_engine.train()
    # Without the fix, this forward pass raises the AssertionError above.
    outputs = model_engine(**inputs)

if __name__ == "__main__":
    train()
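
The DeepSpeed config (the ds.json passed to deepspeed.initialize above):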

{
    "fp16": {
        "enabled": "true"
    },
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-5,
            "betas": [
                0.9,
                0.98
            ],
            "eps": 1e-8
        }
    },
    "zero_optimization": {
        "stage": 2
    },
    "train_batch_size": 1,
    "train_micro_batch_size_per_gpu": 1
}
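
With the fix applied, this script runs the forward pass cleanly; on main without the fix, it fails with the dtype error quoted above.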

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@zucchini-nlp (Member) left a comment:


Great, the unrelated files will be fixed soon in main, let's wait a bit and then rebase

@DeepWaved (Contributor, Author):

> Great, the unrelated files will be fixed soon in main, let's wait a bit and then rebase

Hi! I tried rebasing and merging the latest branch, but there is still an extra modification to modeling_gpt_neox.py. However, when I commit on a new branch based on the latest main, make fix-copies passes without extra changes. Should I force-push the new branch onto the current qwen2-5-vl-rope-dtype-mismatch branch?

@zucchini-nlp (Member):

@DeepWaved interesting. Yeah, let's force push and if it doesn't work, I will dive into it

@DeepWaved force-pushed the qwen2-5-vl-rope-dtype-mismatch branch from 214d54c to f7a3c62 on February 7, 2025, 08:46
@DeepWaved (Contributor, Author):

> @DeepWaved interesting. Yeah, let's force push and if it doesn't work, I will dive into it

Luckily, it works!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp merged commit 014047e into huggingface:main Feb 7, 2025
10 checks passed
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request on Feb 13, 2025