
Is ROCm apex.amp deprecated? & behavior mismatch vs NVIDIA APEX #118

Open
fxmarty opened this issue Sep 19, 2023 · 1 comment
Labels
bug Something isn't working

Comments


fxmarty commented Sep 19, 2023

Hi, I am wondering whether ROCm apex.amp is deprecated? NVIDIA APEX carries deprecation warnings that are not present in this repo: https://github.com/NVIDIA/apex/pull/1506/files

Moreover, I realize that this code

import torch
import torch.nn as nn
from torch.optim import AdamW
from apex import amp

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(3, 4)

    def forward(self, attn_probs, value_states):
        # bmm on mixed-dtype inputs: under opt_level O1, apex.amp patches
        # torch.bmm and is expected to cast both inputs to half.
        attn_output = torch.bmm(attn_probs, value_states)
        return attn_output

model = MyModule().to("cuda")
optimizer = AdamW(model.parameters())

model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# Deliberately mixed dtypes: attn_probs is float32, value_states is float16.
attn_probs = torch.rand(4, 16, 16).to("cuda")
value_states = torch.rand(4, 16, 2).to(torch.float16).to("cuda")

attn_output = model(attn_probs, value_states)

runs fine with NVIDIA APEX but fails on ROCm APEX with the following log:

Traceback (most recent call last):
  File "run_bmm.py", line 26, in <module>
    attn_output = model(attn_probs, value_states)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "run_bmm.py", line 11, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float
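
For what it's worth, manually casting the inputs to a common dtype sidesteps the error on both backends. This is only a workaround sketch, not what apex O1 is supposed to require:

# Hypothetical workaround: unify the dtypes by hand instead of relying on
# apex's O1 patching of torch.bmm. Plain bmm accepts matching half inputs.
attn_output = torch.bmm(attn_probs.to(value_states.dtype), value_states)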

However, using torch.cuda.amp.autocast instead works fine on both ROCm and CUDA-powered devices (with torch 2.0.1).
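
For reference, this is the autocast variant I mean (a minimal sketch using the same module and mixed-dtype inputs as above):

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(3, 4)

    def forward(self, attn_probs, value_states):
        return torch.bmm(attn_probs, value_states)

model = MyModule().to("cuda")

attn_probs = torch.rand(4, 16, 16).to("cuda")
value_states = torch.rand(4, 16, 2).to(torch.float16).to("cuda")

# Under autocast, torch.bmm runs in float16, so the mixed
# float32/float16 inputs are cast to a common dtype automatically.
with torch.cuda.amp.autocast():
    attn_output = model(attn_probs, value_states)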

Thank you!

fxmarty added the bug label on Sep 19, 2023
@pruthvistony

@fxmarty,
I believe the problem could be due to a missing fix in the Adam optimizer handling in ROCm apex. Checking on it; will get back.
