
DebertaForQuestionAnswering amp_bf16/amp_fp16 training/inference accuracy/performance check failed #1390

Open
kaileiyx opened this issue Feb 21, 2025 · 0 comments
🐛 Describe the bug

```shell
python benchmarks/dynamo/huggingface.py --accuracy --amp -d xpu -n10 --amp-dtype bfloat16 --training --only DebertaForQuestionAnswering --backend=inductor
```

```
xpu train DebertaForQuestionAnswering
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 1905, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/huggingface.py", line 528, in forward_and_backward_pass
    pred = mod(**cloned_inputs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 1289, in forward
    outputs = self.deberta(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 797, in forward
    encoder_outputs = self.encoder(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 608, in forward
    hidden_states, att_m = layer_module(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 525, in forward
    attention_output, att_matrix = self.attention(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 460, in forward
    self_output, att_matrix = self.self(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 290, in forward
    attention_scores = attention_scores.masked_fill(~(attention_mask), torch.finfo(query_layer.dtype).min)
RuntimeError: value cannot be converted to type at::BFloat16 without overflow

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 3986, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/huggingface.py", line 458, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 1907, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed
```
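The error originates in `masked_fill(..., torch.finfo(query_layer.dtype).min)`: under autocast, `attention_scores` runs in a reduced-precision dtype while `query_layer` can remain `float32`, and `float32`'s minimum finite value is not representable in `bfloat16` (or `float16`). A minimal sketch of the range mismatch, using only pure Python (the constants are derived from the IEEE-style bit layouts, so no torch install is needed):

```python
# Largest finite magnitudes, derived from the formats' bit layouts:
#   float32:  8 exponent bits, 23 mantissa bits -> (2 - 2**-23) * 2**127
#   bfloat16: 8 exponent bits,  7 mantissa bits -> (2 - 2**-7)  * 2**127
#   float16:  5 exponent bits, 10 mantissa bits -> (2 - 2**-10) * 2**15
f32_max = (2 - 2**-23) * 2.0**127   # ~3.4028e38
bf16_max = (2 - 2**-7) * 2.0**127   # ~3.3895e38
f16_max = (2 - 2**-10) * 2.0**15    # 65504.0

f32_min = -f32_max  # what torch.finfo(torch.float32).min evaluates to

# float32's minimum finite value overflows both reduced-precision types,
# which is why masked_fill raises
# "value cannot be converted to type at::BFloat16 without overflow".
print(abs(f32_min) > bf16_max)  # True
print(abs(f32_min) > f16_max)   # True
```

A plausible fix (an assumption, not something confirmed in this thread) would be to derive the fill value from the tensor being filled, e.g. `torch.finfo(attention_scores.dtype).min`, so the scalar always fits the target dtype under autocast.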

Versions

Environment:
Device: PVC 1100
torch-xpu-ops: https://github.com/etaf/pytorch-inductor-xpu/commits/xpu_inductor_windows/
python: 3.10
TRITON_COMMIT_ID: c4a79a1960ba1c247c2548cbd3abf6a728b3ce6f
TORCH_COMMIT_ID: 1ba49a78edafa61e2ce4f80d147576e66566eec3
TORCHBENCH_COMMIT_ID: 373ffb19dc470f4423a3176a4133f8f4b3cdb5bd
TORCHVISION_COMMIT_ID: d23a6e1664d20707c11781299611436e1f0c104f
TORCHAUDIO_COMMIT_ID: f084f34bbb743fada85f66b0ed8041387565e69c
DRIVER_VERSION: 1.23.10.49.231129.50
KERNEL_VERSION: 5.15.0-73-generic #80 SMP Mon May 15 15:18:26 UTC 2023
BUNDLE_VERSION: 2025.0.1.20241113
OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
GCC_VERSION: 11

@daisyden daisyden added this to the PT2.7 milestone Feb 24, 2025
@mengfei25 mengfei25 assigned etaf and unassigned xytintel Feb 24, 2025
@daisyden daisyden added the E2E label Feb 24, 2025