
DebertaForQuestionAnswering amp_bf16/amp_fp16 training/inference accuracy/performance check failed #1390

Open
kaileiyx opened this issue Feb 21, 2025 · 0 comments
🐛 Describe the bug

```shell
python benchmarks/dynamo/huggingface.py --accuracy --amp -d xpu -n10 --amp-dtype bfloat16 --training --only DebertaForQuestionAnswering --backend=inductor
```

```
xpu train DebertaForQuestionAnswering
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 1905, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/huggingface.py", line 528, in forward_and_backward_pass
    pred = mod(**cloned_inputs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 1289, in forward
    outputs = self.deberta(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 797, in forward
    encoder_outputs = self.encoder(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 608, in forward
    hidden_states, att_m = layer_module(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 525, in forward
    attention_output, att_matrix = self.attention(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 460, in forward
    self_output, att_matrix = self.self(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/transformers/models/deberta/modeling_deberta.py", line 290, in forward
    attention_scores = attention_scores.masked_fill(~(attention_mask), torch.finfo(query_layer.dtype).min)
RuntimeError: value cannot be converted to type at::BFloat16 without overflow

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 3986, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/huggingface.py", line 458, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 1907, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed
```
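The error originates in `masked_fill(..., torch.finfo(query_layer.dtype).min)`: under autocast, `attention_scores` runs in a reduced-precision dtype while `query_layer` can remain `float32`, and `float32`'s minimum finite value is not representable in `bfloat16` (or `float16`). A minimal sketch of the range mismatch, using only pure Python (the constants are derived from the IEEE-style bit layouts, so no torch install is needed):

```python
# Largest finite magnitudes, derived from the formats' bit layouts:
#   float32:  8 exponent bits, 23 mantissa bits -> (2 - 2**-23) * 2**127
#   bfloat16: 8 exponent bits,  7 mantissa bits -> (2 - 2**-7)  * 2**127
#   float16:  5 exponent bits, 10 mantissa bits -> (2 - 2**-10) * 2**15
f32_max = (2 - 2**-23) * 2.0**127   # ~3.4028e38
bf16_max = (2 - 2**-7) * 2.0**127   # ~3.3895e38
f16_max = (2 - 2**-10) * 2.0**15    # 65504.0

f32_min = -f32_max  # what torch.finfo(torch.float32).min evaluates to

# float32's minimum finite value overflows both reduced-precision types,
# which is why masked_fill raises
# "value cannot be converted to type at::BFloat16 without overflow".
print(abs(f32_min) > bf16_max)  # True
print(abs(f32_min) > f16_max)   # True
```

A plausible fix (an assumption, not something confirmed in this thread) would be to derive the fill value from the tensor being filled, e.g. `torch.finfo(attention_scores.dtype).min`, so the scalar always fits the target dtype under autocast.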

Versions

Environment:
Device: PVC 1100
torch-xpu-ops: https://github.com/etaf/pytorch-inductor-xpu/commits/xpu_inductor_windows/
python: 3.10
TRITON_COMMIT_ID: c4a79a1960ba1c247c2548cbd3abf6a728b3ce6f
TORCH_COMMIT_ID: 1ba49a78edafa61e2ce4f80d147576e66566eec3
TORCHBENCH_COMMIT_ID: 373ffb19dc470f4423a3176a4133f8f4b3cdb5bd
TORCHVISION_COMMIT_ID: d23a6e1664d20707c11781299611436e1f0c104f
TORCHAUDIO_COMMIT_ID: f084f34bbb743fada85f66b0ed8041387565e69c
DRIVER_VERSION: 1.23.10.49.231129.50
KERNEL_VERSION: 5.15.0-73-generic #80 SMP Mon May 15 15:18:26 UTC 2023
BUNDLE_VERSION: 2025.0.1.20241113
OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
GCC_VERSION: 11

@daisyden daisyden added this to the PT2.7 milestone Feb 24, 2025
@mengfei25 mengfei25 assigned etaf and unassigned xytintel Feb 24, 2025
@daisyden daisyden added the E2E label Feb 24, 2025