
Enable FusedRMSNorm #78
Merged: 9 commits, Aug 5, 2022
Conversation

@hubertlu-tw

There are two newly introduced unit test failures in tests/L0/run_fused_layer_norm/test_fused_layer_norm.py:

========================================================================= short test summary info ==========================================================================
FAILED test_fused_layer_norm.py::TestFusedLayerNormElemWiseHalf::test_layer_norm - RuntimeError: expected scalar type Float but found Half
FAILED test_fused_layer_norm.py::TestFusedLayerNormElemWiseBFloat16::test_layer_norm - RuntimeError: expected scalar type Float but found BFloat16
================================================================= 2 failed, 15 passed, 3 skipped in 42.61s =================================================================
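For context, a minimal sketch of the failing configuration, assuming a CUDA build of apex (shapes and sizes here are illustrative, not taken from the test):

import torch
from apex.normalization import FusedLayerNorm

# Elementwise-affine layer norm run entirely in half precision. Before the
# fix, an internal kernel path still expected float32 tensors, raising
# "RuntimeError: expected scalar type Float but found Half" (and likewise
# for bfloat16).
layer = FusedLayerNorm(32, elementwise_affine=True).cuda().half()
x = torch.randn(4, 32, dtype=torch.half, device="cuda")
y = layer(x)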

eqy and others added 6 commits April 14, 2022 23:15
* FusedRMSNorm based on FusedLayerNorm

* refactor duplicated kernels

* delete comments

* delete comments

* cleanup

* cleanup

* cleanup, fixed clobbering forward_affine_mixed_dtypes

* fix pybind naming and add MixedFused test

* undo skipping

* check elementwise_affine

* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

Oof, nice catch, thanks

Co-authored-by: Masaki Kozuki <[email protected]>

* [FusedRMSNorm doc] add epsilon to formula

* correct

* better wording
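For reference, the operation these commits fuse is RMSNorm; with the epsilon the doc commit adds, the formula is y = x / sqrt(mean(x^2) + eps) * weight. A sketch of the unfused fallback, modeled on (but not copied from) apex's manual_rms_norm:

import torch

def manual_rms_norm_sketch(x, normalized_shape, weight, eps):
    # Reduce over the trailing dimensions covered by normalized_shape
    # (a tuple of ints in this sketch).
    dims = tuple(range(-len(normalized_shape), 0))
    # y = x / sqrt(mean(x^2) + eps), with eps inside the sqrt for
    # numerical stability.
    variance = x.pow(2).mean(dims, keepdim=True)
    x = x * torch.rsqrt(variance + eps)
    # elementwise_affine=False is represented by weight=None.
    if weight is None:
        return x
    return weight * x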
hubertlu-tw self-assigned this Apr 18, 2022
@hubertlu-tw (Author)

After updating tests/L0/run_fused_layer_norm/test_fused_layer_norm.py, the results are as follows:

========================================================== 17 passed, 3 skipped, 17 warnings in 68.83s (0:01:08) ===========================================================

@hubertlu-tw (Author)

The results from CI check "rocm-pytorch-release" show:

======================================================================
ERROR: test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm) [torch.float16-True]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/apex/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py", line 295, in test_autocast
    self._run_test(dtype, elementwise_affine)
  File "/apex/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py", line 277, in _run_test
    expected = native(native_x.cpu())
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1129, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 383, in forward
    return manual_rms_norm(input, self.normalized_shape, self.weight, self.eps)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 29, in manual_rms_norm
    return weight * input
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/wrap.py", line 53, in wrapper
    return orig_fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/wrap.py", line 53, in wrapper
    return orig_fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/wrap.py", line 53, in wrapper
    return orig_fn(*args, **kwargs)
  [Previous line repeated 24 more times]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

----------------------------------------------------------------------

The failed test cannot be reproduced locally in the rocm/pytorch:latest Docker image (PyTorch: 1.12.0a0+git2a932eb, ROCm: 5.2.0).
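Distilled, the traceback is a device mismatch: the test evaluates the native (unfused) module on a CPU copy of the input while the module's affine weight is still on cuda:0. A hypothetical two-line reproduction:

import torch

weight = torch.ones(16, device="cuda")  # module parameter left on cuda:0
x = torch.randn(4, 16)                  # native_x.cpu() in the test
y = weight * x  # RuntimeError: Expected all tensors to be on the same device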

@hubertlu-tw (Author)

jenkins: retest this please

@hubertlu-tw (Author)

One of the failed tests in the CI check "rocm-pytorch-release" that is related to this PR's code changes is:
run_fused_layer_norm/test_fused_layer_norm.py::TestFusedLayerNormElemWiseBFloat16::test_layer_norm

It is marked as a flaky test upstream: https://github.com/NVIDIA/apex/blob/684c4733b1d44f6edfe4b190bf0cc55eb1ae1940/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py#L164

@hubertlu-tw (Author) commented Aug 1, 2022

apex-rocm-pytorch-master (#30)

apex-rocm-pytorch-release (#176)

  • test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm) [torch.float16-True]: skipped since the test failure cannot be reproduced locally.
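A sketch of what such a targeted skip could look like, guarding on ROCm builds only (the decorator placement and message are assumptions, not the actual change):

import unittest
import torch

class TestAutocastFusedRMSNorm(unittest.TestCase):
    # torch.version.hip is None on CUDA builds and a version string on ROCm.
    @unittest.skipIf(torch.version.hip is not None,
                     "fails in ROCm CI but cannot be reproduced locally")
    def test_autocast(self):
        ...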

@jithunnair-amd (Collaborator) left a comment:

Among the unit test failures, this one concerns me: test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm), because it is related to FusedRMSNorm, which is what this PR is trying to enable. Can we run the CI again to see if this unit test failure still occurs on both the master and release CI runs? If so, we should be able to reproduce it reliably. Alternatively, if you wish to skip the unit test for now, please create a GitHub issue for the unit test failures introduced by this PR.

csrc/layer_norm_cuda_kernel.cu (review thread, outdated, resolved)
csrc/layer_norm_cuda_kernel.cu (review thread, outdated, resolved)
csrc/layer_norm_cuda_kernel.cu (review thread, resolved)
@jithunnair-amd (Collaborator)

test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm): #81

test_adam_option, test_multi_device, test_float (test_fused_optimizer.TestFusedAdam): #82, due to https://github.com/pytorch/pytorch/issues/80809#issuecomment-1175211598

test_multi_device (test_lamb.TestFusedMixedPrecisionLamb): #83
