
Enable FusedRMSNorm #78
Merged: 9 commits, Aug 5, 2022
Conversation

@hubertlu-tw

There are two newly introduced unit test failures in tests/L0/run_fused_layer_norm/test_fused_layer_norm.py:

========================================================================= short test summary info ==========================================================================
FAILED test_fused_layer_norm.py::TestFusedLayerNormElemWiseHalf::test_layer_norm - RuntimeError: expected scalar type Float but found Half
FAILED test_fused_layer_norm.py::TestFusedLayerNormElemWiseBFloat16::test_layer_norm - RuntimeError: expected scalar type Float but found BFloat16
================================================================= 2 failed, 15 passed, 3 skipped in 42.61s =================================================================
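For context, a minimal sketch of the failing configuration, assuming a CUDA build of apex (shapes and sizes here are illustrative, not taken from the test):

import torch
from apex.normalization import FusedLayerNorm

# Elementwise-affine layer norm run entirely in half precision. Before the
# fix, an internal kernel path still expected float32 tensors, raising
# "RuntimeError: expected scalar type Float but found Half" (and likewise
# for bfloat16).
layer = FusedLayerNorm(32, elementwise_affine=True).cuda().half()
x = torch.randn(4, 32, dtype=torch.half, device="cuda")
y = layer(x)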

eqy and others added 6 commits April 14, 2022 23:15
* FusedRMSNorm based on FusedLayerNorm

* refactor duplicated kernels

* delete comments

* delete comments

* cleanup

* cleanup

* cleanup, fixed clobbering forward_affine_mixed_dtypes

* fix pybind naming and add MixedFused test

* undo skipping

* check elementwise_affine

* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

Oof, nice catch, thanks

Co-authored-by: Masaki Kozuki <[email protected]>

* [FusedRMSNorm doc] add epsilon to formula

* correct

* better wording
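For reference, the operation these commits fuse is RMSNorm; with the epsilon the doc commit adds, the formula is y = x / sqrt(mean(x^2) + eps) * weight. A sketch of the unfused fallback, modeled on (but not copied from) apex's manual_rms_norm:

import torch

def manual_rms_norm_sketch(x, normalized_shape, weight, eps):
    # Reduce over the trailing dimensions covered by normalized_shape
    # (a tuple of ints in this sketch).
    dims = tuple(range(-len(normalized_shape), 0))
    # y = x / sqrt(mean(x^2) + eps), with eps inside the sqrt for
    # numerical stability.
    variance = x.pow(2).mean(dims, keepdim=True)
    x = x * torch.rsqrt(variance + eps)
    # elementwise_affine=False is represented by weight=None.
    if weight is None:
        return x
    return weight * x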
hubertlu-tw self-assigned this Apr 18, 2022
@hubertlu-tw (Author)

After updating tests/L0/run_fused_layer_norm/test_fused_layer_norm.py, the results are as follows:

========================================================== 17 passed, 3 skipped, 17 warnings in 68.83s (0:01:08) ===========================================================

@hubertlu-tw (Author)

The results from CI check "rocm-pytorch-release" show:

======================================================================
ERROR: test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm) [torch.float16-True]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/apex/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py", line 295, in test_autocast
    self._run_test(dtype, elementwise_affine)
  File "/apex/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py", line 277, in _run_test
    expected = native(native_x.cpu())
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1129, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 383, in forward
    return manual_rms_norm(input, self.normalized_shape, self.weight, self.eps)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 29, in manual_rms_norm
    return weight * input
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/wrap.py", line 53, in wrapper
    return orig_fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/wrap.py", line 53, in wrapper
    return orig_fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/wrap.py", line 53, in wrapper
    return orig_fn(*args, **kwargs)
  [Previous line repeated 24 more times]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

----------------------------------------------------------------------

The failed test cannot be reproduced locally in the rocm/pytorch:latest Docker image (PyTorch: 1.12.0a0+git2a932eb, ROCm: 5.2.0).
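Distilled, the traceback is a device mismatch: the test evaluates the native (unfused) module on a CPU copy of the input while the module's affine weight is still on cuda:0. A hypothetical two-line reproduction:

import torch

weight = torch.ones(16, device="cuda")  # module parameter left on cuda:0
x = torch.randn(4, 16)                  # native_x.cpu() in the test
y = weight * x  # RuntimeError: Expected all tensors to be on the same device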

@hubertlu-tw (Author)

jenkins: retest this please

@hubertlu-tw (Author)

One of the failed tests in the CI check "rocm-pytorch-release" that is related to this PR's code changes is:
run_fused_layer_norm/test_fused_layer_norm.py::TestFusedLayerNormElemWiseBFloat16::test_layer_norm

It is marked as a flaky test upstream: https://github.com/NVIDIA/apex/blob/684c4733b1d44f6edfe4b190bf0cc55eb1ae1940/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py#L164

@hubertlu-tw (Author) commented Aug 1, 2022

apex-rocm-pytorch-master (#30)

apex-rocm-pytorch-release (#176)

  • test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm) [torch.float16-True]: skipped since the test failure cannot be reproduced locally.
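A sketch of what such a targeted skip could look like, guarding on ROCm builds only (the decorator placement and message are assumptions, not the actual change):

import unittest
import torch

class TestAutocastFusedRMSNorm(unittest.TestCase):
    # torch.version.hip is None on CUDA builds and a version string on ROCm.
    @unittest.skipIf(torch.version.hip is not None,
                     "fails in ROCm CI but cannot be reproduced locally")
    def test_autocast(self):
        ...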

@jithunnair-amd (Collaborator) left a comment:

Among the unit test failures, this one concerns me: test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm), because it is related to FusedRMSNorm, which is what this PR is trying to enable. Can we run the CI again to see if this unit test failure still occurs on both the master and release CI runs? If so, we should be able to reproduce it reliably. Alternatively, if you wish to skip the unit test for now, please create a GitHub issue for the unit test failures introduced by this PR.

csrc/layer_norm_cuda_kernel.cu (review thread, outdated, resolved)
csrc/layer_norm_cuda_kernel.cu (review thread, outdated, resolved)
csrc/layer_norm_cuda_kernel.cu (review thread, resolved)
@jithunnair-amd (Collaborator)

test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm): #81

test_adam_option, test_multi_device, test_float (test_fused_optimizer.TestFusedAdam): #82, due to https://github.com/pytorch/pytorch/issues/80809#issuecomment-1175211598

test_multi_device (test_lamb.TestFusedMixedPrecisionLamb): #83
