Enable FusedRMSNorm #78
* FusedRMSNorm based on FusedLayerNorm
* refactor duplicated kernels
* delete comments
* delete comments
* cleanup
* cleanup
* cleanup, fixed clobbering forward_affine_mixed_dtypes
* fix pybind naming and add MixedFused test
* undo skipping
* check elementwise_affine
* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

  Oof, nice catch, thanks

Co-authored-by: Masaki Kozuki <[email protected]>
Co-authored-by: Masaki Kozuki <[email protected]>
* [FusedRMSNorm doc] add epsilon to formula
* correct
* better wording
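For readers following the doc change about epsilon, here is a minimal pure-Python sketch of RMS normalization with epsilon placed inside the square root. It assumes the usual RMSNorm definition, y = x / sqrt(mean(x^2) + eps) * weight; the function name `rms_norm_reference` is hypothetical and is not part of apex, and this is a readability aid, not the fused kernel this PR enables.

```python
import math


def rms_norm_reference(x, weight=None, eps=1e-5):
    """Normalize a 1-D list of floats by its root mean square (illustrative only)."""
    mean_square = sum(v * v for v in x) / len(x)
    # eps is added inside the square root, so an all-zero input
    # does not cause a division by zero.
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    if weight is None:
        return [v * inv_rms for v in x]
    # Optional elementwise scale, analogous to an affine weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

With `eps=0.0` the output has a mean square of 1; with the default `eps` an all-zero input maps to all zeros instead of raising an error.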
After updating tests/L0/run_fused_layer_norm/test_fused_layer_norm.py, the results are as follows:
The results from CI check "rocm-pytorch-release" show:
The failed test cannot be reproduced locally in the rocm/pytorch:latest Docker image.
jenkins: retest this please
One of the failed tests in the CI check "rocm-pytorch-release" related to the code changes of this PR is: It is marked as a flaky test upstream: https://github.com/NVIDIA/apex/blob/684c4733b1d44f6edfe4b190bf0cc55eb1ae1940/tests/L0/run_fused_layer_norm/test_fused_layer_norm.py#L164
apex-rocm-pytorch-master (#30)
apex-rocm-pytorch-release (#176)
Among the unit test failures, this one concerns me: test_autocast (test_fused_layer_norm.TestAutocastFusedRMSNorm), because it is related to FusedRMSNorm, which is what this PR is trying to enable. Can we run the CI again to see whether this unit test failure still occurs on both the master and release CI runs? If it does, we should be able to reproduce it reliably. Alternatively, if you wish to skip the unit test for now, please create a GitHub issue for the unit test failures introduced by this PR.
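If the skip route is taken, one way to do it is with the standard `unittest.skip` decorator and a reason string that points at the tracking issue. The sketch below is only an illustration: the class name `TestAutocastFusedRMSNormSketch`, the helper `run_sketch`, and the reason text are hypothetical placeholders, not the actual code in test_fused_layer_norm.py.

```python
import unittest

# Placeholder reason; a real skip should name the actual tracking issue.
SKIP_REASON = "fails in CI, not reproducible locally; see the tracking GitHub issue"


class TestAutocastFusedRMSNormSketch(unittest.TestCase):
    @unittest.skip(SKIP_REASON)
    def test_autocast(self):
        # Never executed while the skip decorator is in place.
        self.fail("not reached while skipped")

    def test_sanity(self):
        # An unrelated test in the same class keeps running.
        self.assertEqual(1 + 1, 2)


def run_sketch():
    """Run the sketch test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestAutocastFusedRMSNormSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The skipped test is reported as skipped, not as passed, so it stays visible in CI output until the issue is resolved and the decorator is removed.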
: #82
This PR introduces two unit test failures in tests/L0/run_fused_layer_norm/test_fused_layer_norm.py.