Update LayerNorm with Welford online algorithm #1374

Merged: 22 commits, Feb 24, 2025


@min-jean-cho
Contributor

The Welford online algorithm improves the numerical stability of the variance computation compared to the naive or two-pass algorithms.
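
For reference, here is a minimal Python sketch of the Welford update next to a single-pass sum-of-squares formula. It is illustrative only and is not the SYCL kernel code from this PR: the function names `welford_variance` and `sum_of_squares_variance` are made up for the example, and everything runs in float64 on the host rather than in float32 on the device.

```python
def welford_variance(values):
    """Return (mean, population variance) using Welford's online update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the mean before and after the update
    return mean, m2 / count


def sum_of_squares_variance(values):
    """Return the population variance as E[x^2] - E[x]^2 (cancellation-prone)."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(x * x for x in values) / n
    return mean_sq - mean * mean


data = [1e15 + k for k in range(5)]  # exact in float64; true variance is 2.0
mean, var = welford_variance(data)
print("welford:", mean, var)
print("sum of squares:", sum_of_squares_variance(data))
```

On this input the Welford update recovers the variance of 2.0, while the sum-of-squares form subtracts two values near 1e30 and loses the answer to cancellation.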

Simple test case:

import torch
import torch.nn as nn

B, D = 1, 5
x = torch.tensor([[1e15, 1e15 + 1, 1e15 + 2, 1e15 + 3, 1e15 + 4]], dtype=torch.float32).to("xpu")
layernorm = nn.LayerNorm(D, elementwise_affine=False)
y = layernorm(x)
print("LayerNorm Output:\n", y)

Output now (Welford):

tensor([[0., 0., 0., 0., 0.]], device='xpu:0')

Output before (two-pass):

tensor([[nan, nan, nan, nan, nan]], device='xpu:0')
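
A note on why zeros are the expected result: the five inputs are indistinguishable in float32, so the row is constant and its variance is exactly zero. The sketch below checks this on the host with the standard library only. The helpers `to_float32` and `layer_norm_row` are hypothetical names for this example, the arithmetic after rounding is done in float64, and `eps=1e-5` is assumed because it is the documented default of `nn.LayerNorm`.

```python
import math
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def layer_norm_row(row, eps=1e-5):
    """Normalize one row using Welford's mean/variance (no affine parameters)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in row:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    var = m2 / count  # population (biased) variance, as LayerNorm uses
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in row]


row = [to_float32(1e15 + k) for k in range(5)]
print(len(set(row)))        # number of distinct float32 values in the row
print(layer_norm_row(row))
```

The row collapses to a single float32 value, so every normalized element is 0.0. A variance that came out slightly negative would instead turn into nan at the square root; whether that is what happened in the previous kernel is not stated in this PR.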

@xytintel
Contributor

Please add a test case.

@min-jean-cho marked this pull request as draft on February 18, 2025, 03:13
@dvrogozh
Contributor

The build fails against the latest PyTorch with:

/home/dvrogozh/git/pytorch/pytorch/third_party/torch-xpu-ops/src/ATen/native/xpu/sycl/LayerNormKernels.cpp:365:37: error: use of undeclared identifier 'get_group_reduce_group_size'
  365 |         sycl_local_acc_t<float>(2 * get_group_reduce_group_size(sg_size_), cgh);
      |                                     ^
/home/dvrogozh/git/pytorch/pytorch/third_party/torch-xpu-ops/src/ATen/native/xpu/sycl/LayerNormKernels.cpp:367:33: error: use of undeclared identifier 'get_group_reduce_group_size'
  367 |         sycl_local_acc_t<float>(get_group_reduce_group_size(sg_size_), cgh);
      |                                 ^
2 errors generated.

Please fix. I would like to verify this PR against pytorch/pytorch#141642.

@min-jean-cho marked this pull request as ready for review on February 19, 2025, 02:08
@min-jean-cho changed the title from "[Draft] Update LayerNorm with Welford online algorithm" to "[WIP][Test] Update LayerNorm with Welford online algorithm" on Feb 19, 2025
@xytintel
Contributor

@min-jean-cho Please resolve the test case failure.

@dvrogozh left a comment
Contributor

This works for me and resolves pytorch/pytorch#141642. Please update the PR title (drop "[WIP][Test]") if it is ready.

@xytintel changed the title from "[WIP][Test] Update LayerNorm with Welford online algorithm" to "Update LayerNorm with Welford online algorithm" on Feb 21, 2025
@xytintel added this pull request to the merge queue on Feb 24, 2025
Merged via the queue into main with commit 306a0ff on Feb 24, 2025
8 of 9 checks passed
@xytintel deleted the minjean/welford_layernorm branch on February 24, 2025, 14:22

4 participants