Add weight support for LigerCrossEntropy #420

Tcc0403 · 2024-12-02T12:52:09Z

Summary

Resolve #404.
Note: current implementation doesn't weight z loss.

Reference: PyTorch's CrossEntropyLoss

Testing Done

It hasn't been fully tested with other params.

Hardware Type:
run make test to ensure correctness
run make checkstyle to ensure code style
run make test-convergence to ensure convergence

pramodith

Thanks for taking care of this! Had a few minor suggestions.

Another TODO comes from the paper linked in the original issue for this feature: we also need to support a sample-level weight, i.e. a weight applied to each element of the batch. If the logits have shape (B, S, V), the sample-level weights would have shape (B,). This is what's proposed in the C-RLFT paper: https://arxiv.org/abs/2309.11235
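
To make the shapes concrete, here is a minimal plain-Python sketch of a sample-level weight, not the Liger kernel and not code from this PR. The function names and the `sample_weights` argument are hypothetical, and the plain sum reduction is an assumption, since the comment above does not say how the weighted loss should be normalized.

```python
import math


def token_cross_entropy(logits, target):
    """Cross entropy of one token: logsumexp(logits) - logits[target]."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]


def sample_weighted_loss(batch_logits, batch_targets, sample_weights):
    """Sum of per-token losses, each scaled by the weight of its sample.

    batch_logits:   nested lists of shape (B, S, V)
    batch_targets:  nested lists of shape (B, S)
    sample_weights: list of length B (hypothetical argument)
    """
    total = 0.0
    for seq_logits, seq_targets, w in zip(batch_logits, batch_targets, sample_weights):
        # Every token in sequence b shares the same weight w = sample_weights[b].
        for logits, target in zip(seq_logits, seq_targets):
            total += w * token_cross_entropy(logits, target)
    return total
```

This differs from the class-level `weight` added in this PR, which is indexed by the target class (length V) rather than by the batch element (length B).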

src/liger_kernel/ops/cross_entropy.py

test/transformers/test_cross_entropy.py

Tcc0403 · 2024-12-02T18:22:52Z

Feel free to push to this branch, or even take it over and open a new PR; I won't be able to update it very often over the next few months. I'm just trying to take the first step while I have time.

Tcc0403 · 2024-12-02T18:24:49Z

test/transformers/test_cross_entropy.py

+        (1.0, torch.float32, 1e-8, 1e-6),
+    ],
+)
+def test_correctness_with_weight_with_other_params_once(


This test doesn't pass for some reason. I might be missing something.

pramodith · 2024-12-04

So, the issue seems to be with combining label_smoothing with the weighted loss. I've been staring at the code and equations for a while now, but I can't pinpoint anything that's wrong. Simply multiplying the final loss by the weight of the label token seems like the right thing to do to me.

If not, the issue can only be in the scaled_x_sum term, since all the other terms in the smoothed loss are also part of the plain CE loss, which we know works correctly.

Tcc0403 · 2024-12-05

Figuring out where it doesn't work is a big help! I'll take a look on Saturday.

pramodith · 2024-12-02T22:21:50Z

> Feel free to push to this branch, or even take it over and open a new PR; I won't be able to update it very often over the next few months. I'm just trying to take the first step while I have time.

Gotcha! I'll try wrapping it up, you've done most of the heavy lifting already.

Tcc0403 · 2024-12-08T04:16:18Z

I took a look at torch's implementation; here's how they compute smooth_loss:
https://github.com/pytorch/pytorch/blob/2682e5e0d48a8200c1672b6a42250d3c8de44190/aten/src/ATen/native/LossNLL.cpp#L558

    if (weight.defined()) {
      // Expand weight to the correct number of dims for broadcasting with input / target
      auto weight_broadcast_shape = SmallBuffer<int64_t, 5>(input.dim());
      std::fill(weight_broadcast_shape.begin(), weight_broadcast_shape.end(), 1);
      weight_broadcast_shape[class_dim] = weight.size(0);
      Tensor weight_ = weight.view(weight_broadcast_shape);

      smooth_loss = -(input * weight_).sum(class_dim);

Related code blocks in Liger:

Liger-Kernel/src/liger_kernel/ops/cross_entropy.py

Line 119 in bd65c47

    scaled_x_sum += tl.sum(tl.where(X_offsets < n_cols, -eps * X_block, 0.0))

Liger-Kernel/src/liger_kernel/ops/cross_entropy.py

Lines 194 to 196 in bd65c47

    
    if label_smoothing > 0:
        smooth_loss = scaled_x_sum + label_smoothing * lse
        loss = loss * (1 - label_smoothing) + smooth_loss
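
The torch excerpt above weights each class's log-probability by that class's own weight inside the smoothing term, which is not the same as multiplying the whole smoothed loss by the target's weight. The following plain-Python sketch contrasts the two for a single sample with no reduction. It is a reference under stated assumptions, not the Triton kernel: the function names are hypothetical, and the `label_smoothing / n_classes` scaling of the smoothing term is assumed to match PyTorch's CrossEntropyLoss.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def weighted_smooth_ce(logits, target, weight, label_smoothing):
    """Per-sample loss where the smoothing term uses per-class weights."""
    logp = log_softmax(logits)
    n_classes = len(logits)
    nll = -weight[target] * logp[target]
    # Mirrors smooth_loss = -(input * weight_).sum(class_dim) from the excerpt.
    smooth = -sum(w * lp for w, lp in zip(weight, logp))
    return (1 - label_smoothing) * nll + (label_smoothing / n_classes) * smooth


def target_weight_only(logits, target, weight, label_smoothing):
    """Contrast: scale the whole smoothed loss by the target's weight."""
    logp = log_softmax(logits)
    n_classes = len(logits)
    nll = -logp[target]
    smooth = -sum(logp)
    unweighted = (1 - label_smoothing) * nll + (label_smoothing / n_classes) * smooth
    return weight[target] * unweighted
```

The two functions agree when all class weights are equal or when label_smoothing is 0, and diverge otherwise, which is consistent with the tests failing only when smoothing and weights are combined.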

Tcc0403 · 2024-12-08T04:20:36Z

src/liger_kernel/ops/cross_entropy.py

+        selected_weight = torch.where(
+            target_mask, torch.gather(weight, dim=0, index=target * target_mask), 0.0
+        )
+        sum_of_non_ignore_weight = selected_weight.sum().item()


We can rewrite it with torch.masked_select:

    sum_of_non_ignore_weight = (
        torch.gather(weight, dim=0, index=target.masked_select(target_mask))
        .sum()
        .item()
    )

Refer to torch's implementation mentioned above.

winglian · 2024-12-11T11:26:38Z

@pramodith anything I can do to help with this PR?

pramodith · 2024-12-11T12:35:43Z

> @pramodith anything I can do to help with this PR?

Hey @winglian, I won't be able to look into this any further. Feel free to take over and see if you can figure out the source of the discrepancy. The tests fail when combining label smoothing with weighted CE.

Resolved

Tcc0403 · 2024-12-22T10:10:17Z

I'll open another PR for sample-level weights.

bboyleonp666

Hi @Tcc0403, thanks for your wonderful work. I left some of my thoughts for this PR, PTAL.

src/liger_kernel/ops/cross_entropy.py

bboyleonp666 · 2024-12-24T11:01:43Z

src/liger_kernel/ops/fused_linear_cross_entropy.py

    # NOTE: skip .item() here to avoid CUDA synchronization
-    total_n_non_ignore = (target != ignore_index).sum()
+    target_mask = target != ignore_index
+    total_n_non_ignore = target_mask.sum().item()


I noticed the comment above about skipping .item() to avoid CUDA synchronization. Is this change consistent with that?

Tcc0403 · 2024-12-24

I forgot to remove the comment; it doesn't affect the result.

src/liger_kernel/ops/fused_linear_cross_entropy.py

-            _input,
-            weight,
-            target,
-            bias,
-            ignore_index,
-            lse_square_scale,
-            label_smoothing,
-            reduction,
-            softcap,
+            _input=_input,
+            weight=weight,
+            target=target,
+            bias=bias,
+            ce_weight=ce_weight,
+            ignore_index=ignore_index,
+            lse_square_scale=lse_square_scale,
+            label_smoothing=label_smoothing,
+            reduction=reduction,
+            softcap=softcap,


Tcc0403 · 2024-12-24T13:25:58Z

I'll update it on Saturday. Thanks for your review.

bboyleonp666

It's great to see that all the tests pass now. I need some more time to check the tests, but the algorithm already LGTM.

bboyleonp666 · 2024-12-28T12:36:04Z

src/liger_kernel/ops/cross_entropy.py

@@ -197,6 +197,7 @@ def liger_cross_entropy_kernel(
            if reduction == "mean":
                dloss_ori = dloss_ori / sum_non_ignore_weight
                dloss_smooth = dloss_smooth / sum_non_ignore_weight
+                # z_loss isn't scaled by weight


Correct me if I'm wrong: I guess the comment points out that the z loss is not scaled by weight, so it is divided by n_non_ignore rather than sum_non_ignore_weight. In that case, I think adding a TODO for it would be a better idea.

Makes sense to me. Thanks for your fast response.
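
The normalization discussed in this thread can be sketched in plain Python as follows. This is a reference under stated assumptions rather than the kernel itself: the function name is hypothetical, label smoothing is left out for brevity, and the z loss is assumed to be lse_square_scale * lse ** 2 per token.

```python
import math


def weighted_mean_loss(rows, targets, class_weight, ignore_index=-100, lse_square_scale=0.0):
    """Mean-reduced weighted CE plus an unweighted z loss."""
    ce_total = 0.0
    z_total = 0.0
    sum_non_ignore_weight = 0.0
    n_non_ignore = 0
    for logits, y in zip(rows, targets):
        if y == ignore_index:
            continue  # ignored tokens contribute to neither normalizer
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        ce_total += class_weight[y] * (lse - logits[y])
        z_total += lse_square_scale * lse * lse  # z loss isn't scaled by weight
        sum_non_ignore_weight += class_weight[y]
        n_non_ignore += 1
    if n_non_ignore == 0:
        return 0.0
    # Weighted CE is averaged over the sum of target weights,
    # while the z loss is averaged over the count of non-ignored tokens.
    return ce_total / sum_non_ignore_weight + z_total / n_non_ignore
```

With all class weights equal to 1, sum_non_ignore_weight equals n_non_ignore and both terms use the same denominator.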

bboyleonp666 · 2024-12-28T12:36:07Z

src/liger_kernel/ops/cross_entropy.py

@@ -247,6 +248,7 @@ def liger_cross_entropy_kernel(
            loss = loss / sum_non_ignore_weight
        else:
            loss = loss / n_non_ignore
+        # z_loss isn't scaled by weight


bboyleonp666

LGTM

austin362667

LGTM 🚀

Tcc0403 added 3 commits December 2, 2024 16:46

Add weight support for LigerCrossEntropy

2d66515

Update cross_entropy_kernel args in flce

dbe4237

Add comments

e770182

Tcc0403 requested review from pramodith and ByronHsu December 2, 2024 12:52

pramodith previously requested changes Dec 2, 2024

Tcc0403 added 2 commits December 3, 2024 01:01

Add complete test with other params

f38e1e2

Fix invalid range access bug

45f6c1f

Tcc0403 commented Dec 2, 2024

Tcc0403 commented Dec 8, 2024

Tcc0403 added 8 commits December 15, 2024 13:39

Refactor variable names and computation of target's weights

a1a4f0a

Merge branch 'main' into tcc/weight-ce

0473e22

Fix unit test

2e6ded2

Block invalid operation when weight is None

d54ce80

Update gradients calculation for weighted smooth loss

cbaf88f

Clean up

ec134fc

Update flce

7ed4dd9

Fix kernel arugments in flce

5535a60

Tcc0403 requested a review from pramodith December 22, 2024 07:31

Tcc0403 requested a review from austin362667 December 23, 2024 10:19

bboyleonp666 reviewed Dec 24, 2024

Tcc0403 force-pushed the tcc/weight-ce branch from b6253b0 to 8195398 Compare December 28, 2024 02:34

Tcc0403 requested a review from bboyleonp666 December 28, 2024 10:24

bboyleonp666 reviewed Dec 28, 2024

bboyleonp666 approved these changes Dec 28, 2024

Tcc0403 added 6 commits December 29, 2024 11:59

Merge branch 'main' into tcc/weight-ce

b4daf5d

Clean up and checkstyle

ddf68e4

add TODO

cebf04d

Add comments

6ee6ce7

Add TODO for weighted z_loss

b32b332

Make checkstyle

93703ff

Tcc0403 force-pushed the tcc/weight-ce branch from 4d3a4e3 to 93703ff Compare December 29, 2024 04:03

Tcc0403 enabled auto-merge (squash) December 29, 2024 04:16

Tcc0403 disabled auto-merge December 29, 2024 04:17

austin362667 approved these changes Dec 29, 2024

Tcc0403 merged commit 42ff02a into main Dec 29, 2024
5 checks passed

Tcc0403 deleted the tcc/weight-ce branch December 29, 2024 13:19
