Add TVD (Total variation distance) Kernel #281
I'll look into it over the week if no one else takes it.
#take @ByronHsu @qingquansong, I'd like to make an attempt. Could you please assign it to me?
Assigned to you. Thanks!
shivam15s pushed a commit that referenced this issue on Feb 21, 2025:
## Summary

Resolves [#281](#281). Implements the TVD (Total Variation Distance) kernel by computing both the loss and the gradient in the forward pass.

## Testing Done

Implemented tests to verify that the results of the forward and backward passes match the Torch implementation. Additionally, added a script to benchmark the memory usage and speed of the Liger implementation compared to Torch, with the results shown below.

[benchmark plots: speed and memory of the Liger kernel vs. the Torch implementation]

- Hardware Type: Nvidia H100 (80GB PCIe)
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence

Co-authored-by: Shao Tang <[email protected]>
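For context, below is a minimal PyTorch sketch of the pattern the summary describes: computing the gradient during the forward pass and caching it for the backward pass. This is an illustrative reference only, not the actual Liger Triton kernel; the class name `NaiveTVDLoss` is hypothetical.

```python
import torch


class NaiveTVDLoss(torch.autograd.Function):
    """Reference TVD loss: the gradient is computed in forward and cached."""

    @staticmethod
    def forward(ctx, p, q):
        # TVD(P, Q) = 0.5 * sum_i |p_i - q_i|, averaged over the batch
        diff = p - q
        loss = 0.5 * diff.abs().sum(dim=-1).mean()
        # The gradient w.r.t. p is cheap to form here:
        # 0.5 * sign(p - q), scaled by 1/batch for the mean reduction.
        grad_p = 0.5 * torch.sign(diff) / p.shape[0]
        ctx.save_for_backward(grad_p)
        return loss

    @staticmethod
    def backward(ctx, grad_output):
        # Reuse the cached gradient; q is treated as a constant target.
        (grad_p,) = ctx.saved_tensors
        return grad_output * grad_p, None


# Usage: backward() reuses the cached gradient instead of re-deriving it.
logits = torch.randn(4, 10, requires_grad=True)
p = torch.softmax(logits, dim=-1)
q = torch.softmax(torch.randn(4, 10), dim=-1)
loss = NaiveTVDLoss.apply(p, q)
loss.backward()  # populates logits.grad via the cached grad_p
```

Trading a little extra forward work for a near-free backward is the same design choice the PR summary describes for the fused kernel.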
🚀 The feature, motivation and pitch
TVD is a good distance metric (ref) and an easy-to-implement kernel, and it yields more stable gradients than KL divergence and JS divergence.
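To make the stability claim concrete: TVD(P, Q) = ½ Σᵢ |pᵢ − qᵢ|, so the gradient with respect to pᵢ is ½·sign(pᵢ − qᵢ), bounded by ½ in magnitude, whereas the KL(Q‖P) gradient −qᵢ/pᵢ blows up as pᵢ → 0. A small illustrative PyTorch check (not code from this repo):

```python
import torch

# TVD gradient is bounded: d/dp_i [0.5 * sum |p_i - q_i|] = 0.5 * sign(p_i - q_i)
p = torch.tensor([1e-8, 1.0 - 1e-8], requires_grad=True)
q = torch.tensor([0.5, 0.5])
tvd = 0.5 * (p - q).abs().sum()
print(torch.autograd.grad(tvd, p)[0])   # tensor([-0.5000,  0.5000])

# KL(Q || P) gradient is -q_i / p_i, which explodes as p_i -> 0
p2 = torch.tensor([1e-8, 1.0 - 1e-8], requires_grad=True)
kl = (q * (q / p2).log()).sum()
print(torch.autograd.grad(kl, p2)[0])   # ~[-5e7, -0.5]: unbounded at small p
```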
Alternatives
No response
Additional context
No response