KL loss in validation #23
Hi, may I ask why the KL loss is used during validation? This doesn't match Equation 9 in the paper, which is a cross-entropy loss.
Hi, I think it's basically the same; you can think of this KL divergence as a cross-entropy function over multiple soft labels.
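For reference (this note and its notation are ours, not the author's): writing p for the fixed soft labels and q_θ for the model's predicted distribution, the cross-entropy of Equation 9 and the KL divergence are related by

```latex
% p: soft labels (fixed), q_\theta: model's predicted distribution -- our notation
H(p, q_\theta) = -\sum_i p_i \log q_{\theta,i}
               = \mathrm{KL}\!\left(p \,\|\, q_\theta\right) + H(p),
\qquad H(p) = -\sum_i p_i \log p_i .
```

Since H(p) does not depend on the model, minimising either quantity picks out the same optimum; only the reported value differs, by the constant H(p).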
@Jiacheng8 Thanks. My main concern is that using the KL loss results in a stronger knowledge-distillation effect, especially since a temperature is also used in your case. I'm wondering whether adopting this evaluation strategy, rather than the dataset itself being of higher quality, is the main reason for the performance improvement. Yet the paper doesn't present an ablation on this.
The two are equivalent except for a constant term, though of course the optimisation will be a little different around the lowest value; I will try to run an ablation on this.
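A minimal sketch (assumed PyTorch, not the repository's code) of how the two validation numbers relate when a temperature is applied to the student logits; the temperature handling, tensor shapes, and variable names below are all assumptions in the usual knowledge-distillation style:

```python
# Sketch only: compares a KL-based validation loss with plain cross-entropy
# on the same soft labels, to show they differ by the (model-independent)
# entropy of the targets. The temperature use here is an assumption, not
# necessarily what this repository does.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, T = 4, 10, 2.0                                     # batch, classes, temperature (assumed)
student_logits = torch.randn(B, C)                       # stand-in for model outputs
soft_labels = torch.softmax(torch.randn(B, C), dim=-1)   # stand-in for the fixed soft targets

log_q = F.log_softmax(student_logits / T, dim=-1)        # temperature-scaled student log-probs

# KL(p || q): F.kl_div expects model log-probs and target probabilities
kl = F.kl_div(log_q, soft_labels, reduction="batchmean")

# Cross-entropy to the same soft targets
ce = -(soft_labels * log_q).sum(dim=-1).mean()

# Entropy of the soft targets: a constant with respect to the model
h_p = -(soft_labels * soft_labels.log()).sum(dim=-1).mean()

# Identical up to floating-point error: KL = CE - H(p)
print(kl.item(), (ce - h_p).item())
```

With the same targets and the same temperature, the two numbers differ only by H(p) and therefore rank checkpoints identically; whether the temperature itself (rather than the data) drives the reported gains is the question the ablation above would address.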