Which knowledge-distillation loss is best in "kd_losses"? #3

Open
yang0817manman opened this issue Sep 16, 2020 · 1 comment

@yang0817manman
Thanks for your sharing. Can you tell me which loss function in "kd_losses" is best for a classification task?

@AberHu
Owner

AberHu commented Oct 14, 2020

Sorry for the late reply. From my perspective, different KD losses are suitable for different tasks. For classification, the original KD (soft target) is fine, because it can be treated as a variation of label smoothing regularization. You may tune the temperature and the trade-off parameters in soft target. Two other KD losses I recommend for classification are SP and CC. Hope this helps.
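
For reference, here is a minimal sketch of the soft-target (Hinton et al.) KD loss the reply refers to, with the temperature `T` and trade-off weight exposed for tuning. The class and argument names are illustrative assumptions, not necessarily the ones used in this repository.

```python
import torch.nn as nn
import torch.nn.functional as F

class SoftTarget(nn.Module):
    """Soft-target KD loss: KL divergence between temperature-softened
    student and teacher logits, scaled by T^2 (Hinton et al.)."""
    def __init__(self, T=4.0):
        super().__init__()
        self.T = T

    def forward(self, student_logits, teacher_logits):
        log_p_s = F.log_softmax(student_logits / self.T, dim=1)
        p_t = F.softmax(teacher_logits / self.T, dim=1)
        # 'batchmean' matches the mathematical definition of KL divergence
        return F.kl_div(log_p_s, p_t, reduction='batchmean') * (self.T ** 2)

# Typical usage (lambda_kd is the trade-off parameter mentioned above):
# loss = F.cross_entropy(student_logits, labels) \
#        + lambda_kd * SoftTarget(T=4.0)(student_logits, teacher_logits)
```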
