Could you please explain why it used `log_softmax` instead of `softmax`?

torchdistill/torchdistill/losses/single.py, lines 99 to 106 in 993ee94
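For context, a minimal sketch of a soft-target KD loss of this shape; the function name, logit shapes, and temperature value are illustrative, not necessarily the exact code at the lines linked above:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Hinton-style soft-target loss: KL divergence between the
    temperature-scaled student and teacher distributions."""
    # F.kl_div expects `input` as log-probabilities, hence log_softmax;
    # `target` is plain probabilities by default, hence softmax.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * temperature ** 2

# Illustrative usage with random logits (batch of 8, 10 classes)
loss = kd_loss(torch.randn(8, 10), torch.randn(8, 10), temperature=4.0)
```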
The same question has been asked here and here. These repositories (I think you already know them) are other attempts to implement knowledge distillation algorithms.

The loss here is PyTorch's `KLDivLoss`, whose documentation explains the choice:

> To avoid underflow issues when computing this quantity, this loss expects the argument `input` in the log-space. The argument `target` may also be provided in the log-space if `log_target=True`.

Also, please use Discussions above (instead of Issues) for questions. As explained here, I want to keep Issues mainly for bug reports.
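A quick way to see the underflow the documentation refers to, using an illustrative extreme logit:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[200.0, 0.0]])

# Naive softmax-then-log: exp(-200) underflows to 0 in float32,
# so the log of the small probability becomes -inf.
print(torch.log(F.softmax(logits, dim=1)))  # tensor([[0., -inf]])

# log_softmax stays in log-space throughout and remains finite.
print(F.log_softmax(logits, dim=1))         # tensor([[0., -200.]])
```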