Hi @AberHu,
Thank you very much for your work.
I am currently working on using CRD loss to train my face recognition model, since your work and the original paper show that CRD loss combined with KL-divergence distillation currently outperforms the other methods. However, I found that it requires two memory buffers (one each for the student and teacher embeddings of the whole dataset), which makes it infeasible when the dataset is really large. So I wonder if there is a lighter, more memory-efficient way to implement it.
Looking forward to your reply. Thanks.
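For reference, below is a minimal sketch of one possible memory-buffer-free alternative: an InfoNCE-style contrastive distillation loss that uses only in-batch negatives instead of the CRD memory banks, so its memory cost scales with the batch size rather than the dataset size. The class name `InBatchCRDLoss`, the feature dimensions, and the temperature value are illustrative assumptions, not part of the original CRD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InBatchCRDLoss(nn.Module):
    """Contrastive distillation with in-batch negatives only (no memory banks).

    For each student embedding, the matching teacher embedding in the same
    mini-batch is the positive; all other teacher embeddings in the batch
    serve as negatives (InfoNCE over the batch). This is a simplified,
    hypothetical variant, not the original CRD formulation.
    """

    def __init__(self, student_dim, teacher_dim, feat_dim=128, temperature=0.07):
        super().__init__()
        # Project both networks into a shared embedding space.
        self.embed_s = nn.Linear(student_dim, feat_dim)
        self.embed_t = nn.Linear(teacher_dim, feat_dim)
        self.temperature = temperature

    def forward(self, feat_s, feat_t):
        # feat_s: (B, student_dim) student features
        # feat_t: (B, teacher_dim) teacher features (teacher is frozen)
        z_s = F.normalize(self.embed_s(feat_s), dim=1)
        z_t = F.normalize(self.embed_t(feat_t.detach()), dim=1)

        # (B, B) similarity matrix; diagonal entries are the positive pairs.
        logits = z_s @ z_t.t() / self.temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, targets)


# Example usage (dimensions are illustrative):
# criterion = InBatchCRDLoss(student_dim=512, teacher_dim=2048)
# loss = criterion(student_features, teacher_features)
```

This trades some of the benefit of CRD's large negative pool for a fixed memory footprint; using a larger batch size (or a small FIFO queue of recent teacher embeddings) can partly compensate.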