-
I am working on adding an L1-norm loss term during training, aiming to obtain sparse network weights. Could you give me some advice?
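(For reference, adding the L1 penalty directly to the training loss could look like the minimal sketch below. All names and values here, such as `model` and `l1_lambda`, are illustrative assumptions, not code from this project:)

```python
import torch
import torch.nn as nn

# Illustrative setup; names and values are assumptions, not from this thread.
model = nn.Linear(10, 2)
l1_lambda = 1e-4  # strength of the sparsity penalty

x, y = torch.randn(4, 10), torch.randn(4, 2)
task_loss = nn.functional.mse_loss(model(x), y)

# L1 term: sum of absolute values of all trainable parameters.
l1_term = sum(p.abs().sum() for p in model.parameters())
loss = task_loss + l1_lambda * l1_term
loss.backward()
```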
Replies: 3 comments
-
Since we calculate the loss in the HEAD, I think the best way is to modify the optimizer. For example, `torch.optim.SGD` directly computes the gradient of L2 regularization (½w² → w) in its weight-decay step. Maybe you can do the same for L1, whose (sub)gradient is sign(w).
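(A minimal sketch of the gradient-level approach suggested above: after backprop, add the L1 (sub)gradient λ·sign(w) to each parameter's gradient, mirroring how SGD's `weight_decay` adds λ·w for L2. The setup below is an illustrative assumption, not code from this project:)

```python
import torch
import torch.nn as nn

# Illustrative setup; names and values are assumptions, not from the thread.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
l1_lambda = 1e-4

x, y = torch.randn(4, 10), torch.randn(4, 2)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Mirror SGD's weight_decay step (which adds lambda * w, the gradient of
# (lambda/2) * w^2) by adding the L1 (sub)gradient lambda * sign(w).
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(l1_lambda * torch.sign(p))

optimizer.step()
optimizer.zero_grad()
```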
-
It has been answered by @rogercmq.
-
Problem solved. Thx!