How does adding the L1 penalty make some of the BN layer's gamma values go to 0? #80
Comments
Because each update performs weight -= weight.grad, adding the penalty means: if the weight is positive, a little more gets subtracted at each update; if it is negative, a little less gets subtracted. Either way the weight trends toward 0.
I ran some experiments to verify this. As @ZongshenXie said, adding the L1 penalty does push γ closer to 0 (i.e., there are more γ values with small magnitude), but it does not make any γ exactly 0.
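The two observations above can be reproduced with a minimal sketch in plain Python (no PyTorch needed; the values s = 0.01 and lr = 0.1 are made up for illustration), applying only the penalty part of the update to a single scale value:

```python
# Minimal sketch: repeatedly apply gamma -= lr * s * sign(gamma),
# which is a subgradient step on the penalty s * |gamma|.
s, lr = 0.01, 0.1
gamma = 1.0
for _ in range(2000):
    sgn = (gamma > 0) - (gamma < 0)
    gamma -= lr * s * sgn
# gamma shrinks by lr * s per step while positive, then bounces
# around 0 in steps of lr * s; it typically never settles at exactly 0
print(gamma)
```

This matches the experiment: γ is driven into a small neighborhood of 0 (within one step size, lr * s), but the fixed-size step keeps it oscillating rather than landing exactly on 0.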
The code is actually not true L1 regularization but a fixed-magnitude term with the same sign as γ, which is arguably closer to L0. Also, pruning does not require γ to be exactly 0; it only needs some γ values to be small enough that the corresponding channels barely affect the network, so the network still runs normally after those channels are removed.
Calling it L0 is not quite rigorous either; see the original netslim author's answer in Eric-mingjie/network-slimming#31.
# So for positive bn_module.weight.data (the BN scale), this increases its gradient; since the update moves opposite to the gradient, positive scale values are pushed downward,
bn_module.weight.grad.data.add_(s * torch.sign(bn_module.weight.data)) # L1
I don't understand how adding this L1 term to the gradient drives the gamma values toward 0 so that channels can be pruned afterward. Could someone who knows please explain? Thanks.
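One way to see it is a hypothetical two-channel simulation in plain Python (no PyTorch; the learning rate, penalty strength s, and the made-up task gradients are all assumptions, not the repo's actual values). The same update, grad + s * sign(gamma), shrinks a scale the task loss does not care about toward 0, while a scale the loss actively pulls toward 1.0 stays large:

```python
# Hypothetical setup: channel 0 is "important" (its task gradient pulls
# gamma toward 1.0); channel 1 is "unused" (zero task gradient).
s, lr = 0.01, 0.1
gammas = [1.0, 1.0]

def sign(x):
    return (x > 0) - (x < 0)

for _ in range(2000):
    task_grads = [gammas[0] - 1.0, 0.0]  # made-up task gradients
    for i in range(2):
        grad = task_grads[i] + s * sign(gammas[i])  # the add_ line above
        gammas[i] -= lr * grad

print(gammas)  # channel 0 stays near 1.0, channel 1 ends up near 0
```

The penalty is a constant tug of size s toward 0 on every γ; channels whose task gradient is strong enough resist it, while channels the loss does not need get dragged close to 0 and become safe to prune.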