Excellent work! I have several questions.

1. Is equation (22) in the paper https://arxiv.org/abs/1903.10520 correct? I think the dot on H should be on the left-hand side of the equation.

2. Are equations (25) and (36) different? I think the squared l2 norm of Ŵ in equation (36) equals the number of input channels, not 1 (see the check in the sketch below).

3. Can WS be applied to a fully connected layer, for example the final linear layer of a classification model? (A sketch of what I mean follows this list.)

4. In WS, I think we should not focus on the gradients of W, because they do not affect the forward pass and backpropagation directly. Instead, we should focus on the gradients of X, which are directly affected by Ŵ. This might be why WS works.
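To make questions 2 and 3 concrete, here is a minimal PyTorch sketch. This is my own illustration, not code from the paper or its repository; the `WSLinear` class and the 512/10 layer sizes are hypothetical, and the eps value is an arbitrary choice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSLinear(nn.Linear):
    """Hypothetical WS for a fully connected layer (question 3):
    standardize each output row of the weight matrix before the
    forward pass, mirroring what WS does per conv filter."""
    def forward(self, x):
        w = self.weight                                 # shape (out, in)
        mean = w.mean(dim=1, keepdim=True)              # per-row mean
        std = w.std(dim=1, keepdim=True, unbiased=False) + 1e-5
        w_hat = (w - mean) / std
        return F.linear(x, w_hat, self.bias)

# Quick check for question 2: with the population std, each row of
# W_hat satisfies sum((w - mu)^2) = I * sigma^2 / sigma^2 = I, so its
# squared l2 norm is the fan-in I, not 1.
layer = WSLinear(512, 10)
w = layer.weight
w_hat = (w - w.mean(1, keepdim=True)) / w.std(1, keepdim=True, unbiased=False)
print(w_hat.pow(2).sum(dim=1))  # each of the 10 entries is ~512
```

Note that the paper's experiments apply WS to conv layers, so whether standardizing the final classification layer helps in practice is exactly what I am asking.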
Sorry to bother you.
zhujiagang changed the title from "Possible typo in the paper and" to "Possible typo in the paper and some other questions" on Oct 21, 2020.