What is the parameter args.norm_size used for? In line #73, after the three contribution values are summed, why is norm_size added to the total?
Hello, the reason for normalizing the gradients is to determine their relative magnitudes. If we normalized them with softmax separately for each example, the ratios among the three features would become similar. That is why a norm_size has to be defined for this purpose. However, it is worth exploring other normalization methods, since some of them might not require a norm_size parameter.
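A minimal Python sketch of the idea in the answer, assuming that line #73 divides each of the three contributions by (their sum + norm_size). The function names and the gradient values are hypothetical and not taken from the repository. The sketch compares per-example softmax, plain sum normalization, and sum-plus-norm_size normalization.

```python
import math

def softmax_weights(contribs):
    # Per-example softmax: exponentiate each value (shifted by the max for
    # numerical stability) and divide by the sum of the exponentials.
    m = max(contribs)
    exps = [math.exp(c - m) for c in contribs]
    total = sum(exps)
    return [e / total for e in exps]

def sum_weights(contribs):
    # Plain normalization: each value divided by the sum of the three.
    total = sum(contribs)
    return [c / total for c in contribs]

def norm_size_weights(contribs, norm_size):
    # Hypothetical reading of line #73: norm_size is added to the sum before
    # dividing, so an example with small gradients keeps small weights
    # instead of being rescaled so that its weights sum to 1.
    total = sum(contribs) + norm_size
    return [c / total for c in contribs]

# Made-up gradient magnitudes of the three features for two examples.
small = [0.002, 0.001, 0.001]
large = [2.0, 1.0, 1.0]

print(softmax_weights(small))          # each weight close to 1/3
print(sum_weights(small))              # about [0.5, 0.25, 0.25]
print(sum_weights(large))              # [0.5, 0.25, 0.25] as well
print(norm_size_weights(small, 1.0))   # ratio 2:1:1, tiny absolute weights
print(norm_size_weights(large, 1.0))   # -> [0.4, 0.2, 0.2]
```

With small gradients, softmax flattens the 2:1:1 ratio to nearly uniform weights. Plain sum normalization keeps the ratio but gives both examples the same weights, so the difference in overall gradient size is lost. Adding norm_size keeps both the ratio and the difference between examples. The value norm_size = 1.0 is an arbitrary choice for this illustration.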