The same count_normalization function is used for every norm-like module, but batchnorm stores a running estimate of the mean and stdev, while layernorm computes them at inference time. Shouldn't layernorm account for the cost of computing the mean and stdev? The difference is significant:
The mean is roughly n FLOPs and the stdev roughly 2n more, and that's before the rest of the norm (subtract the mean, divide by the stdev), which is another ~2n.
Is there a reason layernorm should be estimated at only 2n FLOPs by reusing batchnorm's count?
Relevant code: pytorch-OpCounter/thop/profile.py, line 32 (at 43c064a)
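For context, here's a rough sketch of what I'd expect a LayerNorm-aware counter to look like, registered through thop's custom_ops option. The hook name count_layernorm and the exact per-element breakdown are my own assumptions, not something the library provides:

```python
import torch
import torch.nn as nn
from thop import profile

def count_layernorm(m: nn.LayerNorm, x, y):
    # thop passes the module inputs as a tuple; count elements of the input tensor
    n = x[0].numel()
    # Rough per-element breakdown (the assumption argued above):
    #   mean            ~ n   (accumulate)
    #   variance/stdev  ~ 2n  (subtract mean, square-accumulate)
    #   normalize       ~ 2n  (subtract mean, divide by stdev)
    total_ops = 5 * n
    # Optional affine transform adds a multiply and an add per element
    if m.elementwise_affine:
        total_ops += 2 * n
    m.total_ops += torch.DoubleTensor([int(total_ops)])

# Usage: override thop's default rule for nn.LayerNorm
model = nn.Sequential(nn.Linear(128, 128), nn.LayerNorm(128))
dummy = torch.randn(1, 128)
macs, params = profile(model, inputs=(dummy,),
                       custom_ops={nn.LayerNorm: count_layernorm})
```

With a counter like this, layernorm comes out around 5n–7n FLOPs rather than the 2n it currently shares with batchnorm.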