Miss attention FLOPS? #14

Hi,
I found that in MultiHeadedAttention, thop only counts the FLOPs of the linear layers and misses the attention operation itself.

Comments
Hello! Could you describe this issue in more detail? E.g., how did you find that the attention operation is missed? In our code, the FLOPs are measured as follows:

```
# Get FLOPs at this batch
inputs = (input_ids_batch, label_ids_batch, mask_ids_batch, fast_mode)
flops, params = profile(model, inputs, verbose=False)
total_flops += flops
```

Also, in our previous experiment, we measured the FLOPs of the self-attention operation as about 603.0M, and those of the FeedForward layer as about 1207.9M.
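For reference, these two per-layer numbers are consistent with counting only the nn.Linear layers. A minimal sanity check, assuming BERT-base dimensions (hidden size 768, feed-forward inner size 3072) and a sequence length of 128, both of which are assumptions not stated in this thread:

```python
# Sanity check of the reported per-layer numbers (assumptions: BERT-base
# dims, hidden=768, feed-forward inner size=3072, sequence length=128;
# FLOPs counted as 2 * MACs).
seq_len, hidden, ffn = 128, 768, 3072

# Self-attention as counted by thop: only the four nn.Linear projections (Q, K, V, output).
attn_linear_flops = 2 * seq_len * (4 * hidden * hidden)   # ~604.0M
# FeedForward: two nn.Linear layers, hidden -> ffn -> hidden.
ffn_flops = 2 * seq_len * (hidden * ffn + ffn * hidden)   # ~1208.0M

print(f"attention projections: {attn_linear_flops / 1e6:.1f}M FLOPs")
print(f"feed-forward:          {ffn_flops / 1e6:.1f}M FLOPs")
```

Both values closely match the reported 603.0M and 1207.9M, which already suggests that the attention matmuls (QK^T and attention weights times V) are not included in the count.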
When testing on the MultiHeadedAttention module: if you delete the matmul operations in the code, the MACs reported by thop stay the same, e.g., delete the following two lines in multi_headed_attn.py (a toy reproduction of this check is sketched below).
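The two lines referred to are presumably the QK^T and attention-times-V matmuls. The behavior described can be reproduced on a toy module: thop only hooks nn.Module layers such as nn.Linear, so a functional torch.matmul contributes nothing to the reported MACs. A minimal, self-contained sketch (the module names and dimensions are made up for illustration, not taken from this repo):

```python
import torch
import torch.nn as nn
from thop import profile

class ToyAttnWithMatmul(nn.Module):
    """Linear projections plus the two attention matmuls."""
    def __init__(self, dim=64):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5  # QK^T
        return torch.matmul(torch.softmax(scores, dim=-1), v)             # attn x V

class ToyAttnNoMatmul(nn.Module):
    """Same projections, with the matmuls removed."""
    def __init__(self, dim=64):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, x):
        return self.q(x) + self.k(x) + self.v(x)

x = torch.randn(1, 16, 64)
macs_with, _ = profile(ToyAttnWithMatmul(), inputs=(x,), verbose=False)
macs_without, _ = profile(ToyAttnNoMatmul(), inputs=(x,), verbose=False)
print(macs_with, macs_without)  # identical: thop has no hook for torch.matmul
```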
Thank you for your testing!
After testing, we found that thop does not count the FLOPs of torch.matmul (it only hooks nn.Module layers such as nn.Linear). So, the FLOPs we obtained miss the two attention matmuls (QK^T and attention weights times V).

References: https://discuss.pytorch.org/t/get-the-matmul-operations-in-a-net/61058

This is a mistake in our work; however, it does not affect the conclusions of the paper, because the speedup is unchanged, and the FLOPs of all compared models are measured with the same script, so the missing term affects them in the same way.

Thank you for finding this issue.
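For completeness, the missing term can be added analytically on top of thop's total. A minimal sketch, assuming standard per-layer scaled dot-product attention; the hidden size, layer count, and sequence length below are assumptions for illustration, not values taken from this repo:

```python
def attention_matmul_flops(seq_len, hidden_size, num_layers, flops_per_mac=2):
    """FLOPs of the two matmuls (QK^T and attn @ V) that thop does not count.

    Per layer and per head, each matmul costs seq_len^2 * head_dim MACs;
    summed over heads that is seq_len^2 * hidden_size MACs per matmul.
    """
    macs_per_layer = 2 * seq_len ** 2 * hidden_size
    return flops_per_mac * macs_per_layer * num_layers

# Example with assumed BERT-base-like settings (hidden=768, 12 layers, seq len 128):
extra = attention_matmul_flops(seq_len=128, hidden_size=768, num_layers=12)
print(f"missing attention-matmul FLOPs: {extra / 1e6:.1f}M")  # ~604.0M over 12 layers
```

Because this correction is applied identically to every model profiled with the same script, relative comparisons such as speedup ratios are largely unaffected.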