Miss attention FLOPS? #14

Hi,
I found that in MultiHeadedAttention, thop only counts the FLOPs of the linear layers and misses the attention operation itself.

Comments
Hello! Could you describe this issue in more detail? E.g., how did you find that the attention operation is missed? In our code, the FLOPs are measured as follows:

```
# Get FLOPs at this batch
inputs = (input_ids_batch, label_ids_batch, mask_ids_batch, fast_mode)
flops, params = profile(model, inputs, verbose=False)
total_flops += flops
```

Also, in our previous experiment, we measured the FLOPs of the self-attention operation as about 603.0M, and those of the FeedForward layer as about 1207.9M.
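For reference, these two per-layer numbers are consistent with counting only the nn.Linear layers. A minimal sanity check, assuming BERT-base dimensions (hidden size 768, feed-forward inner size 3072) and a sequence length of 128, both of which are assumptions not stated in this thread:

```python
# Sanity check of the reported per-layer numbers (assumptions: BERT-base
# dims, hidden=768, feed-forward inner size=3072, sequence length=128;
# FLOPs counted as 2 * MACs).
seq_len, hidden, ffn = 128, 768, 3072

# Self-attention as counted by thop: only the four nn.Linear projections (Q, K, V, output).
attn_linear_flops = 2 * seq_len * (4 * hidden * hidden)   # ~604.0M
# FeedForward: two nn.Linear layers, hidden -> ffn -> hidden.
ffn_flops = 2 * seq_len * (hidden * ffn + ffn * hidden)   # ~1208.0M

print(f"attention projections: {attn_linear_flops / 1e6:.1f}M FLOPs")
print(f"feed-forward:          {ffn_flops / 1e6:.1f}M FLOPs")
```

Both values closely match the reported 603.0M and 1207.9M, which already suggests that the attention matmuls (QK^T and attention weights times V) are not included in the count.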
When testing on the MultiHeadedAttention module: if you delete the matmul operations in the code, the MACs reported by thop stay the same, e.g., delete the following two lines in multi_headed_attn.py (a toy reproduction of this check is sketched below).
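The two lines referred to are presumably the QK^T and attention-times-V matmuls. The behavior described can be reproduced on a toy module: thop only hooks nn.Module layers such as nn.Linear, so a functional torch.matmul contributes nothing to the reported MACs. A minimal, self-contained sketch (the module names and dimensions are made up for illustration, not taken from this repo):

```python
import torch
import torch.nn as nn
from thop import profile

class ToyAttnWithMatmul(nn.Module):
    """Linear projections plus the two attention matmuls."""
    def __init__(self, dim=64):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5  # QK^T
        return torch.matmul(torch.softmax(scores, dim=-1), v)             # attn x V

class ToyAttnNoMatmul(nn.Module):
    """Same projections, with the matmuls removed."""
    def __init__(self, dim=64):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, x):
        return self.q(x) + self.k(x) + self.v(x)

x = torch.randn(1, 16, 64)
macs_with, _ = profile(ToyAttnWithMatmul(), inputs=(x,), verbose=False)
macs_without, _ = profile(ToyAttnNoMatmul(), inputs=(x,), verbose=False)
print(macs_with, macs_without)  # identical: thop has no hook for torch.matmul
```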
Thank you for your testing!
After testing, we found that thop does not count the FLOPs of torch.matmul (it only hooks nn.Module layers such as nn.Linear). So, the FLOPs we obtained miss the two attention matmuls (QK^T and attention weights times V).

References: https://discuss.pytorch.org/t/get-the-matmul-operations-in-a-net/61058

This is a mistake in our work; however, it does not affect the conclusions of the paper, because the speedup is unchanged, and the FLOPs of all compared models are measured with the same script, so the missing term affects them in the same way.

Thank you for finding this issue.
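For completeness, the missing term can be added analytically on top of thop's total. A minimal sketch, assuming standard per-layer scaled dot-product attention; the hidden size, layer count, and sequence length below are assumptions for illustration, not values taken from this repo:

```python
def attention_matmul_flops(seq_len, hidden_size, num_layers, flops_per_mac=2):
    """FLOPs of the two matmuls (QK^T and attn @ V) that thop does not count.

    Per layer and per head, each matmul costs seq_len^2 * head_dim MACs;
    summed over heads that is seq_len^2 * hidden_size MACs per matmul.
    """
    macs_per_layer = 2 * seq_len ** 2 * hidden_size
    return flops_per_mac * macs_per_layer * num_layers

# Example with assumed BERT-base-like settings (hidden=768, 12 layers, seq len 128):
extra = attention_matmul_flops(seq_len=128, hidden_size=768, num_layers=12)
print(f"missing attention-matmul FLOPs: {extra / 1e6:.1f}M")  # ~604.0M over 12 layers
```

Because this correction is applied identically to every model profiled with the same script, relative comparisons such as speedup ratios are largely unaffected.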