issue about MultiheadAttention #1
Comments
Thanks for your reminder. Because our tokens are sparse (only 29 to 98), the MultiheadAttention actually accounts for a very small part of our model. We use another toolkit (fvcore) to count the params and FLOPs; the FLOPs are 6.123G, 5.173G, and 3.988G for 98, 68, and 29 landmarks respectively. I will update the results. Moreover, we found that the main issue affecting the inference speed is that the interpolation code is not efficient. I modified the interpolation code yesterday, and the inference speed improved by 1.5×. I will update the code after testing. Thank you for finding this issue.
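For reference, a minimal, self-contained sketch of how fvcore's FlopCountAnalysis and parameter_count can be used for this kind of counting. The toy encoder, token count (98), and embedding width (256) below are illustrative assumptions, not the actual model from this repository; fvcore may also print warnings for ops it does not count (e.g. softmax).

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis, parameter_count

class ToyEncoder(nn.Module):
    """Stand-in for a transformer block over sparse landmark tokens (assumed sizes)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        x = x + self.attn(x, x, x)[0]   # self-attention with residual connection
        return x + self.mlp(x)          # feed-forward with residual connection

model = ToyEncoder().eval()
tokens = torch.randn(1, 98, 256)        # e.g. 98 landmark tokens of assumed width 256
with torch.no_grad():
    flops = FlopCountAnalysis(model, tokens)
    print(f"FLOPs : {flops.total() / 1e9:.3f} G")
    print(f"Params: {parameter_count(model)[''] / 1e6:.3f} M")
```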
This nice work sets a new record in face alignment, and I want to cite it in my paper. I have calculated the FLOPs and params on WFLW to be 6.110G and 13.134M for the 6-layer model, and 8.138G and 19.445M for the 12-layer model. Could you check their correctness or give detailed information for the 6-layer and 12-layer models?
I think your results are correct. Could you please give me your email? I will send the details through email, and we can discuss more via WeChat.
Hi, I just sent an email to the address provided in the paper, but there has been no response. I'm not sure whether you received it.
Hi, I want to cite your results for the 12-layer model on the WFLW subsets.
NME: 4.128, 6.988, 4.368, 4.023, 4.032, 5.005, 4.790
Thanks!
Hi, great work in face alignment!
However, I have a question about the params and FLOPs reported in the paper.
I have tried to run your code to count the params and FLOPs for the 6-layer and 12-layer models.
I guess your result comes from the thop tool, but it has a shortcoming with MultiheadAttention, which accounts for a major part of the Transformer, so the result in the paper may be wrong. Could you check this issue and update the real FLOPs if the error indeed exists?
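For context, thop skips nn.MultiheadAttention unless a custom counter is supplied through its custom_ops argument. Below is a hedged sketch of such a counter: the FLOP formula (Q/K/V and output projections plus the two attention matmuls, counted as MACs) and the tiny wrapper module are my assumptions rather than this repository's code, and whether thop additionally counts the attention's internal output projection depends on the thop version, so the result should be verified against your own runs.

```python
import torch
import torch.nn as nn
from thop import profile

def count_multihead_attention(m: nn.MultiheadAttention, inputs, output):
    # Assumed formula for self-attention with equal query/key/value length.
    q = inputs[0]
    L = q.shape[1] if m.batch_first else q.shape[0]   # sequence length
    E = m.embed_dim                                    # embedding width
    proj = 4 * L * E * E       # Q, K, V and output projections (MACs)
    attn = 2 * L * L * E       # Q K^T scores and attention-weighted sum of V (MACs)
    m.total_ops += torch.DoubleTensor([int(proj + attn)])

class TinyAttnBlock(nn.Module):
    """Stand-in module; the real 6-/12-layer model would go here."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        return self.attn(x, x, x)[0]

tokens = torch.randn(1, 98, 256)   # e.g. 98 landmark tokens of assumed width 256
macs, params = profile(TinyAttnBlock(), inputs=(tokens,),
                       custom_ops={nn.MultiheadAttention: count_multihead_attention})
print(f"MACs: {macs / 1e9:.4f} G, Params: {params / 1e6:.3f} M")
```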