
Micro/Macro F1 score calculation is over-optimistic #9

Open

sronnqvist opened this issue Mar 24, 2020 · 1 comment

@sronnqvist
Hi!

Thanks for this neat tool. I've found it quite useful. However, I noticed that the micro and macro average F1 scores seem to be calculated incorrectly in the evaluation script, which may give overly optimistic numbers:

micro_f1 = f1_score(correct_output.reshape(-1).numpy(), predictions.reshape(-1).numpy(), average='micro')

f1_score supports "1d array-like, or label indicator array / sparse matrix". As I understand it, the above code flattens the matrix as if this were a multi-class setting, whereas in a multi-label setting the original indicator matrix should be passed, just transposed:

micro_f1 = f1_score(correct_output.T.numpy(), predictions.T.numpy(), average='micro')

With the data I'm testing on, I got a micro F1 of 0.88 with your version and 0.35 with mine. I was able to verify with another tool that the latter is correct.
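
For what it's worth, here is a minimal sketch with toy numpy arrays (not the repo's actual tensors) showing where the inflation comes from: flattening turns every cell of the indicator matrix into a binary "sample", so micro F1 reduces to plain accuracy and all the correctly predicted zeros count as hits, whereas the multi-label call pools TP/FP/FN across labels and ignores true negatives:

import numpy as np
from sklearn.metrics import f1_score

# 3 samples x 4 labels; mostly zeros, as is typical for multi-label data
y_true = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 1]])

# Flattened: every cell becomes a binary "sample", so correctly predicted
# zeros (true negatives) count as hits and micro F1 equals accuracy
print(f1_score(y_true.reshape(-1), y_pred.reshape(-1), average='micro'))  # 0.75

# Multi-label indicator input: TP/FP/FN are pooled across labels and
# true negatives are ignored, which is the intended micro F1
print(f1_score(y_true, y_pred, average='micro'))                          # 0.40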

You may want to check your numbers in the paper as well.

Kind regards,
Samuel

@AndriyMulyar
Owner

AndriyMulyar commented Mar 24, 2020 via email
