Hi!
Thanks for this neat tool. I've found it quite useful. However, I noticed that the micro and macro average F1 scores seem to be calculated incorrectly in the evaluation script and might be giving overly optimistic numbers:
micro_f1 = f1_score(correct_output.reshape(-1).numpy(), predictions.reshape(-1).numpy(), average='micro')
Jump to code: https://github.com/AndriyMulyar/bert_document_classification/blob/060e9034a8c41bfb34b8762c8e1612321015c076/bert_document_classification/document_bert.py#L265
f1_score supports "1d array-like, or label indicator array / sparse matrix". As I understand it, the code above flattens the matrix as if this were a multi-class setting, whereas the original matrix should be used in a multi-label setting, just transposed:
micro_f1 = f1_score(correct_output.T.numpy(), predictions.T.numpy(), average='micro')
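As a minimal illustration of why the two calls diverge (the toy matrices below are made up for this comment, not taken from the repository): with flattened 1d inputs, micro-F1 collapses to plain accuracy over every cell, so the many correctly predicted zeros inflate the score, while the label-indicator form aggregates TP/FP/FN over positive labels only.
import numpy as np
from sklearn.metrics import f1_score

# 4 documents x 5 labels, one true label per document (sparse, as is typical).
y_true = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],    # correct
                   [0, 0, 0, 0, 0],    # misses the positive label
                   [0, 0, 0, 0, 1],    # predicts the wrong label
                   [0, 0, 0, 0, 0]])   # misses the positive label

# Flattened: every cell becomes one binary "sample", so micro-F1 equals
# accuracy over all 20 cells, most of which are easy zeros.
print(f1_score(y_true.reshape(-1), y_pred.reshape(-1), average='micro'))  # 0.80

# Label-indicator matrix: TP/FP/FN are counted per label and aggregated;
# true negatives are ignored.
print(f1_score(y_true, y_pred, average='micro'))  # ~0.33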
With the data I'm testing on, I got a micro F1 of 0.88 with your version and 0.35 with mine. I was able to verify that the latter is correct with other software.
You may want to check the numbers in the paper (https://arxiv.org/pdf/1910.13664.pdf) as well.
Kind regards,
Samuel
Samuel,
Thank you for pointing this out. In the original datasets considered for evaluation, performance was calculated in the multi-class manner. I will be pushing out an update that addresses this (giving the option between the two), alongside a fix for another implementation bug in this public code release.
Thanks!
Andriy
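For illustration only, a hypothetical sketch of what an "option between the two" evaluation modes could look like (the function name, signature, and defaults are invented here and are not taken from the promised update):
import numpy as np
from sklearn.metrics import f1_score

def evaluation_f1(y_true, y_pred, multilabel=True, average='micro'):
    """Score a (n_samples, n_labels) binary indicator matrix either as a
    multi-label problem (true negatives ignored) or in the original
    flattened multi-class style. Torch tensors from the repository would
    need .numpy() (and possibly a transpose) before being passed in."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if multilabel:
        return f1_score(y_true, y_pred, average=average)
    # Legacy behaviour: treat every cell as an independent binary decision.
    return f1_score(y_true.reshape(-1), y_pred.reshape(-1), average=average)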