
Evaluate function not right #6
Open
airkid opened this issue Mar 1, 2019 · 8 comments

Comments

@airkid

airkid commented Mar 1, 2019

https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/testing.py#L72
If I add the following assertion just before this line:
assert len(arguments) == len(arguments_)
it raises an AssertionError.
I believe this is because arguments contains the gold arguments while arguments_ contains only the predicted arguments, whose length changes dynamically during training.
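For illustration, a minimal sketch of the mismatch (the (start, end, role_id) triples below are made up, not real JMEE output):

arguments  = [(2, 4, 7)]                  # gold arguments for one sentence
arguments_ = [(0, 1, 3), (2, 4, 7)]       # predicted arguments
assert len(arguments) == len(arguments_)  # AssertionError: 1 != 2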

@DorianKodelja

DorianKodelja commented Mar 1, 2019

This computes the score incorrectly: if the model predicts a wrong entity before all the correct ones, the predictions are misaligned with the gold annotations and the score drops to 0, as shown in this example:
gold roles are [(3,5,11),(7,9,9)]
predicted roles are [(0,2,2),(3,5,11),(7,9,9)]
first iteration: compare (3,5,11) and (0,2,2) -> fail
second iteration: compare (7,9,9) and (3,5,11) -> fail, even though (3,5,11) is in the gold annotations.
Here is a functioning version that also generates a per-class report (it requires tabulate):

calculate_sets_1.txt
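To make the misalignment concrete, here is a minimal sketch (not the attached file; it just mimics the role check item[2] == item_[2] from testing.py) comparing the zip-based loop with an order-independent count on the example above:

gold  = [(3, 5, 11), (7, 9, 9)]
preds = [(0, 2, 2), (3, 5, 11), (7, 9, 9)]

# zip pairs by position, so every comparison fails once the lists shift
ct_zip = sum(1 for g, p in zip(gold, preds) if g[2] == p[2])
print(ct_zip)  # 0, even though both gold arguments were predicted

# order-independent matching: count predictions that appear among the gold tuples
ct_set = len(set(gold) & set(preds))
print(ct_set)  # 2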

@mikelkl

mikelkl commented Mar 7, 2019

Hi @airkid @DorianKodelja, I got with conclusion with you, according to DMCNN paper:

An argument is correctly classifiedd if its event subtype, offsets and argument role match those of any of the reference argument mentions

for item, item_ in zip(arguments, arguments_): 

The above code in this repo does not match that idea, so I replaced that line with:

ct += len(set(arguments) & set(arguments_))  # count any argument in golden
# for item, item_ in zip(arguments, arguments_):
#     if item[2] == item_[2]:
#         ct += 1

@airkid
Author

airkid commented Mar 7, 2019

Hi @mikelkl, I believe this is a correct way to calculate the F1 score for this task.
Have you reproduced the experiment? I can only reach an F1 score below 0.4 on the test data.

@mikelkl

mikelkl commented Mar 7, 2019

Hi @airkid, I got a slightly higher result, but it's on my own randomly split test set, so I have no idea whether it reliably reflects the paper's result.

@airkid
Author

airkid commented Mar 7, 2019

Hi @mikelkl, can you try the data split updated by the author?
My result is still far from the paper's.

@mikelkl

mikelkl commented Mar 11, 2019

Hi @airkid, I'm afraid I cannot do that because I don't have the ACE2005 English data.

@carrie0307

Hi @airkid, would you please tell me the result you got? I only got F1 = 0.64 on trigger classification.

@rhythmswing

https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/testing.py#L72
If I add the following assertion just before this line:
assert len(arguments) == len(arguments_)
it raises an AssertionError.
I believe this is because arguments contains the gold arguments while arguments_ contains only the predicted arguments, whose length changes dynamically during training.

Hi,

If you've tried their code, would you tell me your reproduced results on trigger detection and argument detection?
