
true_which_overlapped_with_pred does not get updated properly (1) #40

Open
ivyleavedtoadflax opened this issue Jun 25, 2021 · 5 comments

@ivyleavedtoadflax (Collaborator)

davidsbatista/NER-Evaluation#17

ivyleavedtoadflax added the "help wanted" and "bug" labels on Jun 25, 2021
@giuliabaldini

Hey, thank you very much for the package! We have been using it for a project and this is definitely a problem, not least because it makes the results non-deterministic when the annotations are passed in a different order.

This happens when multiple entities from the ground truth are predicted as one in the prediction, as correctly identified in the issue linked above.
We have built a small MWE to show the problem:

from nervaluate import Evaluator
import pandas as pd

# gt1 and gt2 contain the same two ground-truth annotations, only in a different order.
gt1 = [
    [
        {"start": 0, "end": 12, "label": "A"},
        {"start": 14, "end": 17, "label": "B"},
    ]
]
gt2 = [
    [
        {"start": 14, "end": 17, "label": "B"},
        {"start": 0, "end": 12, "label": "A"},
    ]
]

# A single prediction that spans both ground-truth entities.
pred = [
    [
        {"start": 0, "end": 17, "label": "A"},
    ]
]
classes = ["A", "B"]
if __name__ == "__main__":
    for i in range(2):
        print(f"Run {i}")
        if i == 0:
            gt = gt1
        else:
            gt = gt2
        evaluator = Evaluator(gt, pred, tags=classes)
        results, results_by_tag = evaluator.evaluate()
        df_results = pd.DataFrame(results).T
        int_cols = [
            c for c in df_results.columns if c not in ["precision", "recall", "f1"]
        ]
        df_results[int_cols] = df_results[int_cols].astype(int)
        df_results.sort_values("f1")  # note: returns a new frame, so this line has no effect as written
        print(df_results)
        print()

This gives the following output:

Run 0
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall        f1
ent_type        1          0        0       1         0         2       1        1.0    0.50  0.666667
partial         0          0        1       1         0         2       1        0.5    0.25  0.333333
strict          0          1        0       1         0         2       1        0.0    0.00  0.000000
exact           0          1        0       1         0         2       1        0.0    0.00  0.000000

Run 1
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall        f1
ent_type        0          1        0       1         0         2       1        0.0    0.00  0.000000
partial         0          0        1       1         0         2       1        0.5    0.25  0.333333
strict          0          1        0       1         0         2       1        0.0    0.00  0.000000
exact           0          1        0       1         0         2       1        0.0    0.00  0.000000

Since the prediction overlaps both ground-truth entities, we would expect that for ent_type both correct and incorrect are 1. However, only one of the ground-truth entities gets compared with the prediction, so depending on the order the first entity is marked as correct or incorrect, and the other entity is then counted as missed.

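To make the order dependence concrete, here is a minimal illustrative sketch of the kind of matching loop that produces this behavior (the function names are made up for illustration; this is not the actual nervaluate code). It reuses gt1, gt2 and pred from the MWE above:

def overlaps(true_ent, pred_ent):
    # Two spans overlap if neither one ends before the other starts.
    return true_ent["start"] <= pred_ent["end"] and pred_ent["start"] <= true_ent["end"]

def first_overlap_ent_type(true_ents, pred_ent):
    for true_ent in true_ents:
        if overlaps(true_ent, pred_ent):
            # Only the first overlapping true entity is ever compared, so the
            # verdict flips with the annotation order; the other true entity is
            # never matched and ends up counted as missed.
            return "correct" if true_ent["label"] == pred_ent["label"] else "incorrect"
    return "spurious"

print(first_overlap_ent_type(gt1[0], pred[0][0]))  # correct   (the "A" annotation comes first)
print(first_overlap_ent_type(gt2[0], pred[0][0]))  # incorrect (the "B" annotation comes first)
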
Thank you very much for your time!

Best,
Giulia

@karzideh, @jantrienes

@ivyleavedtoadflax (Collaborator, Author)

hey @giuliabaldini, thanks for taking the time to look at this. This is indeed a problem that needs fixing. If you have time to put together a PR for it that would be super helpful. In the meantime, I'll add it to our backlog.

ivyleavedtoadflax changed the title from "true_which_overlapped_with_pred does not get updated properly" to "true_which_overlapped_with_pred does not get updated properly (1)" on Jan 23, 2023
@coffepowered

Hey @ivyleavedtoadflax, I see this is still an issue.
Do you have any suggestions to fix it?

@infopz (Contributor) commented Sep 27, 2023

Hello everyone,
I was trying to solve the problem, but I'm not sure what the desired behavior is.
I found a way to take both true entities into account, counting the first one as correct and the second one as incorrect.

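As a rough, self-contained sketch of the idea (the helper below is hypothetical, not the patch itself): every true entity that overlaps the prediction gets scored against it, instead of stopping at the first one, so no overlapped entity is left as missed and the counts no longer depend on the annotation order.

def score_ent_type(true_ents, pred_ent):
    # Score every true entity against the prediction instead of stopping at
    # the first overlapping one.
    counts = {"correct": 0, "incorrect": 0, "missed": 0}
    for true_ent in true_ents:
        if true_ent["start"] <= pred_ent["end"] and pred_ent["start"] <= true_ent["end"]:
            if true_ent["label"] == pred_ent["label"]:
                counts["correct"] += 1
            else:
                counts["incorrect"] += 1
        else:
            counts["missed"] += 1
    return counts

pred_ent = {"start": 0, "end": 17, "label": "A"}
for true_ents in (
    [{"start": 0, "end": 12, "label": "A"}, {"start": 14, "end": 17, "label": "B"}],
    [{"start": 14, "end": 17, "label": "B"}, {"start": 0, "end": 12, "label": "A"}],
):
    print(score_ent_type(true_ents, pred_ent))  # {'correct': 1, 'incorrect': 1, 'missed': 0} both times
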
Based on @giuliabaldini's example, these are the results:

Run 0
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall   f1
ent_type        1          1        0       0         0         2       2        0.5     0.5  0.5
partial         0          0        2       0         0         2       2        0.5     0.5  0.5
strict          0          2        0       0         0         2       2        0.0     0.0  0.0
exact           0          2        0       0         0         2       2        0.0     0.0  0.0

Run 1
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall   f1
ent_type        1          1        0       0         0         2       2        0.5     0.5  0.5
partial         0          0        2       0         0         2       2        0.5     0.5  0.5
strict          0          2        0       0         0         2       2        0.0     0.0  0.0
exact           0          2        0       0         0         2       2        0.0     0.0  0.0

Is this solution acceptable? If so, I will send a PR.

@ivyleavedtoadflax (Collaborator, Author)

Hi @coffepowered, thanks for your comment. Unfortunately we're not actively working on nervaluate right now. @infopz, if you can put in a PR we will review - thanks!

infopz pushed a commit to infopz/nervaluate that referenced this issue Sep 27, 2023
@infopz infopz mentioned this issue Sep 27, 2023
infopz pushed a commit to infopz/nervaluate that referenced this issue Sep 28, 2023
ivyleavedtoadflax added a commit that referenced this issue Oct 23, 2023