
true_which_overlapped_with_pred does not get updated properly (1) #40

Open
ivyleavedtoadflax opened this issue Jun 25, 2021 · 5 comments

@ivyleavedtoadflax (Collaborator)

davidsbatista/NER-Evaluation#17

ivyleavedtoadflax added the "help wanted" and "bug" labels on Jun 25, 2021
@giuliabaldini

Hey, thank you very much for the package! We have been using it for a project and this is definitely a problem, not least because it makes the results non-deterministic when the annotations are passed in a different order.

This happens when multiple entities from the ground truth are predicted as one in the prediction, as correctly identified in the issue linked above.
We have built a small MWE to show the problem:

from nervaluate import Evaluator
import pandas as pd

# gt1 and gt2 contain the same two ground-truth annotations, only in a different order.
gt1 = [
    [
        {"start": 0, "end": 12, "label": "A"},
        {"start": 14, "end": 17, "label": "B"},
    ]
]
gt2 = [
    [
        {"start": 14, "end": 17, "label": "B"},
        {"start": 0, "end": 12, "label": "A"},
    ]
]

# A single prediction that spans both ground-truth entities.
pred = [
    [
        {"start": 0, "end": 17, "label": "A"},
    ]
]
classes = ["A", "B"]
if __name__ == "__main__":
    for i in range(2):
        print(f"Run {i}")
        if i == 0:
            gt = gt1
        else:
            gt = gt2
        evaluator = Evaluator(gt, pred, tags=classes)
        results, results_by_tag = evaluator.evaluate()
        df_results = pd.DataFrame(results).T
        int_cols = [
            c for c in df_results.columns if c not in ["precision", "recall", "f1"]
        ]
        df_results[int_cols] = df_results[int_cols].astype(int)
        df_results.sort_values("f1")  # note: returns a new frame, so this line has no effect as written
        print(df_results)
        print()

This gives the following output:

Run 0
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall        f1
ent_type        1          0        0       1         0         2       1        1.0    0.50  0.666667
partial         0          0        1       1         0         2       1        0.5    0.25  0.333333
strict          0          1        0       1         0         2       1        0.0    0.00  0.000000
exact           0          1        0       1         0         2       1        0.0    0.00  0.000000

Run 1
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall        f1
ent_type        0          1        0       1         0         2       1        0.0    0.00  0.000000
partial         0          0        1       1         0         2       1        0.5    0.25  0.333333
strict          0          1        0       1         0         2       1        0.0    0.00  0.000000
exact           0          1        0       1         0         2       1        0.0    0.00  0.000000

Since the prediction overlaps both ground-truth entities, we would expect that for ent_type both correct and incorrect are 1. However, only one of the ground-truth entities gets compared with the prediction, so depending on the order the first entity is marked as correct or incorrect, and the other entity is then counted as missed.

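To make the order dependence concrete, here is a minimal illustrative sketch of the kind of matching loop that produces this behavior (the function names are made up for illustration; this is not the actual nervaluate code). It reuses gt1, gt2 and pred from the MWE above:

def overlaps(true_ent, pred_ent):
    # Two spans overlap if neither one ends before the other starts.
    return true_ent["start"] <= pred_ent["end"] and pred_ent["start"] <= true_ent["end"]

def first_overlap_ent_type(true_ents, pred_ent):
    for true_ent in true_ents:
        if overlaps(true_ent, pred_ent):
            # Only the first overlapping true entity is ever compared, so the
            # verdict flips with the annotation order; the other true entity is
            # never matched and ends up counted as missed.
            return "correct" if true_ent["label"] == pred_ent["label"] else "incorrect"
    return "spurious"

print(first_overlap_ent_type(gt1[0], pred[0][0]))  # correct   (the "A" annotation comes first)
print(first_overlap_ent_type(gt2[0], pred[0][0]))  # incorrect (the "B" annotation comes first)
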
Thank you very much for your time!

Best,
Giulia

@karzideh, @jantrienes

@ivyleavedtoadflax (Collaborator, Author)

hey @giuliabaldini, thanks for taking the time to look at this. This is indeed a problem that needs fixing. If you have time to put together a PR for it that would be super helpful. In the meantime, I'll add it to our backlog.

ivyleavedtoadflax changed the title from "true_which_overlapped_with_pred does not get updated properly" to "true_which_overlapped_with_pred does not get updated properly (1)" on Jan 23, 2023
@coffepowered

Hey @ivyleavedtoadflax, I see this is still an issue.
Do you have any suggestions to fix it?

@infopz (Contributor) commented Sep 27, 2023

Hello everyone,
I was trying to solve the problem, but I'm not sure what the desired behavior is.
I found a way to take both true entities into account, counting the first one as correct and the second one as incorrect.

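As a rough, self-contained sketch of the idea (the helper below is hypothetical, not the patch itself): every true entity that overlaps the prediction gets scored against it, instead of stopping at the first one, so no overlapped entity is left as missed and the counts no longer depend on the annotation order.

def score_ent_type(true_ents, pred_ent):
    # Score every true entity against the prediction instead of stopping at
    # the first overlapping one.
    counts = {"correct": 0, "incorrect": 0, "missed": 0}
    for true_ent in true_ents:
        if true_ent["start"] <= pred_ent["end"] and pred_ent["start"] <= true_ent["end"]:
            if true_ent["label"] == pred_ent["label"]:
                counts["correct"] += 1
            else:
                counts["incorrect"] += 1
        else:
            counts["missed"] += 1
    return counts

pred_ent = {"start": 0, "end": 17, "label": "A"}
for true_ents in (
    [{"start": 0, "end": 12, "label": "A"}, {"start": 14, "end": 17, "label": "B"}],
    [{"start": 14, "end": 17, "label": "B"}, {"start": 0, "end": 12, "label": "A"}],
):
    print(score_ent_type(true_ents, pred_ent))  # {'correct': 1, 'incorrect': 1, 'missed': 0} both times
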
Based on @giuliabaldini's example, these are the results:

Run 0
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall   f1
ent_type        1          1        0       0         0         2       2        0.5     0.5  0.5
partial         0          0        2       0         0         2       2        0.5     0.5  0.5
strict          0          2        0       0         0         2       2        0.0     0.0  0.0
exact           0          2        0       0         0         2       2        0.0     0.0  0.0

Run 1
          correct  incorrect  partial  missed  spurious  possible  actual  precision  recall   f1
ent_type        1          1        0       0         0         2       2        0.5     0.5  0.5
partial         0          0        2       0         0         2       2        0.5     0.5  0.5
strict          0          2        0       0         0         2       2        0.0     0.0  0.0
exact           0          2        0       0         0         2       2        0.0     0.0  0.0

Is this solution acceptable? If so, I will send a PR.

@ivyleavedtoadflax (Collaborator, Author)

Hi @coffepowered, thanks for your comment. Unfortunately we're not actively working on nervaluate right now. @infopz, if you can put in a PR we will review - thanks!

infopz pushed a commit to infopz/nervaluate that referenced this issue Sep 27, 2023
@infopz infopz mentioned this issue Sep 27, 2023
infopz pushed a commit to infopz/nervaluate that referenced this issue Sep 28, 2023
ivyleavedtoadflax added a commit that referenced this issue Oct 23, 2023