Implementation of the common coreference evaluation metrics, including MUC, B-cubed, CEAFe, and LEA, for both the CoNLL and ARRAU datasets. LEA is described in the paper "Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric".
This evaluation tool requires the numpy and scipy packages.
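They can be installed with pip, for example:
$ pip install numpy scipy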
Basic usage with CoNLL files:
$ python scorer.py key system
key and system are the files containing the gold coreference annotations and the system output, respectively.
For more details, refer to the ARRAU README for evaluating ARRAU files and to the CoNLL README for CoNLL evaluations.
Run the tests with:
$ python3 -m pytest unittests.py
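For intuition about how the link-based LEA metric mentioned above scores a system output against a gold clustering, the following stand-alone Python sketch computes LEA recall, precision, and F1 on toy clusterings. It is only an illustration of the metric as described in the LEA paper, not the implementation in this repository; in particular, the singleton handling (a self-link that counts as resolved only if the mention is also a singleton on the other side) is one reading of the paper. Use scorer.py for actual evaluations.

# Illustrative sketch of the LEA metric (not this repository's implementation).
# Entities are collections of hashable mention identifiers,
# e.g. (doc_id, start, end) tuples.

def _links(entity):
    # Number of coreference links in an entity; a singleton counts as one self-link.
    n = len(entity)
    return n * (n - 1) // 2 if n > 1 else 1

def _resolution(entity, other_partition):
    # Fraction of an entity's links that are reproduced by the other partition.
    entity = set(entity)
    if len(entity) == 1:
        # A singleton is resolved only if it is also a singleton on the other side.
        return 1.0 if any(entity == set(other) for other in other_partition) else 0.0
    common = sum(_links(entity & set(other)) for other in other_partition
                 if len(entity & set(other)) > 1)
    return common / _links(entity)

def lea(key, response):
    # Return (recall, precision, f1); each argument is a list of entities
    # (lists or sets of mention ids). Entity importance is its size.
    def score(p1, p2):
        num = sum(len(e) * _resolution(e, p2) for e in p1)
        den = sum(len(e) for e in p1)
        return num / den if den else 0.0
    recall, precision = score(key, response), score(response, key)
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f1

if __name__ == '__main__':
    key = [{'a', 'b', 'c'}, {'d', 'e'}]
    response = [{'a', 'b'}, {'c', 'd', 'e'}]
    print(lea(key, response))   # -> (0.6, 0.6, 0.6)

With the toy clusterings above, recall and precision both come out at 0.6; scoring a key against itself yields 1.0 throughout, as expected.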
If you use this code in your work, please cite the paper:
@InProceedings{moosavi2019minimum,
  author    = {Nafise Sadat Moosavi and Leo Born and Massimo Poesio and Michael Strube},
  title     = {Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection},
  year      = {2019},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  publisher = {Association for Computational Linguistics},
  address   = {Florence, Italy},
}
This code was written by @ns-moosavi. Some parts are borrowed from https://github.com/clarkkev/deep-coref/blob/master/evaluation.py
The test suite is taken from https://github.com/conll/reference-coreference-scorers/
Mention evaluation and the test suite were added by @andreasvc.
The parsing of CoNLL files was developed by Leo Born.