389 changes: 389 additions & 0 deletions
...grobid-0.7.0-SNAPSHOT-Glutton-DeLFT-WAPITI-MIXED-BidLSTM-CRF-FEATURES-CITATION-06.07.2021
@@ -0,0 +1,389 @@
PDF processing 100% │████████████████│ 1943/1943 (0:13:25 / 0:00:00)

-------------> GROBID failed on 0 PDF

1943 PDF files processed in 816.863 seconds, 0.4204132784354092 seconds per PDF file

Evaluation header 100% │█████████████│ 1943/1943 (0:01:25 / 0:00:00)

Evaluation citation 100% │███████████│ 1943/1943 (0:13:21 / 0:00:00)

Evaluation full text 100% │██████████│ 1943/1943 (0:00:29 / 0:00:00)

Evaluation metrics produced in 916.228 seconds

======= Header metadata =======

Evaluation on 1943 random PDF files out of 1943 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             82.3         16.11        15.8         15.95        1911
authors              98.44        93.07        92.68        92.88        1941
first_author         99.07        96.07        95.67        95.87        1941
keywords             94.27        68.26        64.06        66.09        1380
title                97.16        86.84        86.62        86.73        1943

all (micro avg.)     94.25        72.71        71.58        72.14        9116
all (macro avg.)     94.25        72.07        70.97        71.5         9116
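
The macro-average row appears to be the unweighted mean of the five per-field scores. The following is a minimal Python sketch under that assumption; the dictionary and function names are illustrative and are not part of GROBID. It recomputes the strict-matching macro averages from the per-field rows above.

```python
# Per-field strict-matching scores copied from the table above,
# as (precision, recall, f1). Names here are illustrative only.
STRICT_HEADER = {
    "abstract": (16.11, 15.8, 15.95),
    "authors": (93.07, 92.68, 92.88),
    "first_author": (96.07, 95.67, 95.87),
    "keywords": (68.26, 64.06, 66.09),
    "title": (86.84, 86.62, 86.73),
}


def macro_average(scores):
    """Unweighted mean of each metric across fields, rounded to 2 decimals."""
    n = len(scores)
    columns = zip(*scores.values())  # one tuple per metric column
    return tuple(round(sum(col) / n, 2) for col in columns)


# Assumption: macro avg. = plain mean over fields, ignoring support.
print(macro_average(STRICT_HEADER))
```

The micro average cannot be recomputed the same way, because it depends on per-field true/false positive counts that the log does not print.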

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             91.47        60.91        59.76        60.33        1911
authors              98.54        93.53        93.15        93.34        1941
first_author         99.09        96.17        95.78        95.97        1941
keywords             95.44        76.53        71.81        74.09        1380
title                98.83        94.74        94.49        94.61        1943

all (micro avg.)     96.67        85.09        83.76        84.42        9116
all (macro avg.)     96.67        84.37        83           83.67        9116

==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             97.12        88.48        86.81        87.64        1911
authors              99.15        96.43        96.03        96.23        1941
first_author         99.16        96.48        96.08        96.28        1941
keywords             96.78        86.02        80.72        83.29        1380
title                99.49        97.83        97.58        97.71        1943

all (micro avg.)     98.34        93.58        92.12        92.85        9116
all (macro avg.)     98.34        93.05        91.45        92.23        9116

= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             96.33        84.64        83.05        83.84        1911
authors              98.84        94.98        94.59        94.79        1941
first_author         99.07        96.07        95.67        95.87        1941
keywords             96.2         81.93        76.88        79.33        1380
title                99.39        97.37        97.12        97.24        1943

all (micro avg.)     97.97        91.69        90.26        90.97        9116
all (macro avg.)     97.97        91           89.46        90.21        9116

===== Instance-level results =====

Total expected instances: 1943
Total correct instances: 218 (strict)
Total correct instances: 869 (soft)
Total correct instances: 1365 (Levenshtein)
Total correct instances: 1256 (ObservedRatcliffObershelp)

Instance-level recall: 11.22 (strict)
Instance-level recall: 44.72 (soft)
Instance-level recall: 70.25 (Levenshtein)
Instance-level recall: 64.64 (RatcliffObershelp)
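
The header instance-level recall values are consistent with correct instances divided by expected instances. The sketch below assumes that definition; the constant and function names are hypothetical, and the counts are copied from the log above.

```python
EXPECTED = 1943  # "Total expected instances" in the log above

# Correct-instance counts per matching mode, copied from the log.
CORRECT = {
    "strict": 218,
    "soft": 869,
    "Levenshtein": 1365,
    "RatcliffObershelp": 1256,
}


def instance_recall(correct, expected):
    """Correct instances as a percentage of expected instances."""
    return round(100.0 * correct / expected, 2)


for mode, count in CORRECT.items():
    print(f"{mode}: {instance_recall(count, EXPECTED)}")
```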

======= Citation metadata =======

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              97.43        82.47        75.38        78.77        85778
date                 99.15        94.46        82.98        88.35        87067
first_author         98.37        89.11        81.43        85.09        85778
inTitle              96.01        72.17        70.95        71.56        81007
issue                99.59        89.04        83.14        85.99        16635
page                 98.93        95.94        85.15        90.22        80501
title                97.07        79           74.48        76.67        80736
volume               99.43        95.92        89.01        92.34        80067

all (micro avg.)     98.25        86.86        79.99        83.29        597569
all (macro avg.)     98.25        87.26        80.31        83.62        597569

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              97.51        83.01        75.87        79.28        85778
date                 99.15        94.46        82.98        88.35        87067
first_author         98.39        89.3         81.6         85.28        85778
inTitle              97.64        83.54        82.13        82.83        81007
issue                99.59        89.04        83.14        85.99        16635
page                 98.93        95.94        85.15        90.22        80501
title                98.64        90.45        85.28        87.79        80736
volume               99.43        95.92        89.01        92.34        80067

all (micro avg.)     98.66        90.2         83.06        86.48        597569
all (macro avg.)     98.66        90.21        83.15        86.51        597569

==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              98.26        88.31        80.72        84.35        85778
date                 99.15        94.46        82.98        88.35        87067
first_author         98.42        89.5         81.79        85.47        85778
inTitle              97.83        84.84        83.4         84.11        81007
issue                99.59        89.04        83.14        85.99        16635
page                 98.93        95.94        85.15        90.22        80501
title                98.96        92.83        87.52        90.1         80736
volume               99.43        95.92        89.01        92.34        80067

all (micro avg.)     98.82        91.5         84.26        87.73        597569
all (macro avg.)     98.82        91.36        84.21        87.62        597569

= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              97.84        85.37        78.03        81.54        85778
date                 99.15        94.46        82.98        88.35        87067
first_author         98.37        89.13        81.44        85.11        85778
inTitle              97.45        82.17        80.78        81.47        81007
issue                99.59        89.04        83.14        85.99        16635
page                 98.93        95.94        85.15        90.22        80501
title                98.9         92.38        87.1         89.66        80736
volume               99.43        95.92        89.01        92.34        80067

all (micro avg.)     98.71        90.58        83.41        86.85        597569
all (macro avg.)     98.71        90.55        83.45        86.83        597569

===== Instance-level results =====

Total expected instances: 90125
Total extracted instances: 87994
Total correct instances: 39070 (strict)
Total correct instances: 50916 (soft)
Total correct instances: 55618 (Levenshtein)
Total correct instances: 52284 (RatcliffObershelp)

Instance-level precision: 44.4 (strict)
Instance-level precision: 57.86 (soft)
Instance-level precision: 63.21 (Levenshtein)
Instance-level precision: 59.42 (RatcliffObershelp)

Instance-level recall: 43.35 (strict)
Instance-level recall: 56.49 (soft)
Instance-level recall: 61.71 (Levenshtein)
Instance-level recall: 58.01 (RatcliffObershelp)

Instance-level f-score: 43.87 (strict)
Instance-level f-score: 57.17 (soft)
Instance-level f-score: 62.45 (Levenshtein)
Instance-level f-score: 58.71 (RatcliffObershelp)
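
The citation instance-level figures can be recomputed from the three counts reported above. This sketch assumes the usual definitions (precision = correct / extracted, recall = correct / expected, f-score = harmonic mean of the two); the function name `prf` is hypothetical and the inputs are copied from the log.

```python
EXPECTED = 90125   # "Total expected instances"
EXTRACTED = 87994  # "Total extracted instances"


def prf(correct, extracted=EXTRACTED, expected=EXPECTED):
    """Return (precision, recall, f-score) as percentages, 2 decimals."""
    precision = correct / extracted
    recall = correct / expected
    # Harmonic mean of the unrounded precision and recall.
    fscore = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * v, 2) for v in (precision, recall, fscore))


# Correct-instance counts per matching mode, copied from the log.
for mode, correct in [("strict", 39070), ("soft", 50916),
                      ("Levenshtein", 55618), ("RatcliffObershelp", 52284)]:
    print(mode, prf(correct))
```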

Matching 1 : 67183

Matching 2 : 4042

Matching 3 : 2332

Matching 4 : 739

Total matches : 74296

======= Citation context resolution =======

Total expected references: 90125 - 46.38 references per article
Total predicted references: 87994 - 45.29 references per article

Total expected citation contexts: 139835 - 71.97 citation contexts per article
Total predicted citation contexts: 121136 - 62.34 citation contexts per article

Total correct predicted citation contexts: 100034 - 51.48 citation contexts per article
Total wrong predicted citation contexts: 21102 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)

Precision citation contexts: 82.58
Recall citation contexts: 71.54
fscore citation contexts: 76.66
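
As a consistency check, the citation-context counts can be cross-checked against each other and against the per-article averages. The sketch assumes the averages are taken over the 1943 evaluated PDFs and that the f-score is the harmonic mean of precision and recall; all names are illustrative.

```python
ARTICLES = 1943              # PDFs evaluated
EXPECTED_CONTEXTS = 139835   # expected citation contexts
PREDICTED_CONTEXTS = 121136  # predicted citation contexts
CORRECT_CONTEXTS = 100034    # correct predicted citation contexts
WRONG_CONTEXTS = 21102       # wrong predicted citation contexts


def pct(numerator, denominator):
    """Ratio as a percentage rounded to 2 decimals."""
    return round(100.0 * numerator / denominator, 2)


def per_article(total, articles=ARTICLES):
    """Average count per evaluated article, rounded to 2 decimals."""
    return round(total / articles, 2)


# Every predicted context is counted as either correct or wrong.
assert CORRECT_CONTEXTS + WRONG_CONTEXTS == PREDICTED_CONTEXTS

precision = pct(CORRECT_CONTEXTS, PREDICTED_CONTEXTS)
recall = pct(CORRECT_CONTEXTS, EXPECTED_CONTEXTS)
# 2PR / (P + R) simplifies to 2 * correct / (predicted + expected).
fscore = pct(2 * CORRECT_CONTEXTS, PREDICTED_CONTEXTS + EXPECTED_CONTEXTS)

print(precision, recall, fscore)
print(per_article(EXPECTED_CONTEXTS), per_article(PREDICTED_CONTEXTS))
```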

======= Fulltext structures =======

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

figure_title         96.65        30.89        25.49        27.93        7058
reference_citation   58.96        57.33        59.18        58.24        134196
reference_figure     95           64.42        63.15        63.78        19330
reference_table      99.11        82.75        83.81        83.28        7327
section_title        94.77        77.06        67.58        72.01        27619
table_title          98.82        57.17        53.12        55.07        3784

all (micro avg.)     90.55        60.59        60.32        60.46        199314
all (macro avg.)     90.55        61.6         58.72        60.05        199314

======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

figure_title         98.63        79.17        65.33        71.59        7058
reference_citation   61.55        61.41        63.39        62.38        134196
reference_figure     94.89        65           63.71        64.35        19330
reference_table      99.08        82.9         83.96        83.43        7327
section_title        95.48        81.97        71.88        76.59        27619
table_title          99.42        81.85        76.06        78.85        3784

all (micro avg.)     91.51        65.95        65.66        65.8         199314
all (macro avg.)     91.51        75.38        70.72        72.86        199314
************************************************************************************ | ||
COUNTER: org.grobid.core.engines.counters.TableRejectionCounters | ||
************************************************************************************ | ||
------------------------------------------------------------------------------------ | ||
CANNOT_PARSE_LABEL_TO_INT: 136 | ||
CONTENT_SIZE_TOO_SMALL: 80 | ||
CONTENT_WIDTH_TOO_SMALL: 15 | ||
EMPTY_LABEL_OR_HEADER_OR_CONTENT: 1674 | ||
HEADER_NOT_STARTS_WITH_TABLE_WORD: 140 | ||
HEADER_NOT_CONSECUTIVE: 943 | ||
HEADER_AND_CONTENT_DIFFERENT_PAGES: 11 | ||
HEADER_AND_CONTENT_INTERSECT: 555 | ||
FEW_TOKENS_IN_HEADER: 1 | ||
==================================================================================== | ||
************************************************************************************ | ||
COUNTER: org.grobid.core.engines.counters.ReferenceMarkerMatcherCounters | ||
************************************************************************************ | ||
------------------------------------------------------------------------------------ | ||
UNMATCHED_REF_MARKERS: 7568 | ||
MATCHED_REF_MARKERS_AFTER_POST_FILTERING: 2781 | ||
STYLE_AUTHORS: 37248 | ||
STYLE_NUMBERED: 53847 | ||
MANY_CANDIDATES: 4044 | ||
MANY_CANDIDATES_AFTER_POST_FILTERING: 526 | ||
NO_CANDIDATES: 16297 | ||
INPUT_REF_STRINGS_CNT: 93426 | ||
MATCHED_REF_MARKERS: 121136 | ||
NO_CANDIDATES_AFTER_POST_FILTERING: 598 | ||
STYLE_OTHER: 2331 | ||
==================================================================================== | ||
************************************************************************************ | ||
COUNTER: org.grobid.core.engines.counters.FigureCounters | ||
************************************************************************************ | ||
------------------------------------------------------------------------------------ | ||
SKIPPED_BAD_STANDALONE_FIGURES: 534 | ||
SKIPPED_DUE_TO_MISMATCH_OF_CAPTIONS_AND_VECTOR_AND_BITMAP_GRAPHICS: 3 | ||
SKIPPED_SMALL_STANDALONE_FIGURES: 436 | ||
SKIPPED_BIG_STANDALONE_FIGURES: 98 | ||
==================================================================================== | ||
************************************************************************************ | ||
COUNTER: org.grobid.core.engines.label.TaggingLabelImpl | ||
************************************************************************************ | ||
------------------------------------------------------------------------------------ | ||
HEADER_DOCTYPE: 1338 | ||
CITATION_TITLE: 83756 | ||
HEADER_DATE: 1022 | ||
HEADER_KEYWORD: 1372 | ||
NAME-HEADER_MIDDLENAME: 5762 | ||
TABLE_FIGDESC: 4118 | ||
NAME-HEADER_SURNAME: 14044 | ||
NAME-CITATION_OTHER: 439450 | ||
CITATION_BOOKTITLE: 3967 | ||
HEADER_FUNDING: 76 | ||
HEADER_ADDRESS: 6135 | ||
HEADER_AFFILIATION: 6273 | ||
CITATION_NOTE: 3524 | ||
FULLTEXT_CITATION_MARKER: 182767 | ||
TABLE_NOTE: 2666 | ||
HEADER_EMAIL: 2184 | ||
FULLTEXT_TABLE_MARKER: 14699 | ||
CITATION_WEB: 1343 | ||
HEADER_GROUP: 5 | ||
TABLE_LABEL: 3321 | ||
FULLTEXT_SECTION: 51375 | ||
NAME-HEADER_FORENAME: 14161 | ||
TABLE_CONTENT: 4800 | ||
CITATION_COLLABORATION: 180 | ||
HEADER_MEETING: 25 | ||
CITATION_ISSUE: 16839 | ||
HEADER_EDITOR: 136 | ||
CITATION_SERIES: 66 | ||
CITATION_JOURNAL: 78794 | ||
NAME-CITATION_SURNAME: 333709 | ||
TABLE_FIGURE_HEAD: 4735 | ||
FULLTEXT_EQUATION_MARKER: 1626 | ||
CITATION_OTHER: 446085 | ||
FULLTEXT_FIGURE_MARKER: 37781 | ||
HEADER_TITLE: 2208 | ||
CITATION_TECH: 384 | ||
FIGURE_CONTENT: 2619 | ||
FIGURE_LABEL: 5950 | ||
FULLTEXT_EQUATION_LABEL: 1891 | ||
HEADER_OTHER: 11202 | ||
FULLTEXT_EQUATION: 4359 | ||
TABLE_OTHER: 1 | ||
CITATION_DATE: 85643 | ||
CITATION_AUTHOR: 86936 | ||
FULLTEXT_FIGURE: 14254 | ||
FULLTEXT_TABLE: 9639 | ||
CITATION_EDITOR: 2165 | ||
FULLTEXT_OTHER: 158 | ||
HEADER_SUBMISSION: 1237 | ||
NAME-HEADER_OTHER: 17498 | ||
FIGURE_FIGDESC: 6889 | ||
NAME-HEADER_SUFFIX: 15 | ||
CITATION_VOLUME: 75843 | ||
NAME-CITATION_SUFFIX: 572 | ||
CITATION_LOCATION: 8027 | ||
NAME-HEADER_TITLE: 747 | ||
HEADER_WEB: 321 | ||
CITATION_INSTITUTION: 2104 | ||
HEADER_ABSTRACT: 2520 | ||
HEADER_REFERENCE: 2538 | ||
CITATION_PAGES: 80191 | ||
HEADER_AUTHOR: 4011 | ||
NAME-HEADER_MARKER: 8335 | ||
NAME-CITATION_FORENAME: 314984 | ||
CITATION_PUBLISHER: 5382 | ||
HEADER_PUBNUM: 1656 | ||
NAME-CITATION_MIDDLENAME: 68680 | ||
CITATION_PUBNUM: 10424 | ||
HEADER_COPYRIGHT: 1886 | ||
FULLTEXT_PARAGRAPH: 380335 | ||
FIGURE_FIGURE_HEAD: 9835 | ||
==================================================================================== | ||
************************************************************************************ | ||
COUNTER: FigureCounters | ||
************************************************************************************ | ||
------------------------------------------------------------------------------------ | ||
STANDALONE_FIGURES: 367 | ||
ASSIGNED_GRAPHICS_TO_FIGURES: 3973 | ||
==================================================================================== | ||
==================================================================================== | ||