MEDIQA-CORR-2024

Task Description

This task addresses the problem of identifying and correcting (common sense) medical errors in clinical notes. From a human perspective, these errors require medical expertise and knowledge to be both identified and corrected.

Each clinical text is either correct or contains exactly one error. The task consists of: (a) predicting the error flag (1: the text contains an error, 0: the text has no errors), and, for flagged texts: (b) extracting the sentence that contains the error, and (c) generating a corrected version of that sentence.

Data Description

  • The MS Training Set contains 2,189 clinical texts.
  • The MS Validation Set (#1) contains 574 clinical texts.
  • The UW Validation Set (#2) contains 160 clinical texts.
  • The test set will include clinical texts from both the MS and UW collections.
  • Each test entry contains the following 3 fields: Text ID | Text | Sentences
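
The exact distribution format of the test file is not specified here; the following is a minimal loading sketch, assuming a UTF-8 CSV with the three fields above as columns (the file name test_set.csv and the column names are assumptions, not confirmed by this README):

```python
# Minimal loading sketch -- file name and CSV layout are assumptions.
import pandas as pd

test_df = pd.read_csv("test_set.csv", encoding="utf-8")  # hypothetical file name

for _, row in test_df.iterrows():
    text_id = row["Text ID"]      # unique identifier of the clinical text
    full_text = row["Text"]       # the clinical note as a single string
    sentences = row["Sentences"]  # the note split into numbered sentences
    # ... run error detection / correction on the sentences here ...
```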

Run Submission

The submission format should follow the data format; each prediction is one line consisting of:

[Text ID] [Error Flag] [Error sentence ID or -1 for texts without errors] [Corrected sentence or NA for texts without errors]

E.g.:

  • text-id-1 0 -1 NA
  • text-id-2 1 8 "correction of sentence 8..."
  • text-id-3 1 6 "correction of sentence 6..."
  • text-id-4 0 -1 NA
  • text-id-5 1 15 "correction of sentence 15..."
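
As a quick illustration, here is a minimal sketch that writes predictions in this format. It assumes whitespace-separated fields with the corrected sentence quoted, as in the examples above; the output file name and the predictions structure are hypothetical:

```python
# Minimal run-file writer -- field separator and file name are assumptions
# based on the examples above, not an official specification.

predictions = [
    # (text_id, error_flag, error_sentence_id, corrected_sentence)
    ("text-id-1", 0, -1, "NA"),
    ("text-id-2", 1, 8, "correction of sentence 8..."),
]

with open("run1.txt", "w", encoding="utf-8") as f:  # hypothetical file name
    for text_id, flag, sent_id, correction in predictions:
        if flag == 0:
            f.write(f"{text_id} 0 -1 NA\n")
        else:
            f.write(f'{text_id} 1 {sent_id} "{correction}"\n')
```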

Evaluation Metrics & Scripts


We use the following evaluation metrics:

  • Accuracy for Error Flag Prediction (subtask A) and Error Sentence Detection (subtask B)
  • NLG metrics: ROUGE, BERTScore, BLEURT, their Aggregate-Score (Mean of ROUGE-1-F, BERTScore, BLEURT-20), and their Composite Scores for the evaluation of Sentence Correction (subtask C).
    • The Composite score is the mean of the per-text scores, computed as follows for each text (a sketch follows this list):
      • 1 point if both the system correction and the reference correction are "NA"
      • 0 points if only one of the system correction and the reference correction is "NA"
      • the NLG metric value in the [0, 1] range (e.g., ROUGE, BERTScore, BLEURT, or Aggregate-Score) if both the system correction and the reference correction are non-"NA" sentences.
    • The Aggregate-Score is the main evaluation score used to rank the participating systems.
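
The per-text composite logic can be summarized with the sketch below, where nlg_metric stands for any of the [0, 1]-range metrics above. The function names are illustrative, not the official evaluation code:

```python
def composite_score(system_correction, reference_correction, nlg_metric):
    """Per-text composite score as described above (illustrative sketch).

    nlg_metric: callable (candidate, reference) -> float in [0, 1],
    e.g. ROUGE-1-F, BERTScore, BLEURT-20, or the Aggregate-Score.
    """
    sys_na = system_correction == "NA"
    ref_na = reference_correction == "NA"
    if sys_na and ref_na:
        return 1.0  # both corrections are "NA"
    if sys_na or ref_na:
        return 0.0  # only one of the two is "NA"
    return nlg_metric(system_correction, reference_correction)


def composite_over_corpus(pairs, nlg_metric):
    """Mean of per-text composite scores over (system, reference) pairs."""
    scores = [composite_score(s, r, nlg_metric) for s, r in pairs]
    return sum(scores) / len(scores)
```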

Evaluation scripts: https://github.com/abachaa/MEDIQA-CORR-2024/tree/main/evaluation

  1. The first evaluation script computes Accuracy, ROUGE, and ROUGE-Composite scores:
    • Error Flag Accuracy (subtask A)
    • Error Sentence Detection Accuracy (subtask B)
    • NLG Metrics (subtask C): ROUGE-1-F, ROUGE-2-F, ROUGE-L-F, ROUGE-1-F-Composite, ROUGE-2-F-Composite, ROUGE-L-F-Composite.
    • Composite score computation: ROUGE scores for sentence-vs-sentence cases, plus ones or zeros when the reference and/or the candidate correction is NA (as described above).

  2. The second evaluation script computes Accuracy, ROUGE/BERTScore/BLEURT, Aggregate-Score, and their Composite scores:

  • Error Flag Accuracy (subtask A)
  • Error Sentence Detection Accuracy (subtask B)
  • NLG Metrics (subtask C):
    • ROUGE-1-F, ROUGE-2-F, ROUGE-L-F, BERTScore (microsoft/deberta-xlarge-mnli), BLEURT-20
    • ROUGE-1-F-Composite, ROUGE-2-F-Composite, ROUGE-L-F-Composite, BERTScore-Composite, BLEURT-20-Composite
    • Aggregate-Score (main score to rank the participating systems): mean of ROUGE-1-F, BERTScore, and BLEURT-20, which correlates better with human judgments (cf. our ACL 2023 Findings paper on evaluation metrics: https://aclanthology.org/2023.findings-acl.161.pdf); a sketch follows this list
    • Aggregate-Composite
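
For reference, the Aggregate-Score for a single sentence-vs-sentence case can be sketched as below. This is only an illustrative approximation of the official evaluation scripts linked above; the package choices (rouge-score, bert-score, bleurt) and the checkpoint path are assumptions:

```python
# Illustrative Aggregate-Score sketch (mean of ROUGE-1-F, BERTScore, BLEURT-20).
# Use the official evaluation scripts linked above for ranking-relevant numbers.
from rouge_score import rouge_scorer
from bert_score import score as bert_score
from bleurt import score as bleurt_score

rouge = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
bleurt = bleurt_score.BleurtScorer("BLEURT-20")  # path to a local BLEURT-20 checkpoint (assumption)

def aggregate_score(candidate: str, reference: str) -> float:
    # ROUGE-1 F-measure (reference first, candidate second)
    rouge1_f = rouge.score(reference, candidate)["rouge1"].fmeasure
    # BERTScore F1 with the DeBERTa-xlarge-MNLI model named in the README
    _, _, f1 = bert_score([candidate], [reference],
                          model_type="microsoft/deberta-xlarge-mnli")
    bertscore_f = f1[0].item()
    # BLEURT-20 score
    bleurt_val = bleurt.score(references=[reference], candidates=[candidate])[0]
    return (rouge1_f + bertscore_f + bleurt_val) / 3.0
```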

Contact

MEDIQA-NLP mailing list: https://groups.google.com/g/mediqa-nlp
Email: [email protected]

Organizers

  • Asma Ben Abacha, Microsoft, USA
  • Wen-wai Yim, Microsoft, USA
  • Meliha Yetisgen, University of Washington, USA
  • Fei Xia, University of Washington, USA
