This repository maintains C3, the first free-form multiple-Choice Chinese machine reading Comprehension dataset.
@article{sun2019investigating,
title={Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension},
author={Sun, Kai and Yu, Dian and Yu, Dong and Cardie, Claire},
journal={Transactions of the Association for Computational Linguistics},
year={2020},
url={https://arxiv.org/abs/1904.09679v3}
}
Files in this repository:
- license.txt: the license of C3.
- data/c3-{m,d}-{train,dev,test}.json: the dataset files, where m and d represent "mixed-genre" and "dialogue", respectively. The data format is as follows.
[
[
[
document 1
],
[
{
"question": document 1 / question 1,
"choice": [
document 1 / question 1 / answer option 1,
document 1 / question 1 / answer option 2,
...
],
"answer": document 1 / question 1 / correct answer option
},
{
"question": document 1 / question 2,
"choice": [
document 1 / question 2 / answer option 1,
document 1 / question 2 / answer option 2,
...
],
"answer": document 1 / question 2 / correct answer option
},
...
],
document 1 / id
],
[
[
document 2
],
[
{
"question": document 2 / question 1,
"choice": [
document 2 / question 1 / answer option 1,
document 2 / question 1 / answer option 2,
...
],
"answer": document 2 / question 1 / correct answer option
},
{
"question": document 2 / question 2,
"choice": [
document 2 / question 2 / answer option 1,
document 2 / question 2 / answer option 2,
...
],
"answer": document 2 / question 2 / correct answer option
},
...
],
document 2 / id
],
...
]
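The format above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository: the names `Example`, `iter_examples`, and `load_c3` are hypothetical, and it assumes that each entry is a three-element list (document, questions, id), that the document list may hold more than one string, and that "answer" holds the correct option itself rather than an index.

```python
import json
from typing import Iterator, List, NamedTuple


class Example(NamedTuple):
    """One question paired with its document (hypothetical helper type)."""
    doc_id: str
    document: str
    question: str
    choices: List[str]
    answer: str


def iter_examples(entries: list) -> Iterator[Example]:
    """Flatten parsed entries into one Example per question."""
    for doc_parts, questions, doc_id in entries:
        # Assumption: the document list may contain several strings
        # (for example, dialogue turns); join them with newlines.
        document = "\n".join(doc_parts)
        for q in questions:
            yield Example(doc_id, document, q["question"], q["choice"], q["answer"])


def load_c3(path: str) -> List[Example]:
    """Load one dataset file, e.g. data/c3-m-dev.json."""
    with open(path, encoding="utf-8") as f:
        return list(iter_examples(json.load(f)))
```

Under the assumption that "answer" is the text of the correct option, its position among the choices can be recovered with `example.choices.index(example.answer)`.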
- annotation/c3-{m,d}-{dev,test}.txt: question type annotations. Each file contains 150 annotated instances. We adopt the following abbreviations:
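Category | Abbreviation | Question Type |
---|---|---|
Matching | m | Matching |
Prior knowledge | l | Linguistic |
Prior knowledge | s | Domain-specific |
Prior knowledge | c-a | Arithmetic |
Prior knowledge | c-o | Connotation |
Prior knowledge | c-e | Cause-effect |
Prior knowledge | c-i | Implication |
Prior knowledge | c-p | Part-whole |
Prior knowledge | c-d | Precondition |
Prior knowledge | c-h | Scenario |
Prior knowledge | c-n | Other |
Supporting Sentences | 0 | Single sentence |
Supporting Sentences | 1 | Multiple sentences |
Supporting Sentences | 2 | Independent |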
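
When working with the annotations, it can help to expand the abbreviations back into readable names. The sketch below only encodes the abbreviation table as a lookup; `ANNOTATION_LABELS` and `describe_label` are hypothetical names, the numeric codes are assumed to appear as strings, and the line layout of the annotation files is not described here, so no parsing is attempted.

```python
# (category, question type) for each abbreviation, copied from the table.
# Assumption: the supporting-sentence codes 0, 1, 2 are handled as strings.
ANNOTATION_LABELS = {
    "m": ("Matching", "Matching"),
    "l": ("Prior knowledge", "Linguistic"),
    "s": ("Prior knowledge", "Domain-specific"),
    "c-a": ("Prior knowledge", "Arithmetic"),
    "c-o": ("Prior knowledge", "Connotation"),
    "c-e": ("Prior knowledge", "Cause-effect"),
    "c-i": ("Prior knowledge", "Implication"),
    "c-p": ("Prior knowledge", "Part-whole"),
    "c-d": ("Prior knowledge", "Precondition"),
    "c-h": ("Prior knowledge", "Scenario"),
    "c-n": ("Prior knowledge", "Other"),
    "0": ("Supporting sentences", "Single sentence"),
    "1": ("Supporting sentences", "Multiple sentences"),
    "2": ("Supporting sentences", "Independent"),
}


def describe_label(abbreviation: str) -> str:
    """Return 'Category: Question type' for one abbreviation."""
    category, name = ANNOTATION_LABELS[abbreviation.strip()]
    return f"{category}: {name}"
```

For example, `describe_label("c-e")` gives the string for the cause-effect type; an abbreviation outside the table raises `KeyError`.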