Question Generation and Question Answering Data Sets

The following is an inventory of data sets for two areas of Natural Language Processing (NLP): Natural Language Generation (NLG) / Question Generation (QG) and Natural Language Understanding (NLU) / Question Answering (QA). QA is included in this repository simply because the two often occur together. A corpus marked with a dash ('-') is not strictly a QG/NLG or QA/NLU corpus but has been mentioned in a related publication.

Data Sets

| Type | Name | Link |
| --- | --- | --- |
| QA | SQuAD2.0 - The Stanford Question Answering Dataset | https://rajpurkar.github.io/SQuAD-explorer/ |
| QA | Question-Answer Dataset | http://www.cs.cmu.edu/~ark/QA-data/ |
| QA | A Corpus for Complex Question Answering over Knowledge Graphs | http://sda.cs.uni-bonn.de/projects/qa-dataset/ |
| QA | WebQuestions | https://nlp.stanford.edu/software/sempre/ |
| QG | Question Generation Shared Task & Evaluation Challenge (QGSTEC) 2010 - Generating Questions from Sentences | https://github.com/bjwyse/QGSTEC2010 |
| QA | RecipeQA - A Dataset for Multimodal Comprehension of Cooking Recipes | https://hucvl.github.io/recipeqa/ |
| - | Cornell Movie--Dialogs Corpus | https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html |
| - | Ubuntu Dialogue Corpus v2.0 | https://github.com/rkadlec/ubuntu-ranking-dataset-creator |
| - | OSU Twitter NLP Tools | https://github.com/aritter/twitter_nlp |
| NLG | WeatherGov | https://cs.stanford.edu/~pliang/data/weather-data.zip |
| NLG | Boxscore-Data | https://github.com/harvardnlp/boxscore-data |
| NLG | WebNLG 2017 Challenge Data | http://webnlg.loria.fr/pages/challenge.html |
| NLG | Wikipedia-biography-dataset | https://github.com/DavidGrangier/wikipedia-biography-dataset |
| NLG | RNNLG | https://github.com/shawnwun/RNNLG |
| NLG | ACL-Overview | https://aclweb.org/aclwiki/Data_sets_for_NLG |