Conclusions, Results, and Future Work

Ultimately, this project is a successful demonstration of an end-to-end gap-fill question generation system. Notably, it does not rely directly on supervised machine learning; rather, it combines several unsupervised learning techniques with well-thought-out, intuitive heuristics. At a bare minimum, the system requires only a fact-filled corpus within the desired domain, although, as our experiments show, additional knowledge can be folded into the system via extra corpora.

The system used in the experiments drew on information from a variety of sources. For example, the word vectors used in the experiments were built from a corpus completely different from the one used for question generation. While we did observe a qualitative improvement in the selected gap words and distractors when we eliminated out-of-vocabulary word vectors, our overall impression is that this corpus mismatch does not greatly reduce the system's effectiveness. Further experiments could drill down into this potentially fruitful line of transfer learning within the system.

Results

For experimentation, we used the text of Campbell's Biology, 9th edition. Every sentence of this book is located in the biology.txt file.

The learned BTM information is here. This directory contains the unique words in the corpus as well as all of the topic model parameters (located in the subdirectory model/).

The selected sentences we generated in our experiments are here. The file is tab-separated, with one scored sentence per line. In order, the fields are (a minimal parsing sketch follows the list):

  1. The global sentence index.
  2. The score produced by the sentence-scoring algorithm.
  3. The top topics represented in the sentence, as space-separated topic indices.
  4. The original, word-for-word text of the sentence.
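
For reference, a minimal Python sketch for reading this file could look like the following; the filename selected_sentences.tsv is hypothetical, so substitute the actual file linked above.

```python
# Hypothetical filename; substitute the scored-sentences file linked above.
with open("selected_sentences.tsv", encoding="utf-8") as f:
    for line in f:
        idx, score, topics, text = line.rstrip("\n").split("\t", 3)
        idx = int(idx)                                # global sentence index
        score = float(score)                          # sentence-scoring output
        topic_ids = [int(t) for t in topics.split()]  # top topic indices
        print(idx, score, topic_ids, text[:60])
```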

Our experimental results, i.e., the generated questions with selected gap words and distractors from this biology corpus, are here:

These files are also tab-separated, with one generated question per line. In order, the fields are (a reconstruction sketch appears after the list):

  1. The global sentence index.
  2. The selected gap word.
  3. The space-separated start and end character indices of the gap word within the original sentence.
  4. The generated distractors, separated by spaces.

Note that all indices in this project (sentence, character, topic, etc.) are zero-indexed.
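
Putting these pieces together, a rendered question can be reconstructed from the generated-questions file. Below is a minimal sketch that assumes biology.txt stores one sentence per line (so the global sentence index selects a line), assumes the gap's end index is exclusive, and uses hypothetical filenames.

```python
# Hypothetical filenames; substitute the actual files linked above.
with open("biology.txt", encoding="utf-8") as f:
    sentences = [line.rstrip("\n") for line in f]  # position = global sentence index

with open("generated_questions.tsv", encoding="utf-8") as f:
    for line in f:
        idx, gap_word, span, distractors = line.rstrip("\n").split("\t", 3)
        start, end = (int(i) for i in span.split())  # zero-based character indices
        sentence = sentences[int(idx)]
        # Assumes the end index is exclusive; adjust the slice if it is inclusive.
        question = sentence[:start] + "_____" + sentence[end:]
        choices = [gap_word] + distractors.split()
        print(question, choices)
```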

Future Work

We would like to extend this work in a few key areas. In no particular order:

  • Implementing DATM: We could experiment with the Deep Autoencoder Topic Model (DATM) described in the RevUp paper. In particular, it would be interesting to see what differences, if any, arise in question generation between BTM and DATM.

  • Supervised Gap Filling: We could use the Microsoft Research mind the gap and QGSTEC2010 data resources to train supervised gap-selection models, mirroring the procedure used by the RevUp authors. Additionally, this data could serve as the basis for a more rigorous, quantitative evaluation of our topic-weighted word-vector method for gap-word selection.

  • Simultaneous, Coupled Learning: Instead of treating sentence scoring, gap-phrase selection, and distractor generation as separate, independent tasks, we could combine them into a single, learnable multi-part task. In particular, incorporating supervised signals about which words make good gaps in specific sentences would likely improve overall question generation performance. Likewise, if we learn what makes a good gap phrase at the same time as we learn what makes good distractors, we would likely see higher-quality questions.

  • More Relevant Distractors: Currently, our system does a less-than-optimal job of ensuring that the generated distractors are relevant and fit well with the chosen gap word and selected sentence. A low-hanging fruit would be to ensure that each distractor matches the gap word's morphology (see the rough sketch below). More broadly, our qualitative observation is that the distractors are often not relevant to the sentence's context; ensuring that the distractors make sense is vital for overall question quality.
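
As one possible first pass at that morphology check, we could compare part-of-speech tags. The sketch below uses NLTK with hypothetical gap_word and candidate_distractors inputs; tagging single words out of sentence context is only a coarse approximation.

```python
import nltk  # requires the "averaged_perceptron_tagger" data package

def same_surface_form(word_a, word_b):
    """Coarse morphology check: require matching Penn Treebank POS tags
    (e.g. both plural nouns, or both past-tense verbs)."""
    return nltk.pos_tag([word_a])[0][1] == nltk.pos_tag([word_b])[0][1]

# Hypothetical inputs: the chosen gap word and its candidate distractors.
gap_word = "mitochondria"
candidate_distractors = ["ribosome", "chloroplasts", "vacuoles"]
filtered = [d for d in candidate_distractors if same_surface_form(gap_word, d)]
```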

  • Corpus-Specific Word Vectors: We could retrain word vectors on the biology corpus alone, instead of using the pre-trained vectors from the open-source word2vec project (a sketch follows). We might find that such vectors are more appropriate for our unsupervised gap-word selection technique.
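
A minimal sketch of such retraining with gensim (4.x API), assuming biology.txt holds one whitespace-tokenizable sentence per line and using a hypothetical output filename:

```python
from gensim.models import Word2Vec

# Naive whitespace tokenization; a real run would reuse the pipeline's tokenizer.
with open("biology.txt", encoding="utf-8") as f:
    corpus = [line.lower().split() for line in f]

model = Word2Vec(corpus, vector_size=300, window=5, min_count=5, workers=4)
model.wv.save_word2vec_format("biology_vectors.txt")  # standard word2vec text format
```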

  • NLP Tool Improvement: We could be more diligent and thorough in our NLP pipeline. Specifically, we could use constituency parsing and named entity recognition (NER) to move beyond the system's current limitation of finding single gap words and instead find gap phrases (see the sketch below). This would also let us find distractor phrases, which would likely allow us to create more appropriate question, answer, and distractor triples from text.
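
For instance, named entities and noun chunks from an off-the-shelf pipeline such as spaCy could supply multi-word gap candidates. The sketch below uses an invented example sentence, and the en_core_web_sm model is an assumption rather than part of our pipeline.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this small English model is installed
doc = nlp("Gregor Mendel studied inheritance in pea plants at an abbey in Brno.")

# Multi-word candidates for gaps (and, symmetrically, for distractor phrases).
candidates = [ent.text for ent in doc.ents] + [chunk.text for chunk in doc.noun_chunks]
print(candidates)
```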

  • Syntax in Sentence Scoring and Selection: The core idea is to experiment further with different topic word-vector creation and weighting schemes in sentence scoring. Currently, we perform a simple summation of all associated word vectors in a given sentence, where each vector is weighted by the conditional word distributions of the sentence's top three latent topics (a sketch of this weighting follows). We do not, for example, weight these probabilities by the original topic weights used for sentence scoring. On the more sophisticated end of the spectrum, we could use the graphs produced by dependency parsing to reshape and re-prioritize how we perform topic weighting via word vectors. For instance, if a word or phrase is the root of the sentence, we could give its associated word vector more weight, and we could add a weight decay inversely proportional to the length of the word's path to the root.
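
To make the current scheme concrete, here is a sketch of the weighted summation as described above; all names are hypothetical placeholders for the system's actual data structures.

```python
import numpy as np

def topic_weighted_sentence_vector(tokens, word_vecs, topic_word_probs, top_topics):
    """Sum a sentence's word vectors, weighting each by the word's conditional
    probability under the sentence's top latent topics.

    Hypothetical structures: word_vecs maps word -> np.ndarray, and
    topic_word_probs maps topic index -> {word: P(word | topic)}."""
    dim = len(next(iter(word_vecs.values())))
    vec = np.zeros(dim)
    for tok in tokens:
        if tok not in word_vecs:
            continue  # skip out-of-vocabulary words
        weight = sum(topic_word_probs[t].get(tok, 0.0) for t in top_topics)
        vec += weight * word_vecs[tok]
    return vec
```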