
Word-Level Neural LMs for Sentence Completion

A PyTorch implementation for assessing word-level neural language models (LMs) on sentence completion. This repository is built upon Link.

Requirements

  • numpy
  • pandas
  • tqdm
  • pytorch == 1.1.0
  • pytorch-transformers == 1.0.0
  • sentencepiece (for tokenization with BERT models)
  • nltk == 3.3 (download the punkt package for tokenization when experimenting with WordRNNs)
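
All of the above can be installed with pip; note that on PyPI the pytorch 1.1.0 package is distributed as torch, and punkt is fetched separately through nltk:

pip install numpy pandas tqdm torch==1.1.0 pytorch-transformers==1.0.0 sentencepiece nltk==3.3
python3 -c "import nltk; nltk.download('punkt')"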

Datasets

  • Microsoft Research Sentence Completion Challenge
    • The training and test data can be downloaded from Link. Store the downloaded test data in data/completion/.
  • Scholastic Aptitude Test sentence completion questions
    • The collected questions are provided at link. Store the downloaded test data in data/completion/.
  • TOPIK cloze questions
    • 10 samples are contained in data/completion/topik_sample.csv (see the loading sketch after this list)
    • Metadata for all questions is provided in data/completion/topik_sample.csv
    • You may request the full set via e-mail
  • Nineteenth century novels (19C novels)
    • A preprocessed dataset can be downloaded from link.
  • The Sejong corpus can be downloaded from link
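
As a quick sanity check, the bundled TOPIK sample can be inspected with pandas; the column layout is not documented here, so this sketch only prints whatever the file contains:

import pandas as pd

# Peek at the bundled TOPIK sample; the columns are whatever the CSV defines.
df = pd.read_csv("data/completion/topik_sample.csv")
print(df.shape)
print(df.head())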

Setup

  • The pre-trained LM1B model can be downloaded from Link
  • Pre-trained transformers of pytorch-transformers
    • Automatically downloaded when running eval_pretrained.py with the corresponding options (see the sketch after this list)
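
For reference, pytorch-transformers 1.0.0 downloads weights and vocabulary on the first from_pretrained call; the model name and cache_dir below are illustrative choices (cache_dir stands in for the pretrans_dir setting described next):

from pytorch_transformers import BertForMaskedLM, BertTokenizer

# The first call fetches files into cache_dir; subsequent calls reuse the cache.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", cache_dir="pretrained/")
model = BertForMaskedLM.from_pretrained("bert-base-uncased", cache_dir="pretrained/")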

Create ./settings.json containing:

{
  "prob_set_dir": "data/completion/",
  "prepro_dir": "path_to_prepro_dir",
  "lm1b_dir": "path_to_dir_containing_lm1b_model",
  "pretrans_dir": "path_to_dir_containing_pytorch_transformers",
  "sejong_dir": "path_to_dir_containing_sejong_corpus"
}
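
The scripts read these paths at startup; a minimal sketch of loading the file, assuming plain JSON (the snippet is illustrative, not necessarily the repository's own loading code):

import json

# Read the experiment paths; keys match the example settings.json above.
with open("settings.json") as f:
    settings = json.load(f)

prob_set_dir = settings["prob_set_dir"]  # e.g. "data/completion/"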

Run

Training a WordRNN

python3 train.py --save_dir mynet

Evaluation

  • WordRNN

python3 eval_trained.py --dir mynet
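
Conceptually, evaluation ranks each question's candidate words by the language-model likelihood of the completed sentence. Below is a minimal sketch of that idea with a pytorch-transformers GPT-2 head; the score_candidates helper and the "___" blank marker are illustrative, not the repository's actual API:

import torch
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer

def score_candidates(model, tokenizer, question, candidates):
    # Rank candidates by (approximate) total log-likelihood of the filled sentence.
    scores = []
    for cand in candidates:
        ids = torch.tensor([tokenizer.encode(question.replace("___", cand))])
        with torch.no_grad():
            loss = model(ids, labels=ids)[0]  # mean per-token cross-entropy
        scores.append(-loss.item() * ids.size(1))
    return scores

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
print(score_candidates(model, tokenizer, "He drank a cup of ___ .", ["coffee", "door"]))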

Fine-tuning a Transformer-based model

python3 finetune.py --model {bert,gpt,gpt2} --pretrained saved_name --update-embeddings
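
The --update-embeddings flag presumably toggles whether the embedding layer is trained during fine-tuning; the sketch below shows that general idea for BERT and is an assumption about the flag, not the repository's code:

import torch
from pytorch_transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Assumed behavior: freeze the embedding layer unless --update-embeddings is given.
update_embeddings = False
if not update_embeddings:
    for p in model.bert.embeddings.parameters():
        p.requires_grad = False

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=5e-5)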


Acknowledgment

Thanks to Sukhyun Cho, who manually collected and annotated the TOPIK questions.
