A PyTorch implementation of the assessment of word-level neural LMs for sentence completion. This repository is built upon Link.

## Requirements
- numpy
- pandas
- tqdm
- pytorch == 1.1.0
- pytorch-transformers == 1.0.0
- sentencepiece (for tokenization of BERT models)
- nltk == 3.3 (download the `punkt` package for tokenization when experimenting with WordRNNs; see the sketch below)
## Datasets

- Microsoft Research Sentence Completion Challenge
  - Training and test datasets can be downloaded from Link. Store the downloaded test data in `data/completion/`.
- Scholastic Aptitude Test sentence completion questions
  - Collected questions are provided in link. Store the downloaded test data in `data/completion/`.
- TOPIK cloze questions
  - 10 samples are contained in `data/completion/topik_sample.csv` (see the loading sketch after this list)
  - Metadata for all questions is provided in `data/completion/topik_sample.csv`
  - You may request the full set via e-mail
- Nineteenth century novels (19C novels)
  - A preprocessed dataset can be downloaded from link.
- Sejong corpus can be downloaded through link
- Pre-trained LM1B can be downloaded from Link
- Pre-trained transformer models from pytorch-transformers
  - automatically downloaded when running `eval_pretrained.py` with the corresponding options
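To take a quick look at the bundled TOPIK samples, the CSV can be loaded with pandas (already listed in the requirements); a minimal sketch that makes no assumption about the column layout:

```python
# Peek at the bundled TOPIK cloze samples without assuming
# particular column names: print the header and the first rows.
import pandas as pd

samples = pd.read_csv('data/completion/topik_sample.csv')
print(samples.columns.tolist())
print(samples.head())
```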
## Settings

Create `./settings.json` containing:
```json
{
  "prob_set_dir": "data/completion/",
  "prepro_dir": "path_to_prepro_dir",
  "lm1b_dir": "path_to_dir_containing_lm1b_model",
  "pretrans_dir": "path_to_dir_containing_pytorch_transformers",
  "sejong_dir": "path_to_dir_containing_sejong_corpus"
}
```
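Before training, it can help to verify that the file parses and that every configured directory exists; a minimal sketch (the key names come from the JSON above, the check itself is illustrative):

```python
# Sanity-check ./settings.json: parse it and flag any configured
# path that does not exist yet.
import json
import os

with open('./settings.json') as f:
    settings = json.load(f)

for key, path in settings.items():
    status = 'ok' if os.path.exists(path) else 'MISSING'
    print(f'{key}: {path} [{status}]')
```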
## Training

- WordRNN

  ```
  python3 train.py --save_dir mynet
  ```
## Evaluation

```
python3 eval_trained.py --dir mynet
```
## Fine-tuning

```
python3 finetune.py --model {bert,gpt,gpt2} --pretrained saved_name --update-embeddings
```

where `--model` takes one of `bert`, `gpt`, or `gpt2`.
## Acknowledgement

Thanks to Sukhyun Cho, who manually collected and annotated the TOPIK questions.