An investigation of complex word identification (CWI) systems for English.

Vrije Universiteit Amsterdam
Computational Lexicology and Terminology Lab
Department of Language and Communication
Faculty of Humanities

To run the feature extraction notebooks in the CAMB, Camb_A and Final_system folders, you need a running Stanford CoreNLP server. Download Stanford CoreNLP, navigate to the stanford-corenlp-4.5.4 folder and start the server with:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

This starts the server on port 9000 with a maximum Java heap of 4 GB and a request timeout of 15,000 ms.
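To check that the server is reachable from Python, the following is a minimal sketch. It assumes the server was started with the command above on localhost, port 9000, and uses only the standard library together with CoreNLP's HTTP interface (raw text in the POST body, a JSON `properties` parameter in the URL). It is not necessarily how the notebooks connect to CoreNLP, and the helper names `build_request_url`, `annotate` and `tokens_with_pos` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server listens on the port given in the start command (9000).
CORENLP_URL = "http://localhost:9000"


def build_request_url(annotators, base_url=CORENLP_URL):
    """Encode the CoreNLP 'properties' parameter (annotators + JSON output) into the URL."""
    properties = {"annotators": ",".join(annotators), "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{base_url}/?{query}"


def annotate(text, annotators=("tokenize", "ssplit", "pos", "lemma"), timeout=15):
    """POST raw text to the running server and return the decoded JSON response."""
    request = urllib.request.Request(
        build_request_url(annotators),
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def tokens_with_pos(annotation):
    """Flatten the JSON response into (word, POS tag, lemma) triples."""
    return [
        (token["word"], token["pos"], token["lemma"])
        for sentence in annotation["sentences"]
        for token in sentence["tokens"]
    ]
```

With the server running, `tokens_with_pos(annotate("The committee deliberated at length."))` returns one triple per token. Keeping URL construction and response parsing in separate functions makes them usable without a live server.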

Data

The dataset used to train these models was collected by Yimam et al. (2018) and is available here.
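As an illustration of reading the data, here is a minimal loader sketch. It assumes the header-less, tab-separated layout used in the CWI 2018 shared task English files (HIT ID, sentence, start and end character offsets of the target, target word, numbers of native and non-native annotators, numbers of native and non-native annotators who marked the target as complex, binary label, probabilistic label); verify this column order against your copy of the data. The `CWIInstance` class and `read_cwi_rows` function are hypothetical names, not part of this repository.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


# Assumed column order (CWI 2018 shared task TSV layout); check before relying on it.
@dataclass
class CWIInstance:
    hit_id: str
    sentence: str
    start: int  # character offset where the target starts
    end: int  # character offset where the target ends (exclusive)
    target: str
    native_annotators: int
    non_native_annotators: int
    native_complex: int
    non_native_complex: int
    binary_label: int  # 1 = complex, 0 = simple
    prob_label: float  # share of annotators who marked the target as complex


def read_cwi_rows(lines: Iterable[str]) -> List[CWIInstance]:
    """Parse header-less, tab-separated rows into CWIInstance records."""
    # QUOTE_NONE: quotation marks inside sentences are kept as ordinary characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    instances = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        instances.append(CWIInstance(
            row[0], row[1], int(row[2]), int(row[3]), row[4],
            int(row[5]), int(row[6]), int(row[7]), int(row[8]),
            int(row[9]), float(row[10]),
        ))
    return instances
```

A file can be passed directly, for example `with open(path, encoding="utf-8", newline="") as f: data = read_cwi_rows(f)`, where `path` points to one of the (hypothetically located) training files.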

Thesis Report

This repository contains a series of notebooks investigating feature-based approaches to complex word identification in English.

The thesis report is available here: https://www.overleaf.com/read/wmvwtmpbkvqs
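The specific features used in the notebooks are not listed in this README. As an illustrative sketch only, the following shows the general shape of a feature-based CWI representation: each target word or phrase is mapped to a small vector of numeric features that a classifier can consume. The feature names, the vowel-group syllable heuristic and the toy frequency table are all assumptions, not the thesis's actual feature set.

```python
import re
from collections import Counter
from typing import Dict

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Heuristic syllable count: number of vowel groups, at least 1 (not a dictionary lookup)."""
    return max(1, len(VOWEL_GROUPS.findall(word.lower())))


def word_features(target: str, counts: Counter) -> Dict[str, float]:
    """Hypothetical feature set for one target word or multi-word phrase."""
    tokens = target.split()
    if not tokens:
        raise ValueError("target must contain at least one token")
    total = sum(counts.values())
    vocab = len(counts)
    # Add-one smoothed relative frequency of the rarest token, so unseen words are not zero.
    min_freq = min((counts[t.lower()] + 1) / (total + vocab + 1) for t in tokens)
    return {
        "length": float(len(target)),
        "n_tokens": float(len(tokens)),
        "syllables": float(sum(estimate_syllables(t) for t in tokens)),
        "min_rel_freq": min_freq,
    }
```

Using the frequency of the rarest token reflects the intuition that a phrase is usually as hard as its least familiar word; in practice the counts would come from a large reference corpus rather than a toy table.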

cltl-students/Adam_Tucker_Complex_Word_Identification
