cltl-students/Adam_Tucker_Complex_Word_Identification

An investigation of complex word identification (CWI) systems for English.

Vrije Universiteit Amsterdam, Computational Lexicology and Terminology Lab, Department of Language and Communication, Faculty of Humanities

To run the feature extraction notebooks in the CAMB, CAMB_A and Final_system folders, you will need to download Stanford CoreNLP here, navigate to the stanford-corenlp-4.5.4 folder, and start the CoreNLP server with:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
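Once the server is running, it can be queried over HTTP from Python. The following is a minimal sketch, assuming the server listens on localhost port 9000 as in the command above. The function names (`build_request_url`, `annotate`, `tokens_with_pos`) and the choice of annotators are illustrative assumptions and are not taken from the notebooks in this repository, which may connect to the server differently.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server runs locally on the port given by "-port 9000".
SERVER_URL = "http://localhost:9000"


def build_request_url(annotators="tokenize,ssplit,pos", base_url=SERVER_URL):
    """Build the request URL; the server reads a JSON 'properties' query parameter."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return f"{base_url}/?{query}"


def annotate(text, annotators="tokenize,ssplit,pos", base_url=SERVER_URL, timeout=15):
    """POST raw text to the server and return the parsed JSON annotation."""
    url = build_request_url(annotators, base_url)
    request = urllib.request.Request(url, data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def tokens_with_pos(annotation):
    """Flatten a JSON annotation into a list of (word, POS tag) pairs."""
    return [
        (token["word"], token["pos"])
        for sentence in annotation["sentences"]
        for token in sentence["tokens"]
    ]


# Example usage (requires the server to be running):
# result = annotate("The committee deliberated at length.")
# print(tokens_with_pos(result))
```

The client timeout of 15 seconds mirrors the server's `-timeout 15000` (milliseconds). If the call raises a connection error, the server is most likely not running or is listening on a different port.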

Data

The dataset used to train these models was collected by Yimam et al. (2018) and is available here.

Thesis Report

This repository consists of a series of notebooks investigating feature-based approaches for complex word identification in English.

The thesis report is available here (https://www.overleaf.com/read/wmvwtmpbkvqs).
