This repo bundles experimental code for the EventDNA LRE paper.
The code has been verified to work only on Ubuntu.
For information on obtaining the corpus, see https://github.com/NewsDNA-LT3/.github.
The dataset should be extracted to ./extracted
. This should result in a dir with one dir per EventDNA document, such that each document dir has the structure:
eventdna_1/
dnaf.json
lets.csv
alpino/
1.xml
2.xml
3.xml
4.xml
dnaf.json
is annotated data in a custom format. lets.csv
contains syntactic information derived from the LETS analysis tool. The alpino
folder contains parse trees for each sentence in the document, produced by the Alpino parser for dutch.
main.py
is a CLI file. The command has two main options: train
and eval
.
python main.py train
will train CRF models in crossvalidation fashion. The resulting models as well as the dev datasets are saved to a timestamped dir under ./output
. Training parameters can be changed on the CLI, and there is a --test
option to run a limited training routine quickly. Use --help
to check the options.
To evaluate the resulting models, use python main.py eval <timestamped_output_dir>
. A directory called eval
will be created in the same dir, containing PRF evaluations per fold and averaged.