This repository hosts scripts for formatting data and running baseline models for evidence extraction tasks.
- evidence-inference - Evidence identification/classification data.
- bert - A trained BERT model of your choice. BioBERT is linked here.
- EBM-NLP - NER data for PICO elements.
To run a given task, you must:
- Process the relevant source data to generate intermediate representations
- Generate input for the task of choice
- Train the appropriate model
For example, to learn to identify evidence-bearing sentences, you would run the following:
```bash
cd scripts/
python process_evidence_inference.py
python generate_sentence_classifier_input.py
cd ../models/sentence_classifier/
./train.sh
```
This family of tasks uses the original BERT method for sentence classification (predict from the representation of the [CLS] token).
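As a rough illustration, the sketch below runs [CLS]-based sentence classification with the HuggingFace `transformers` library. This is only illustrative: the repository's own training scripts may wrap a different BERT implementation, and the model name is a placeholder you would swap for your BioBERT checkpoint.

```python
# Minimal sketch of [CLS]-token sentence classification with BERT.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_name = "bert-base-uncased"  # placeholder; point this at your BioBERT checkpoint
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

sentence = "Treatment A significantly reduced mortality relative to placebo."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)

# BertForSequenceClassification predicts from the pooled [CLS] representation.
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()  # 0 = not evidence-bearing, 1 = evidence-bearing
```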
Given a sentence, predict whether it contains a conclusion about an ICO frame. Positive examples are all evidence spans from evidence-inference; negative examples are randomly sampled non-evidence sentences.
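A hedged sketch of how such examples might be assembled is below; the field names (`sentences`, `evidence_indices`) are hypothetical placeholders for whatever the intermediate representation actually stores, not the repository's exact schema.

```python
# Build binary sentence-classification examples: evidence spans are positives,
# randomly sampled non-evidence sentences are negatives.
import random

def build_examples(doc, neg_per_pos=1, seed=0):
    rng = random.Random(seed)
    evidence_idx = set(doc["evidence_indices"])
    positives = [(doc["sentences"][i], 1) for i in evidence_idx]
    non_evidence = [s for i, s in enumerate(doc["sentences"]) if i not in evidence_idx]
    n_neg = min(len(non_evidence), neg_per_pos * len(positives))
    negatives = [(s, 0) for s in rng.sample(non_evidence, n_neg)]
    return positives + negatives
```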
Given an evidence-bearing sentence, predict the direction of the conclusion (significantly increased, significantly decreased, or no significant difference). This is equivalent to the oracle task presented in the NAACL corpus paper.
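The sketch below shows one way the three-way inputs could be formed, pairing the ICO prompt with the evidence sentence as the two BERT segments. The label strings, prompt wording, and field names here are assumptions for illustration, not the repository's exact format.

```python
# Illustrative three-way label space and example construction for the oracle task.
LABEL_MAP = {
    "significantly decreased": 0,
    "no significant difference": 1,
    "significantly increased": 2,
}

def make_oracle_example(intervention, comparator, outcome, evidence_sentence, label):
    # Segment A carries the ICO frame; segment B is the evidence-bearing sentence.
    prompt = f"With respect to {outcome}, compare {intervention} and {comparator}."
    return {"text_a": prompt, "text_b": evidence_sentence, "label": LABEL_MAP[label]}
```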
TODO: add processing for EBM-NLP and provide the command sequence for training the NER tagger.