Quick Start
The Quick Start guide is intended to let Excitement Open Platform (EOP) users start working right away. There are only two constraints: the steps in this guide must be completed in the order in which they are listed, and the hardware and software requirements for using EOP via the Command Line Interface must be met.
- Downloading and Installing EOP
- Creating new models
- Annotating text/hypothesis pairs
- Evaluating the results
Goal: downloading and installing the EOP code and resources (e.g. lexical resources like WordNet and the platform configuration files).
Download and install the EOP Java distribution as described on the Installation page.
Goal: using the Edit Distance EDA (Entailment Decision Algorithm) to create a new model from the English RTE-3 training data set; the model is then used to annotate new text/hypothesis (T/H) pairs.
- Take a look at the training data set format:
> cat ~/Excitement-Open-Platform-{_version_}/target/EOP-{_version_}/eop-resources-{_version_}/data-set/English_dev.xml
Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<entailment-corpus lang="EN">
<pair id="8" entailment="NONENTAILMENT" task="IE" >
<t>Mrs. Bush's approval ratings have remained very high, above 80%, even as her husband's have recently dropped below 50%.</t>
<h>80% approve of Mr. Bush.</h>
</pair>
-------------------------------------
<pair id="800" entailment="ENTAILMENT" task="SUM" >
<t>US Steel could even have a technical advantage over Nucor since one new method of steel making it is considering, thin strip casting, may produce higher quality steel than the Nucor thin slab technique.</t>
<h>US Steel may invest in strip casting.</h>
</pair>
</entailment-corpus>
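The format shown above can also be read programmatically. The following is a minimal Python sketch, not part of EOP, that parses a corpus in this format with the standard library. The `load_pairs` helper and the inline `SAMPLE` string are illustrative assumptions; in practice you would read the bytes of English_dev.xml instead.

```python
import xml.etree.ElementTree as ET

# Inline sample in the same format as the training data set (illustrative only).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<entailment-corpus lang="EN">
<pair id="8" entailment="NONENTAILMENT" task="IE" >
<t>Mrs. Bush's approval ratings have remained very high, above 80%, even as her husband's have recently dropped below 50%.</t>
<h>80% approve of Mr. Bush.</h>
</pair>
<pair id="800" entailment="ENTAILMENT" task="SUM" >
<t>US Steel could even have a technical advantage over Nucor since one new method of steel making it is considering, thin strip casting, may produce higher quality steel than the Nucor thin slab technique.</t>
<h>US Steel may invest in strip casting.</h>
</pair>
</entailment-corpus>
"""


def load_pairs(xml_bytes):
    """Return one dict per <pair> element (hypothetical helper, not an EOP API)."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            "entailment": pair.get("entailment"),  # gold label
            "task": pair.get("task"),
            "text": pair.findtext("t"),
            "hypothesis": pair.findtext("h"),
        })
    return pairs


pairs = load_pairs(SAMPLE.encode("utf-8"))
for p in pairs:
    print(p["id"], p["entailment"], p["hypothesis"])
```

Each pair carries its gold label in the `entailment` attribute, which is what the EDA learns from during training.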
- Go into the EOP-{_version_} directory:
> cd ~/Excitement-Open-Platform-{_version_}/target/EOP-{_version_}/
- Train the EDA:
> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner
-config ./eop-resources-{_version_}/configuration-files/EditDistanceEDA_EN.xml
-train
-trainFile ./eop-resources-{_version_}/data-set/English_dev.xml
By default the pre-processed files are saved into /tmp/EN/dev (see EditDistanceEDA_EN.xml); make sure that this directory is empty each time you train an EDA.
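Emptying the pre-processing directory by hand is easy to forget. The following is a minimal Python sketch, not part of EOP, of one way to do it before each run. The `empty_directory` helper is a hypothetical name, and the demonstration uses a throwaway temporary directory rather than the real /tmp/EN/dev.

```python
import shutil
import tempfile
from pathlib import Path


def empty_directory(path):
    """Delete everything inside `path` but keep the directory itself.

    Hypothetical helper, not an EOP API. The directory is created if missing.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove sub-directories recursively
        else:
            entry.unlink()         # remove files and symlinks


# Demonstration on a throwaway directory with made-up stale content.
demo = Path(tempfile.mkdtemp())
(demo / "stale_file.tmp").write_text("left over from a previous run")
(demo / "stale_dir").mkdir()
(demo / "stale_dir" / "nested.tmp").write_text("left over")

empty_directory(demo)
remaining = list(demo.iterdir())
print(remaining)  # prints []
```

Before training you would call `empty_directory("/tmp/EN/dev")`, or whichever directory your configuration file points to.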
Goal:
- annotating the English RTE-3 data set containing multiple T/H pairs.
- annotating a single T/H pair provided from the command line.
- Annotate the data set:
> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner
-config ./eop-resources-{_version_}/configuration-files/EditDistanceEDA_EN.xml
-test
-testFile ./eop-resources-{_version_}/data-set/English_test.xml
-output ./eop-resources-{_version_}/results/
By default the pre-processed files are saved into /tmp/EN/test (see EditDistanceEDA_EN.xml); make sure that this directory is empty each time you annotate.
- Take a look at the annotated data set:
> cat eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.txt
Here is an example:
747 NONENTAILMENT NonEntailment 0.2258241758241779
....................
795 ENTAILMENT Entailment 0.5741758241758221
The first column reports the ID of the T/H pair and the second the annotation given in the gold standard. The third column contains the prediction made by the EDA, and the last column is the confidence of that prediction (i.e. how sure the EDA is about its decision).
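The four-column layout described above is simple to process. The following is a minimal Python sketch, not part of EOP, that parses result lines and counts how often the prediction agrees with the gold standard. It assumes the columns are separated by whitespace; the `Result` type and both helper functions are hypothetical names.

```python
from typing import NamedTuple

# Two lines in the same layout as the results file (illustrative only).
SAMPLE_RESULTS = """747 NONENTAILMENT NonEntailment 0.2258241758241779
795 ENTAILMENT Entailment 0.5741758241758221
"""


class Result(NamedTuple):
    pair_id: str
    gold: str
    predicted: str
    confidence: float


def parse_results(text):
    """Parse whitespace-separated result lines (assumed separator)."""
    results = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        pair_id, gold, predicted, confidence = fields
        results.append(Result(pair_id, gold, predicted, float(confidence)))
    return results


def agreement(results):
    """Fraction of pairs whose prediction matches the gold label.

    Labels are compared case-insensitively because the gold label is
    written "NONENTAILMENT" while the prediction is written "NonEntailment".
    """
    correct = sum(r.gold.lower() == r.predicted.lower() for r in results)
    return correct / len(results)


results = parse_results(SAMPLE_RESULTS)
print(agreement(results))  # prints 1.0 for the two sample lines
```

On the full results file this ratio should correspond to the accuracy computed in the evaluation step.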
- Annotate the T/H pair:
> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner
-config ./eop-resources-{_version_}/configuration-files/EditDistanceEDA_EN.xml
-test
-text "Hubble is a telescope"
-hypothesis "Hubble is an instrument"
-output ./eop-resources-{_version_}/results/
- Take a look at the annotated pair:
> cat eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.xml
Goal: evaluating the annotated T/H pairs in terms of Accuracy, Precision, Recall, and F1 measure.
- Evaluate the results:
> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner
-score
-results ./eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.txt
- Take a look at the evaluation report:
> cat eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.txt_report.xml
Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<results>
<label id="NONENTAILMENT">
<Accuracy>62.750</Accuracy>
<Precision>65.033</Precision>
<Recall>51.026</Recall>
<F1>57.184</F1>
<ContingencyTable FN="191" FP="107" TN="303" TP="199"/>
</label>
<label id="ENTAILMENT">
<Accuracy>62.750</Accuracy>
<Precision>61.336</Precision>
<Recall>73.902</Recall>
<F1>67.035</F1>
<ContingencyTable FN="107" FP="191" TN="199" TP="303"/>
</label>
<Accuracy>62.750</Accuracy>
<Precision>62.750</Precision>
<Recall>62.750</Recall>
<F1micro>62.750</F1micro>
<ContingencyTable FN="298" FP="298" TN="502" TP="502"/>
</results>
Precision, Recall, F1micro and Accuracy (i.e. Rand Accuracy) are defined here: https://en.wikipedia.org/wiki/Precision_and_recall
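To see how the figures in the report relate to each contingency table, the following minimal Python sketch applies the standard definitions to the counts from the example report. It is not EOP code; the `metrics` function is a hypothetical helper, and rounding to three decimals is an assumption chosen to match the report's formatting.

```python
def metrics(tp, fp, fn, tn):
    """Standard per-label measures, as percentages rounded to 3 decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    values = {
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1": f1,
    }
    return {name: round(100 * value, 3) for name, value in values.items()}


# Counts copied from the ContingencyTable elements of the example report.
nonentailment = metrics(tp=199, fp=107, fn=191, tn=303)
entailment = metrics(tp=303, fp=191, fn=107, tn=199)

print(nonentailment)
print(entailment)
```

For the example counts these formulas reproduce the reported values, e.g. Precision 65.033 and Recall 51.026 for NONENTAILMENT. Note that the two labels mirror each other: the false positives of one label are the false negatives of the other, which is why Accuracy is the same for both.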
The Quick Start is now complete. You can repeat the steps with a different EDA such as TIE EDA: follow steps 2 to 4 (creating a model, annotating, and evaluating), replacing the EditDistanceEDA_EN.xml configuration file with MaxEntClassificationEDA_Base+OpenNLP_EN.xml, which is also distributed with EOP. Alternatively, move on to the Step by Step Tutorial, which shows how to exploit the full potential of EOP via its Application Program Interface (API).