rzanoli edited this page Sep 10, 2015 · 119 revisions

The Quick Start guide is intended to let Excitement Open Platform (EOP) users start working right away. There are only two constraints: the steps in the guide must be completed in the order in which they are presented, and the EOP hardware and software requirements for using EOP via the Command Line Interface must be met.

Contents:

  1. Downloading and Installing EOP
  2. Creating new models
  3. Annotating text/hypothesis pairs
  4. Evaluating the results

Step 1. Downloading and Installing EOP


Goal: downloading and installing the EOP code and resources (e.g. lexical resources like WordNet and the platform configuration files).

Download and install the EOP Java distribution as described on the Installation page.

Step 2. Creating new models


Goal: using the Edit Distance EDA (Entailment Decision Algorithm) to create a new model from the English RTE-3 training data set; the model will then be used to annotate new text/hypothesis (T/H) pairs.

  1. Take a look at the training data set format:
> cat ~/Excitement-Open-Platform-{_version_}/target/EOP-{_version_}/eop-resources-{_version_}/data-set/English_dev.xml

Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<entailment-corpus lang="EN">
       <pair id="8" entailment="NONENTAILMENT" task="IE" >
               <t>Mrs. Bush's approval ratings have remained very high, above 80%, even as her husband's have recently dropped below 50%.</t>
               <h>80% approve of Mr. Bush.</h>
       </pair>
       <!-- ... more pairs ... -->
       <pair id="800" entailment="ENTAILMENT" task="SUM" >
               <t>US Steel could even have a technical advantage over Nucor since one new method of steel making it is considering, thin strip casting, may produce higher quality steel than the Nucor thin slab technique.</t>
               <h>US Steel may invest in strip casting.</h>
       </pair>
</entailment-corpus>
  2. Go into the EOP-{_version_} directory:
> cd ~/Excitement-Open-Platform-{_version_}/target/EOP-{_version_}/
  3. Train the EDA:
> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner \
    -config ./eop-resources-{_version_}/configuration-files/EditDistanceEDA_EN.xml \
    -train \
    -trainFile ./eop-resources-{_version_}/data-set/English_dev.xml
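To see how the gold labels are distributed in the training file used above, a small shell helper can count the entailment attribute of every pair. This is a minimal sketch, not an EOP tool: the function name count_labels is hypothetical, and it assumes the attribute is written as entailment="LABEL", as in the English_dev.xml example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EOP): prints "LABEL COUNT" for each gold
# label found in an EOP data-set file, assuming pairs are written as
# <pair id="..." entailment="LABEL" task="..." > like in English_dev.xml.
count_labels() {
  # grep -o prints only the matching attribute, one per line;
  # awk splits it on double quotes, so $2 is the label itself.
  grep -o 'entailment="[A-Z]*"' "$1" \
    | awk -F'"' '{ count[$2]++ } END { for (label in count) print label, count[label] }' \
    | sort
}
```

Usage, from the EOP-{_version_} directory:

> count_labels ./eop-resources-{_version_}/data-set/English_dev.xml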

Warning: by default, the pre-processed files are saved into /tmp/EN/dev (see EditDistanceEDA_EN.xml); each time you train an EDA, make sure that this directory is empty.
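Emptying the pre-processing directory can be scripted so that it is not forgotten between runs. The sketch below is not part of EOP: the function name clean_preprocessing_dir is hypothetical, and the directory you pass must be the one set in your configuration file (with the default EditDistanceEDA_EN.xml, /tmp/EN/dev for training and /tmp/EN/test for annotation).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EOP): empties a pre-processing directory
# such as /tmp/EN/dev while keeping the directory itself. The directory is
# created if missing, so the helper can also be used before the first run.
clean_preprocessing_dir() {
  local dir="$1"
  # Guard against an empty argument or the filesystem root.
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "refusing to clean '$dir'" >&2
    return 1
  fi
  mkdir -p -- "$dir"
  # Delete every entry below the directory, hidden files included.
  find "$dir" -mindepth 1 -delete
}
```

Usage, before each training run:

> clean_preprocessing_dir /tmp/EN/dev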

Step 3. Annotating text/hypothesis pairs


Goal:

  • annotating the English RTE-3 data set containing multiple T/H pairs.
  • annotating a single T/H pair provided from the command line.

Annotating the English RTE-3 data set

  1. Annotate the data set:
> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner \
      -config ./eop-resources-{_version_}/configuration-files/EditDistanceEDA_EN.xml \
      -test \
      -testFile ./eop-resources-{_version_}/data-set/English_test.xml \
      -output ./eop-resources-{_version_}/results/

Warning: by default, the pre-processed files are saved into /tmp/EN/test (see EditDistanceEDA_EN.xml); each time you annotate, make sure that this directory is empty.

  2. Take a look at the annotated data set:
> cat eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.txt

Here is an example:

747     NONENTAILMENT   NonEntailment   0.2258241758241779
....................
795     ENTAILMENT      Entailment      0.5741758241758221

The first and second columns report the T/H pair ID and its annotation in the gold standard. The third column contains the label predicted by the EDA, and the last column is the confidence of the prediction (i.e. how sure the EDA is about its decision).
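Because the gold label and the prediction sit side by side, a rough accuracy figure can be read straight from the results file. The sketch below is an illustration, not an EOP tool: the function name results_accuracy is hypothetical; it skips lines with fewer than four columns (such as the elision line in the example) and compares labels case-insensitively, since the gold standard uses NONENTAILMENT while the EDA prints NonEntailment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EOP): reads a results file with the
# columns ID, gold label, predicted label, confidence, and prints the
# number of pairs, the number of correct predictions and the accuracy (%).
results_accuracy() {
  awk 'NF >= 4 {
         total++
         # Gold labels are upper case, predictions are camel case.
         if (toupper($2) == toupper($3)) correct++
       }
       END {
         if (total == 0) { print "no pairs found"; exit 1 }
         printf "pairs=%d correct=%d accuracy=%.3f\n", total, correct, 100 * correct / total
       }' "$1"
}
```

Usage, from the EOP-{_version_} directory:

> results_accuracy eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.txt

The value printed should agree with the overall Accuracy computed by the scorer in Step 4.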

Annotating a single text/hypothesis pair

  1. Annotate the T/H pair:
> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner \
      -config ./eop-resources-{_version_}/configuration-files/EditDistanceEDA_EN.xml \
      -test \
      -text "Hubble is a telescope" \
      -hypothesis "Hubble is an instrument" \
      -output ./eop-resources-{_version_}/results/
  2. Take a look at the annotated pair:
> cat eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.xml

Step 4. Evaluating the results


Goal: evaluating the annotated T/H pairs in terms of Accuracy, Precision, Recall and F1 measure.

Evaluate the results:

> java -Djava.ext.dirs=../EOP-{_version_}/ eu.excitementproject.eop.util.runner.EOPRunner \
    -score \
    -results ./eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.txt

Take a look at the report file:

> cat eop-resources-{_version_}/results/EditDistanceEDA_EN.xml_results.txt_report.xml

Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<results>
  <label id="NONENTAILMENT">
    <Accuracy>62.750</Accuracy>
    <Precision>65.033</Precision>
    <Recall>51.026</Recall>
    <F1>57.184</F1>
    <ContingencyTable FN="191" FP="107" TN="303" TP="199"/>
  </label>
  <label id="ENTAILMENT">
    <Accuracy>62.750</Accuracy>
    <Precision>61.336</Precision>
    <Recall>73.902</Recall>
    <F1>67.035</F1>
    <ContingencyTable FN="107" FP="191" TN="199" TP="303"/>
  </label>
  <Accuracy>62.750</Accuracy>
  <Precision>62.750</Precision>
  <Recall>62.750</Recall>
  <F1micro>62.750</F1micro>
  <ContingencyTable FN="298" FP="298" TN="502" TP="502"/>
</results>

Precision, Recall, F1micro and Accuracy (i.e. Rand Accuracy) are defined here: https://en.wikipedia.org/wiki/Precision_and_recall
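The four measures can be recomputed from the counts in each ContingencyTable element, which shows how the figures in the report relate to each other. This sketch uses the standard definitions from the page linked above; the function name contingency_metrics and its argument order (TP FP TN FN) are illustrative choices, not part of EOP.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EOP): recomputes the report measures,
# as percentages, from the four counts of a ContingencyTable element.
#   Accuracy  = (TP + TN) / (TP + TN + FP + FN)
#   Precision = TP / (TP + FP)
#   Recall    = TP / (TP + FN)
#   F1        = 2 * Precision * Recall / (Precision + Recall)
contingency_metrics() {
  awk -v tp="$1" -v fp="$2" -v tn="$3" -v fn="$4" 'BEGIN {
    acc = 100 * (tp + tn) / (tp + tn + fp + fn)
    p   = (tp + fp > 0) ? 100 * tp / (tp + fp) : 0
    r   = (tp + fn > 0) ? 100 * tp / (tp + fn) : 0
    f1  = (p + r > 0) ? 2 * p * r / (p + r) : 0
    printf "Accuracy=%.3f Precision=%.3f Recall=%.3f F1=%.3f\n", acc, p, r, f1
  }'
}
```

With the NONENTAILMENT counts of the example report, the values shown there are reproduced:

> contingency_metrics 199 107 303 191
Accuracy=62.750 Precision=65.033 Recall=51.026 F1=57.184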

Quick Start is finished!


You can now go through the steps again with a different EDA, such as the TIE EDA: it is sufficient to repeat steps 2 to 4, replacing the EditDistanceEDA_EN.xml configuration file with MaxEntClassificationEDA_Base+OpenNLP_EN.xml, which is also distributed with EOP. Alternatively, move on to the Step by Step Tutorial, which teaches you how to exploit the full potential of EOP through its Application Program Interface (API).
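To repeat steps 2 to 4 with another configuration file, the commands used in this guide can be generated from the version number and the configuration file name. The sketch below only prints the commands and uses only the flags shown above; the function name print_quickstart_commands is hypothetical, and the results file name (the configuration file name followed by _results.txt) is an assumption based on EditDistanceEDA_EN.xml_results.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EOP): prints, without running them, the
# train, test and score commands of this Quick Start for a given version
# and EDA configuration file. Run the printed commands from EOP-<version>/.
print_quickstart_commands() {
  local version="$1" config="$2"
  local res="./eop-resources-${version}"
  local runner="java -Djava.ext.dirs=../EOP-${version}/ eu.excitementproject.eop.util.runner.EOPRunner"
  # Step 2: training on the English RTE-3 training file
  echo "$runner -config $res/configuration-files/$config -train -trainFile $res/data-set/English_dev.xml"
  # Step 3: annotating the English RTE-3 test file
  echo "$runner -config $res/configuration-files/$config -test -testFile $res/data-set/English_test.xml -output $res/results/"
  # Step 4: scoring; the results file name is assumed to follow the
  # "<configuration file>_results.txt" pattern seen in this guide
  echo "$runner -score -results $res/results/${config}_results.txt"
}
```

Usage, replacing {_version_} with your EOP version:

> print_quickstart_commands {_version_} MaxEntClassificationEDA_Base+OpenNLP_EN.xml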
