
Frequently asked questions

madhumita-dfki edited this page Dec 11, 2014 · 61 revisions

Using EOP as a library from my own code

A number of EOP functionalities are accessible both via the EOP Application Programming Interface (API) and via the Command Line Interface (CLI). The Step by Step Tutorial shows how to use both with concrete examples.

Adding a new language model to MaltParser

Dependency trees in the EOP are produced by MaltParser, which is available in the EOP for three languages: English, German and Italian. If you need to port MaltParser to a new language, you have to produce a new language model. The wiki page "Add new language models to MaltParser" explains how this can be done.

401 Unauthorized error when using myProject's code examples

myProject-EOPv{version}.tar.gz contains some examples of code (see the wiki page "Hello World! code and additional examples") that you can use to become more familiar with the EOP. When you import it into Eclipse, Eclipse may report a 401 Unauthorized error. This is generally due to one of the following:

  • TreeTagger has not been installed
  • The TreeTagger artifact version specified in the pom.xml file of myProject differs from the version that was installed with TreeTagger into the user's local Maven repository (e.g., /home/user_name/.m2/).

The wiki page "Step-by-Step, TreeTagger-Installation" describes the correct procedure to install TreeTagger and how to make it work with the EOP.

ERROR EOPRunner: Could not perform training

EOPRunner is the Java class to use when you want to run the EOP from the command line. When you call EOPRunner to pre-process a data set and then train or test an EDA, EOPRunner saves the pre-processed files into a temporary directory (set in the configuration file of the EDA in use). If you forget to remove the pre-processed files of a previous run, EOPRunner will use both the previous and the current files for training or testing. In the best case you will get unexpected results; in other cases, for example when the old files were annotated with a different pre-processing pipeline, EOPRunner can stop with the error: Could not perform training.
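If you script your experiments, you can empty that directory before each run. The following is a minimal sketch, not part of the EOP: the class PreprocessedDirCleaner and its method are hypothetical, and the directory path is a placeholder for whatever your EDA configuration file specifies. Point it only at that directory, because it deletes every regular file it finds there.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an EOP class): removes stale pre-processed files
// left over from a previous EOPRunner run.
class PreprocessedDirCleaner {
    /**
     * Deletes the regular files directly inside preprocessedDir (subdirectories are
     * left untouched) and returns how many were deleted. Returns 0 if the directory
     * does not exist yet, e.g. before the first run.
     */
    static int clearPreprocessedFiles(Path preprocessedDir) throws IOException {
        if (!Files.isDirectory(preprocessedDir)) {
            return 0;
        }
        // Collect first, then delete, so the directory is not modified while being listed.
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(preprocessedDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    stale.add(entry);
                }
            }
        }
        for (Path file : stale) {
            Files.delete(file);
        }
        return stale.size();
    }
}
```

For example, call clearPreprocessedFiles with the temporary directory from your EDA configuration right before invoking EOPRunner, so that training and testing only see files produced by the current pre-processing.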

How can I use lexical resources of EOP?

Q: I've heard that EOP wraps many lexical knowledge resources in a common interface, which is really useful for semantic inference researchers and developers. 1) What does that interface look like? 2) How can I use all those lexical resources in EOP?

A: The best way to learn how to access a lexical resource component of EOP is to study example code. Please consult the following guide and the accompanying example code.

Guide: https://docs.google.com/document/d/1y0XfWJMdzuOyBDxDePqjwbl_NlGPTG-i9N-6pZR-R1w/edit#heading=h.hqeq2z4idq26
Code: https://github.com/gilnoh/Excitement-Open-Platform/blob/fallschool/fallschool/src/main/java/eu/excitementproject/fallschool/Ex3.java

How can I use the linguistic analysis pipelines (LAP) of EOP? (e.g. POS tagging, lemmatization and parsing pipelines)

Q: I've heard that EOP standardizes preprocessing (linguistic pipelines), such as POS tagging, parsing and NER, into common methods called LAP (Linguistic Analysis Pipeline). How can I use the existing preprocessing pipelines of EOP?

A: The best way to learn how to access LAP pipelines is by example. Please check the following source code and its guided explanation. (Note that the code example assumes that you have installed the TreeTagger modules by following the installation steps.)

Guide: https://docs.google.com/document/d/1eVaqN6FnQeEYwotw_TwZfKHbU3SBjhx2z4z82alF8EM/edit#heading=h.l6hgglfnjpz3
Code: https://github.com/gilnoh/Excitement-Open-Platform/blob/fallschool/fallschool/src/main/java/eu/excitementproject/fallschool/EX1.java

Extending TIE EDA with a new lexical resource

Q: I have a LexicalResource which I would like to add to the resources used in TIE. What do I need to do to integrate it?

A: Adding a LexicalResource to TIE requires a few code changes. In the following, we assume that you are adding a resource for English. Adding a resource for German works similarly; in that case, the changes need to be made to core/src/main/java/eu/excitementproject/eop/core/component/scoring/BagOfLexesPosScoringDE.java or core/src/main/java/eu/excitementproject/eop/core/component/scoring/BagOfLexesScoringDE.java, depending on whether or not the resource should make use of part-of-speech information.

Do the following changes in core/src/main/java/eu/excitementproject/eop/core/component/scoring/BagOfLexesScoringEN.java:

  • Add your LexicalResource as a field of the class, e.g.:
private MyResource myRes;
  • Change the constructors:
    • If you plan to call your resource only via a configuration file, it is enough to change the second constructor (the one taking a CommonConfig parameter). To avoid confusion, however, we recommend adapting both constructors.
    • The first action in each constructor is to check whether any of the LexicalResources for English exists; you need to add your resource to this if condition.
    • Add code that correctly initialises your resource. For an example, see how WordNet is initialised. Your initialisation can be much simpler, e.g.:
// initialize MyResource
if (null != comp.getString("MyResource")) {
  try {
    myRes = new MyResource();
    numOfFeats++;
    // This 'try' part may need additional steps; for inspiration, look at the existing initialisations.
  } catch (MyResourceNotInstalledException e) {
    logger.warning("MyResource could not be loaded: " + e.getMessage());
    throw new LexicalResourceException(e.getMessage());
  } catch (BaseException e) {
    throw new LexicalResourceException(e.getMessage());
  }
  logger.info("Load MyResource done.");
}
  • Write a method very similar to calculateSingleLexScoreWithWNRelations and name it, e.g., calculateSingleLexScoreWithMyRelations. This is your central method, which gathers information from your LexicalResource:
    • The easiest approach is to copy the code of the ...WithWNRelations method and pass your resource as the last argument instead of the WordNet resource.
    • NOTE: If you plan to add more resources that are called in exactly the same way, you might want to write a single method calculateSingleLexScore that handles any kind of LexicalResource. For example code, see the method calculateSingleLexScore in core/src/main/java/eu/excitementproject/eop/core/component/scoring/BagOfLexesScoringDE.java.
  • Modify the method calculateScores so that the new method you've just created is actually called:
if (null != myRes) { // myRes is only non-null if MyResource was configured and initialised
  scoresVector.add(calculateSingleLexScoreWithMyRelations(tBag, hBag, myRes));
}
  • Add your resource to the close() method:
if (null != myRes) {
  myRes.close();
}
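To make the role of the central method more concrete, here is a self-contained sketch of the kind of score such a method could compute. It rests on simplifying assumptions: the text and hypothesis bags are modelled as maps from lemma to count, and the resource is reduced to a hypothetical SimpleLexicalResource interface that returns the lemmas a given lemma entails. Neither is EOP's actual LexicalResource API, and the formula (the share of hypothesis lemmas covered by the text directly or through one rule) is an illustrative choice, not necessarily what calculateSingleLexScoreWithWNRelations computes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a LexicalResource (not the EOP interface):
// given a lemma, return the lemmas it entails, e.g. "dog" -> "animal".
interface SimpleLexicalResource {
    Set<String> entailedLemmas(String lemma);
}

class MyRelationsScoringSketch {
    /**
     * Illustrative score: the fraction of hypothesis lemma occurrences that either
     * occur in the text or are entailed by some text lemma through one rule.
     * Bags map a lemma to its number of occurrences.
     */
    static double calculateSingleLexScoreWithMyRelations(
            Map<String, Integer> tBag, Map<String, Integer> hBag, SimpleLexicalResource myRes) {
        // Text lemmas plus everything the resource says they entail.
        Set<String> covered = new HashSet<>(tBag.keySet());
        for (String tLemma : tBag.keySet()) {
            covered.addAll(myRes.entailedLemmas(tLemma));
        }
        int total = 0;
        int matched = 0;
        for (Map.Entry<String, Integer> h : hBag.entrySet()) {
            total += h.getValue();
            if (covered.contains(h.getKey())) {
                matched += h.getValue();
            }
        }
        // Empty hypothesis: no evidence either way, so return 0.0.
        return total == 0 ? 0.0 : (double) matched / total;
    }
}
```

With a resource mapping "dog" to "animal", a text bag {dog, bark} and a hypothesis bag {animal, bark, loudly}, two of the three hypothesis lemmas are covered, so the sketch returns 2/3.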

For examples of how the TIE EDA can be called with resources, see the JUnit test class core/src/test/java/eu/excitementproject/eop/core/component/scoring/BagOfWordsScoringTest.java.

Extending TIE EDA with a new Scoring component

You can add a new scorer to TIE by writing a class for it, as described in the following steps. In core/src/main/java, within the package eu.excitementproject.eop.core.component.scoring, add a class for your scorer. There are two scenarios:

  1. Your scorer extends the existing scorers, for example BagOfWordsScoring or BagOfLemmasScoring.
  2. Your scorer is an independent scorer that implements the interface ScoringComponent (eu.excitementproject.eop.common.component.scoring.ScoringComponent).

In the first case, do the following:

  1. Create a class YourScoring that extends ParentScoring (e.g. BagOfWordsScoring).
  2. Within the members of the class, add a logger using: static Logger logger = Logger.getLogger(YourScoring.class.getName());
  3. Add a field numOfFeats holding the number of additional features that your scorer adds. Assuming this number is x:

    private int numOfFeats = x;

  4. Override all relevant methods of ParentScoring with the details specific to your scorer. For example, the getNumOfFeats() method of the parent class should be overridden to return the new numOfFeats.

    Likewise, update any other method that returns scorer-specific details, such as the component name.

  5. Within the constructor, initialize any scorer specific details that you may have.
  6. Override the method

    public Vector<Double> calculateScores(JCas aCas)

    This method takes as input a JCas object corresponding to a T/H pair and returns a vector of scores. The vector consists of n values, where n = numOfFeats, i.e. one score for each feature used for training.
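The six steps above can be sketched as follows. This is a self-contained illustration under stated assumptions, not EOP code: ParentScoringSketch is a hypothetical stand-in for a parent such as BagOfWordsScoring, and plain strings replace the JCas so that the example runs without UIMA. The extra feature (hypothesis/text length ratio) is chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;
import java.util.logging.Logger;

// Hypothetical stand-in for an existing scorer such as BagOfWordsScoring.
// Real EOP scorers read tokens from a JCas; strings keep this sketch self-contained.
class ParentScoringSketch {
    private final int numOfFeats = 1;

    public int getNumOfFeats() {
        return numOfFeats;
    }

    public String getComponentName() {
        return "ParentScoringSketch";
    }

    // One feature: fraction of hypothesis tokens that also occur in the text.
    public Vector<Double> calculateScores(String text, String hypothesis) {
        Set<String> tTokens = new HashSet<>(Arrays.asList(tokenize(text)));
        String[] hTokens = tokenize(hypothesis);
        int found = 0;
        for (String h : hTokens) {
            if (tTokens.contains(h)) {
                found++;
            }
        }
        Vector<Double> scores = new Vector<>();
        scores.add(hTokens.length == 0 ? 0.0 : (double) found / hTokens.length);
        return scores;
    }

    protected static String[] tokenize(String s) {
        String trimmed = s.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}

// Step 1: YourScoring extends the parent scorer.
class YourScoring extends ParentScoringSketch {
    // Step 2: logger.
    static Logger logger = Logger.getLogger(YourScoring.class.getName());

    // Step 3: number of additional features this scorer adds (x = 1 here).
    private int numOfFeats = 1;

    // Step 5: initialise scorer-specific details in the constructor.
    public YourScoring() {
        logger.fine("YourScoring initialised");
    }

    // Step 4: report the parent's features plus the additional ones.
    @Override
    public int getNumOfFeats() {
        return super.getNumOfFeats() + numOfFeats;
    }

    @Override
    public String getComponentName() {
        return "YourScoring";
    }

    // Step 6: parent scores first, then one extra feature (hypothesis/text length ratio).
    @Override
    public Vector<Double> calculateScores(String text, String hypothesis) {
        Vector<Double> scores = super.calculateScores(text, hypothesis);
        int tLen = tokenize(text).length;
        scores.add(tLen == 0 ? 0.0 : (double) tokenize(hypothesis).length / tLen);
        return scores;
    }
}
```

In this sketch getNumOfFeats() returns the parent's features plus the new ones, so it always matches the size of the vector returned by calculateScores(); if your scorer replaces the parent's features instead of appending to them, return only your own count.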

For the second scenario, when your scorer does not extend any existing scorers, do the following:

  1. Create a class YourScoring that implements ScoringComponent.
  2. Specify the number of features of your scorer.
  3. Within the constructor, initialise the scorer-specific details.
  4. Implement all methods declared in the interface. Follow steps 2, 4 and 6 from the previous case.
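For the second scenario, a minimal self-contained sketch might look like this. ScoringComponentSketch is a hypothetical, simplified stand-in for the real ScoringComponent interface (which works on a JCas and may declare checked exceptions); the method set shown here is an assumption for illustration, so check the actual interface before implementing it. The two features (hypothesis coverage and Jaccard similarity of the token sets) are illustrative choices.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for
// eu.excitementproject.eop.common.component.scoring.ScoringComponent.
interface ScoringComponentSketch {
    String getComponentName();
    String getInstanceName();
    Vector<Double> calculateScores(String text, String hypothesis);
    void close();
}

// Step 1: an independent scorer implementing the interface.
class YourScoring implements ScoringComponentSketch {
    static Logger logger = Logger.getLogger(YourScoring.class.getName());

    // Step 2: number of features this scorer produces.
    private final int numOfFeats = 2;
    private boolean closed = false;

    // Step 3: initialise scorer-specific resources here (none in this sketch).
    public YourScoring() {
    }

    public int getNumOfFeats() {
        return numOfFeats;
    }

    @Override
    public String getComponentName() {
        return "YourScoring";
    }

    @Override
    public String getInstanceName() {
        return null; // this sketch uses a single, unnamed instance
    }

    // Returns exactly numOfFeats scores.
    @Override
    public Vector<Double> calculateScores(String text, String hypothesis) {
        if (closed) {
            throw new IllegalStateException("YourScoring has been closed");
        }
        Set<String> t = tokens(text);
        Set<String> h = tokens(hypothesis);
        Set<String> common = new HashSet<>(h);
        common.retainAll(t);
        Set<String> union = new HashSet<>(t);
        union.addAll(h);
        Vector<Double> scores = new Vector<>(numOfFeats);
        // Feature 1: share of distinct hypothesis tokens found in the text.
        scores.add(h.isEmpty() ? 0.0 : (double) common.size() / h.size());
        // Feature 2: Jaccard similarity of the two token sets.
        scores.add(union.isEmpty() ? 0.0 : (double) common.size() / union.size());
        return scores;
    }

    // Release any resources opened in the constructor.
    @Override
    public void close() {
        closed = true;
        logger.fine("YourScoring closed");
    }

    private static Set<String> tokens(String s) {
        String trimmed = s.trim().toLowerCase();
        return trimmed.isEmpty() ? new HashSet<String>() : new HashSet<String>(Arrays.asList(trimmed.split("\\s+")));
    }
}
```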

You can test the scorer by adding a test class in the package eu.excitementproject.eop.core.component.scoring under core/src/test/java/.

These steps make the scorer compatible with the EDA. To actually use the scorer, the EDA itself must also be changed.

TIE is implemented as the class MaxEntClassificationEDA in the package eu.excitementproject.eop.core (path core/src/main/java/). In its method initializeComponents(CommonConfig config), add a section that handles the case in which the configuration file contains your scorer; within this section, call the constructor of the scoring class implemented in the previous step. In the method void shutdown() of the same class, add a statement that closes your ScoringComponent.
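The two EDA changes follow a simple pattern: create the scorer only if its configuration section is present, and close every created scorer on shutdown. The sketch below illustrates that pattern under loud assumptions: EdaWiringSketch, ScorerSketch and YourScoringStub are hypothetical stand-ins, and the configuration is modelled as a map from section name to properties rather than EOP's CommonConfig. It does not reproduce the actual code of MaxEntClassificationEDA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scorer methods the EDA needs here.
interface ScorerSketch {
    String getComponentName();
    void close();
}

// Hypothetical scorer; in the real code this would be your ScoringComponent class.
class YourScoringStub implements ScorerSketch {
    boolean closed = false;

    @Override
    public String getComponentName() {
        return "YourScoring";
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Simplified mirror of the two places in the EDA that need to change.
class EdaWiringSketch {
    private final List<ScorerSketch> components = new ArrayList<>();

    // Like initializeComponents(CommonConfig config): the config is modelled as
    // section name -> (property name -> value); the scorer is created only if its section exists.
    void initializeComponents(Map<String, Map<String, String>> config) {
        if (config.containsKey("YourScoring")) {
            components.add(new YourScoringStub());
        }
    }

    // Like shutdown(): close every scoring component that was created.
    void shutdown() {
        for (ScorerSketch c : components) {
            c.close();
        }
        components.clear();
    }

    List<ScorerSketch> getComponents() {
        return components;
    }
}
```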

The scorer can be used in the EDA by adding a section for this scorer in the configuration file.
