GermanLexicalResources

brittazeller edited this page Jun 14, 2013 · 6 revisions

This page documents the German lexical resource modules (those that implement the LexicalResource interface) within the EXCITEMENT Open Platform (EOP).

@TODO (update the document, Britta & Gil)

List of German Lexical Resources

For the moment (release 1.0), there are three lexical resources within the CORE of EOP:

- DEWakDistributional: A resource based on distributional similarities observed in the DEWak corpus. The resource holds the 10k most frequent terms and their pairwise similarities, and returns lexical rules based on those similarities.
- DerivBase: The resource holds derivationally related words across parts of speech, and returns lexical rules based on them.
- GermaNetWrapper: An implementation that interacts with GermaNet to generate lexical rules. Note that GermaNet itself is not provided; the user has to install it before using this resource.

DEWakDistributional (core.component.lexicalknowledge.dewakdistributional.GermanDistSim)

Introduction

This resource implements a German lexical resource based on corpus term distribution. It uses vectors gathered from DeWac, a web corpus for German. The vectors are based on the 10k most frequent words observed in the corpus. Similarity is calculated with five different similarity measures (balAPinc, lin, linOpt, jaccard, dice). Only pairs that reach a predefined minimum similarity are stored in the resource (balAPinc: 0.7, lin: 0.6, linOpt: 0.6, jaccard: 0.8, dice: 0.9). As a confidence score, the resource returns the distributional similarity score calculated for the lemma-POS pair. Depending on the measure used, the score therefore lies between 0.6 and 1.0. DEWakDistributional is a plain LexicalResource and does not support LexicalResourceWithRelation.
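The per-measure cut-offs above can be illustrated with a small sketch. This is not the EOP implementation: the class `SimilarityThresholds` and the method `isStored` are hypothetical names, and the sketch assumes the cut-off is inclusive (a score equal to the minimum is kept), which the text does not state explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of EOP: models the minimum similarity per measure.
class SimilarityThresholds {
    // Minimum similarity per measure, taken from the description above.
    static final Map<String, Double> MIN_SIMILARITY = new LinkedHashMap<>();
    static {
        MIN_SIMILARITY.put("balAPinc", 0.7);
        MIN_SIMILARITY.put("lin", 0.6);
        MIN_SIMILARITY.put("linOpt", 0.6);
        MIN_SIMILARITY.put("jaccard", 0.8);
        MIN_SIMILARITY.put("dice", 0.9);
    }

    // Returns true if a pair with this score would pass the cut-off for the measure.
    // Assumption: the comparison is inclusive (score >= minimum).
    static boolean isStored(String measure, double score) {
        Double min = MIN_SIMILARITY.get(measure);
        if (min == null) {
            throw new IllegalArgumentException("Unknown measure: " + measure);
        }
        return score >= min;
    }

    public static void main(String[] args) {
        System.out.println(isStored("lin", 0.65));  // passes the 0.6 cut-off
        System.out.println(isStored("dice", 0.85)); // below the 0.9 cut-off
    }
}
```

Because each measure has its own cut-off, the same raw score can be stored under one measure and dropped under another.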

Configurable values

No values to configure.

DerivBase (core.component.lexicalknowledge.derivbase.DerivBaseResource)

Introduction

@TODO [Some]

Configurable values

@TODO [List]

GermaNetWrapper (core.component.lexicalknowledge.germanet.GermaNetWrapper)

Introduction

This class implements a German lexical resource based on GermaNet 7.0, the German WordNet. The implementation accesses GermaNet via the GermaNet API and supports both LexicalResource and LexicalResourceWithRelation. For relations, it supports both OwnRelationSpecifier (with GermaNetRelation; possible relation types: synonym, hypernym, hyponym, antonym, causes, entails) and CanonicalRelationSpecifier (possible relation types: TERuleRelation.Entailment or .Nonentailment). A confidence score can be set for each OwnRelation in the configuration. If a configuration is used but the scores are not defined, the confidence for every relation defaults to 0.0. If no configuration is used, the confidence for every relation defaults to 1.0.
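The default rule for confidence scores is easy to misread, so here is a minimal sketch of it. It is not the GermaNetWrapper source: `ConfidenceDefaults` and `confidenceFor` are hypothetical names, and a `null` map is used here only as a stand-in for "no configuration is used".

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the default rule described above; not EOP code.
class ConfidenceDefaults {
    // config == null models the case where no configuration is used at all.
    static double confidenceFor(String relation, Map<String, Double> config) {
        if (config == null) {
            return 1.0; // no configuration: every relation defaults to 1.0
        }
        // configuration present: relations without a defined score default to 0.0
        return config.getOrDefault(relation, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> config = Collections.singletonMap("hypernym", 0.8);
        System.out.println(confidenceFor("hypernym", config)); // defined in the configuration
        System.out.println(confidenceFor("synonym", config));  // not defined in the configuration
        System.out.println(confidenceFor("synonym", null));    // no configuration used
    }
}
```

In practice this means that a configuration which leaves a relation's score undefined effectively gives that relation a confidence of 0.0, so each relation that should contribute needs an explicit score.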

Note: The EXCITEMENT project cannot and does not redistribute GermaNet. Users of this component must obtain it under a proper license agreement from Tuebingen University. The GermaNet API, however, is provided with the project.

Configurable values

The GermaNet resource has a few configurable values. It needs the path to the GermaNet data itself, and a set of double values that indicate the "confidence" assigned to each own relation when it is treated as "entailment".

| Section | Property | Value | Requirement |
| --- | --- | --- | --- |
| GermaNetWrapper | germaNetFilesPath | Path to the GermaNet resource, which the user has to install on their own machine. | N/A |
| GermaNetWrapper | causesConfidence | Confidence score indicating how reliable the GermaNet 'causes' relation is considered. Value between 0 and 1. Causes are only used for rules LHS → RHS. | N/A |
| GermaNetWrapper | entailsConfidence | Confidence score indicating how reliable the GermaNet 'entails' relation is considered. Value between 0 and 1. Entails are only used for rules LHS → RHS. | N/A |
| GermaNetWrapper | hypernymConfidence | Confidence score indicating how reliable the GermaNet 'hypernym' relation is considered. Value between 0 and 1. Hypernyms are only used for rules LHS → RHS. | N/A |
| GermaNetWrapper | synonymConfidence | Confidence score indicating how reliable the GermaNet 'synonym' relation is considered. Value between 0 and 1. Synonyms are used for both rules RHS → LHS and LHS → RHS. | N/A |
| GermaNetWrapper | hyponymConfidence | Confidence score indicating how reliable the GermaNet 'hyponym' relation is considered. Value between 0 and 1. Hyponyms are only used for rules RHS → LHS. | N/A |
| GermaNetWrapper | antonymConfidence | Confidence score indicating how reliable the GermaNet 'antonym' relation is considered. Value between 0 and 1. Note: This relation is deprecated, do not use it. | N/A |
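The direction notes in the table above can be summarized in a short sketch. The names `RelationDirections`, `Relation`, `Direction`, and `directionsFor` are hypothetical and do not come from EOP (the real relation type is GermaNetRelation). The deprecated antonym relation is left out.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the rule directions listed in the table; not EOP code.
class RelationDirections {
    enum Relation { CAUSES, ENTAILS, HYPERNYM, SYNONYM, HYPONYM }
    enum Direction { LHS_TO_RHS, RHS_TO_LHS }

    // Returns the rule directions in which a relation is used, per the table.
    static Set<Direction> directionsFor(Relation relation) {
        switch (relation) {
            case CAUSES:
            case ENTAILS:
            case HYPERNYM:
                return EnumSet.of(Direction.LHS_TO_RHS);
            case HYPONYM:
                return EnumSet.of(Direction.RHS_TO_LHS);
            case SYNONYM:
                return EnumSet.allOf(Direction.class);
        }
        throw new IllegalArgumentException("Unhandled relation: " + relation);
    }

    public static void main(String[] args) {
        for (Relation r : Relation.values()) {
            System.out.println(r + " -> " + directionsFor(r));
        }
    }
}
```

Synonym is the only relation used in both directions, which is why its confidence value affects rules regardless of which side the queried term is on.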