Evaluation based on NER #3

alexanderpanchenko · 2015-11-30T10:30:41Z

Motivation

A preliminary evaluation of clustering quality based on a named entity recognition task. Deadline: 16 December.

Implementation

Manually select from the results the clusters that correspond to:
- names
- surnames
- cities
- countries
- names of programming languages and technologies, e.g. "javascript"
- companies
- fruits and vegetables
To select clusters, look for keywords that are unambiguous, e.g. Pepsi, Javascript or Robert.
Create an ElasticSearch index with all these clusters. Add the corresponding category as an attribute. Each category can have several attributes.
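As a rough illustration of the indexing step, the sketch below serializes hand-labelled clusters into the newline-delimited body accepted by the ElasticSearch `_bulk` endpoint. The index name `clusters`, the field names `category` and `words`, and the sample clusters are all assumptions made for this example. Older ElasticSearch versions may also expect a `_type` in the action line.

```python
import json

# Hypothetical hand-labelled clusters: cluster id -> (category, member words).
clusters = {
    "c1": ("CITY", ["Darmstadt", "Berlin"]),
    "c2": ("NAME", ["John", "Robert"]),
}


def to_bulk_ndjson(clusters, index_name="clusters"):
    """Serialize clusters into a newline-delimited body for the _bulk API."""
    lines = []
    for cluster_id, (category, words) in clusters.items():
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": cluster_id}}))
        # Source line: the cluster words plus the category attribute.
        lines.append(json.dumps({"category": category, "words": words}))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_ndjson(clusters)
```

The resulting string can be sent to the `_bulk` endpoint with any HTTP client using the `application/x-ndjson` content type.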
Download the texts (the XML files reuters.xml and 500news.xml) from https://github.com/AKSW/n3-collection
Parse the XML files to get the plain text.
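A minimal sketch of the parsing step, using only the Python standard library. It makes no assumption about the schema of reuters.xml or 500news.xml and simply concatenates all text nodes. The inline sample document and its element names are hypothetical. If the real files contain non-text elements (URIs, for instance), those would need to be filtered out by tag name.

```python
import xml.etree.ElementTree as ET


def xml_to_plain_text(xml_string):
    """Return all text content of an XML document as one whitespace-joined string."""
    root = ET.fromstring(xml_string)
    # itertext() yields the text and tail of every element in document order.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)


# Hypothetical sample; the real files may use different element names.
sample = "<doc><text><entity>Darmstadt</entity> is a nice city.</text></doc>"
plain_text = xml_to_plain_text(sample)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.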

For each word in the text, retrieve from ElasticSearch the clusters it belongs to and assign the corresponding category to the word. Example of the output format:

Darmstadt  CITY  NamedEntityInText
is
a
nice
city.  CITY  NamedEntityInText
John  NAME  NamedEntityInText
Smith  SURNAME  NamedEntityInText
is
a
well-known
lawyer.
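The tagging step could be sketched as follows. A plain dictionary stands in for the ElasticSearch lookup, and the whitespace tokenization and punctuation stripping are assumptions of this example. The issue does not specify how the third column (NamedEntityInText) is derived, so the sketch only emits the word and its category.

```python
# Hypothetical stand-in for the ElasticSearch lookup: word -> category.
word_to_category = {"Darmstadt": "CITY", "John": "NAME", "Smith": "SURNAME"}


def tag_text(text, lookup):
    """Return one output line per token: the token, plus its category if found."""
    lines = []
    for token in text.split():
        # Strip surrounding punctuation for the lookup only; keep the token as-is.
        category = lookup.get(token.strip(".,;:!?"))
        lines.append(f"{token}  {category}" if category else token)
    return lines


tagged = tag_text("Darmstadt is a nice city. John Smith is well-known.", word_to_category)
print("\n".join(tagged))
```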

For each occurrence of a tag in the text, manually judge whether it is correct, then compute precision as the number of correct tags divided by the number of all assigned tags.
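Once the manual judgements are collected, the precision computation itself is a one-liner. The sketch below assumes the judgements are stored as a list of booleans, one per assigned tag. Returning 0.0 for an empty list is a choice made for this example.

```python
def precision(judgements):
    """Fraction of assigned tags judged correct (True = correct tag)."""
    if not judgements:
        return 0.0
    return sum(judgements) / len(judgements)


# Hypothetical judgements for four assigned tags, three of them correct.
score = precision([True, False, True, True])
```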
