Common ontology for part of speech #43
Comments
I'm sorry I missed this note. It would be much more practical to use an existing taxonomy. Back when I created the tagUsage mechanism for aggregating grammatical information, ISOCat was probably all the hype (not an ontology, just a messy set of potentially orderly taxonomic groupings). But ISOCat is gone now, replaced by a proprietary engine aiming at something slightly different from our goals. Another viable option back then was the so-called GOLD ontology, created on the basis of a single comprehensive linguistic monograph, with (as far as I can recall, and this may be a false recollection) additions from various indigenous languages, contributed by field workers. GOLD is not very alive nowadays, I'm afraid. Somewhere along the way was/is the OLiA ontology, whose main mover is still very much alive and kicking, so this could be worth exploring. Or, something that has just come to my mind and need not be the best solution for our goals: the so-called universal tagset used by Universal Dependencies. The idealized picture would be to use each (non-universal) language-specific UD tagset and provide the (UD-supplied) mapping to the universal tagset.
Well, then... OLiA might be the only viable solution, currently.
Maybe having a common ontology is overkill for our project. But a dictionary
should list the PoS labels it uses in its header, so that people know what "s"
is, because it might have been encoded as "n" somewhere else.
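As a rough illustration of how such a per-dictionary list could be gathered, here is a minimal sketch. It assumes the dictionary is TEI XML with `<pos>` elements in the TEI namespace; the function name `collect_pos_labels` and the sample entries are made up for this example and are not part of any FreeDict tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI namespace in ElementTree's "{uri}" notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def collect_pos_labels(tei_xml: str) -> Counter:
    """Count every distinct, non-empty <pos> text found in a TEI document."""
    root = ET.fromstring(tei_xml)
    labels = Counter()
    for pos in root.iter(f"{TEI_NS}pos"):
        text = (pos.text or "").strip()
        if text:
            labels[text] += 1
    return labels


# Hypothetical sample: two entries use "n", one uses "s" for the same idea.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<entry><form><orth>house</orth></form><gramGrp><pos>n</pos></gramGrp></entry>
<entry><form><orth>tree</orth></form><gramGrp><pos>s</pos></gramGrp></entry>
<entry><form><orth>dog</orth></form><gramGrp><pos>n</pos></gramGrp></entry>
</body></text></TEI>"""

if __name__ == "__main__":
    for label, count in sorted(collect_pos_labels(SAMPLE).items()):
        print(label, count)
```

The resulting inventory is what a header listing would need to document; how it is actually written into the header is left open here.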
Some dictionaries already have some kind of local ontology to reliably identify
part of speech (and potentially gender, etc.). Examples are the WikDict
dictionaries or eng-pol. Most other dictionaries lack this information; there
the <pos/> tag may contain arbitrary text. For machine-friendly postprocessing,
this should be mapped to an ontology that is valid for all FreeDict dictionaries.
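To make the mapping idea concrete, here is a minimal sketch that normalizes free-text part-of-speech labels onto the Universal Dependencies universal POS tags, one of the candidates discussed in this thread. The lookup table `LOCAL_TO_UPOS` and the function `normalize_pos` are hypothetical; the label spellings in the table are illustrative guesses, not an inventory of what FreeDict dictionaries actually contain.

```python
from typing import Optional

# The 17 universal POS tags defined by Universal Dependencies.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Hypothetical table of local labels; a real one would be curated per dictionary.
LOCAL_TO_UPOS = {
    "n": "NOUN",
    "s": "NOUN",  # e.g. "s" for "substantive"
    "noun": "NOUN",
    "v": "VERB",
    "verb": "VERB",
    "adj": "ADJ",
    "adv": "ADV",
}


def normalize_pos(raw: str) -> Optional[str]:
    """Return the UPOS tag for a local label, or None if the label is unknown."""
    key = raw.strip().lower().rstrip(".")
    return LOCAL_TO_UPOS.get(key)


if __name__ == "__main__":
    # Every target in the table must be a valid universal tag.
    assert set(LOCAL_TO_UPOS.values()) <= UPOS
    for label in ["n", "s", " Adj. ", "something else"]:
        print(repr(label), "->", normalize_pos(label))
```

Returning None for unknown labels, rather than guessing, keeps unmapped values visible so they can be reviewed by hand.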
Things to happen: