This repository has been archived by the owner on Dec 24, 2024. It is now read-only.

Unsupervised statistical root extraction without a dictionary #25

Open
aliok opened this issue Dec 10, 2012 · 0 comments

aliok commented Dec 10, 2012

Brute-force root extractors already exist. However, they return too many candidate roots, so it is better to narrow the candidates down statistically.

This might be useful for finding roots that don't exist in the dictionary (e.g. local words) and proper nouns.

  • Save the possible roots for a big corpus (10M words) in a file
  • ...

For proper noun recognition

  • check if the root has been used with an apostrophe in the corpus
  • or check if the word starts with an uppercase letter in the middle of a sentence in the corpus
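The two checks above could be sketched roughly as follows. This is only an illustration, not existing project code: the function name `proper_noun_evidence`, the whitespace-free regex tokenizer, and the idea of passing the corpus as a list of sentence strings are all assumptions. Note that Python's `str.lower()` does not apply Turkish casing rules (`I` becomes `i`, not `ı`), so a real implementation would need locale-aware lowercasing.

```python
import re
from collections import Counter

# Assumption: ASCII and typographic apostrophes both mark proper-noun suffixes.
APOSTROPHES = "'\u2019"
# A token is a run of letters, optionally followed by an apostrophe and a suffix.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)?")

def proper_noun_evidence(sentences, root):
    """Count corpus evidence that `root` is a proper noun (hypothetical helper)."""
    evidence = Counter(with_apostrophe=0, capitalized_mid_sentence=0)
    # Caveat: str.lower() is not Turkish-aware ('I' -> 'i', not 'ı').
    target = root.lower()
    for sentence in sentences:
        for position, token in enumerate(TOKEN_RE.findall(sentence)):
            # Index of the first apostrophe in the token, if any.
            cut = next((i for i, ch in enumerate(token) if ch in APOSTROPHES), None)
            stem = token if cut is None else token[:cut]
            if stem.lower() != target:
                continue
            if cut is not None:
                evidence["with_apostrophe"] += 1
            # Sentence-initial words are capitalized anyway, so skip position 0.
            if position > 0 and stem[0].isupper():
                evidence["capitalized_mid_sentence"] += 1
    return evidence

if __name__ == "__main__":
    corpus = ["Dün Kıvış'ı gördüm", "Sonra Kıvış geldi", "kıvış güzel"]
    print(proper_noun_evidence(corpus, "kıvış"))
```

Either count being non-zero is only a hint; thresholds would have to be tuned on a real corpus.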

For verb recognition, for example: for the non-dictionary word 'kıvışlıyordu', find the root 'kıvışlamak'

  • Check if there are other surfaces in the corpus whose root candidates include 'kıvışlamak', such as 'kıvışladım' 'kıvışla' 'kıvışlarsa'
  • Then we would eliminate some of the candidates: 'kıvışlımak' 'kıvışlıyormak' 'kıvışlıyomak' etc.
  • However, that doesn't eliminate roots such as "kıvış+Noun" 'kıvmak' 'kıvımak' etc.
  • For those, check if there are other surfaces such as 'kıvışımı' 'kıvdım' 'kıvıyorum' etc.
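A minimal sketch of the elimination idea in the list above: rank each root candidate by how many of its other expected surfaces actually occur in the corpus. The function `rank_root_candidates` and the `expected_surfaces` table are hypothetical; in practice the expected surfaces would come from a morphological generator, which is stubbed here with a hand-written dictionary using the forms mentioned in this issue.

```python
from collections import Counter

def rank_root_candidates(word, candidates, corpus_counts, expected_surfaces):
    """Rank candidates by the number of their other surfaces attested in the corpus.

    `expected_surfaces` maps a candidate root to surfaces it would produce
    (assumption: supplied by some morphological generator).
    """
    scored = []
    for candidate in candidates:
        attested = [
            surface
            for surface in expected_surfaces.get(candidate, ())
            # The word being analysed must not count as its own evidence.
            if surface != word and corpus_counts.get(surface, 0) > 0
        ]
        scored.append((candidate, len(attested)))
    # Stable sort: ties keep the order in which candidates were given.
    scored.sort(key=lambda pair: -pair[1])
    return scored

if __name__ == "__main__":
    # Toy corpus; a real one would hold millions of tokens.
    corpus_counts = Counter(
        ["kıvışlıyordu", "kıvışladım", "kıvışla", "kıvışlarsa", "kıvışladım"]
    )
    # Hand-written stand-in for a generator's output.
    expected_surfaces = {
        "kıvışlamak": ["kıvışladım", "kıvışla", "kıvışlarsa"],
        "kıvış": ["kıvışımı"],
        "kıvmak": ["kıvdım", "kıvıyorum"],
    }
    ranking = rank_root_candidates(
        "kıvışlıyordu", ["kıvmak", "kıvış", "kıvışlamak"], corpus_counts, expected_surfaces
    )
    print(ranking)
```

Candidates with no attested sibling surfaces end up at the bottom and could be dropped; counting distinct surfaces rather than raw frequency is a design choice made here so that one very frequent form does not dominate.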

That would help a lot.
