This repository has been archived by the owner on Dec 24, 2024. It is now read-only.

Phrase recognition and database #28

Open
aliok opened this issue Dec 10, 2012 · 0 comments

aliok commented Dec 10, 2012

The tokenizer (sentence to words) needs phrase recognition, so that a multi-word phrase comes out as a single token.
This can be done statistically, combined with some rules, and with the support of Issue #25.
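
As a rough illustration of the statistical part, one option is to score adjacent word pairs by pointwise mutual information (PMI) over a tokenized corpus and treat high-scoring pairs as phrase candidates. PMI is an assumption here, not something this issue prescribes, and `score_bigrams` and `min_count` are hypothetical names. A minimal sketch:

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def score_bigrams(
    sentences: List[List[str]], min_count: int = 2
) -> Dict[Tuple[str, str], float]:
    """Return PMI scores for adjacent word pairs seen at least min_count times."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for words in sentences:
        unigrams.update(words)
        # Adjacent pairs within one sentence only.
        bigrams.update(zip(words, words[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores: Dict[Tuple[str, str], float] = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            # Rare pairs give unreliable PMI values, so skip them.
            continue
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores
```

Pairs above some threshold would still need a rule-based filter or manual review before they go into a phrase database, since frequent but unrelated neighbours can also score well.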

hafta sonu => hafta_sonu
Turkiye Cumhuriyeti => Turkiye_Cumhuriyeti
ilan etmek => ilan_etmek

It doesn't make sense to parse "ilan" and "etmek" separately.

Zemberek already has a small database of these.
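
Given such a database, the tokenizer step could be a greedy longest-match lookup over the word list. The sketch below is only an illustration: `PHRASES` and `merge_phrases` are hypothetical names, the entries are the examples from this issue, and nothing here reflects Zemberek's actual data format or API.

```python
from typing import List, Set, Tuple

# Hypothetical phrase database; entries are the examples from this issue.
PHRASES: Set[Tuple[str, ...]] = {
    ("hafta", "sonu"),
    ("Turkiye", "Cumhuriyeti"),
    ("ilan", "etmek"),
}


def merge_phrases(
    tokens: List[str], phrases: Set[Tuple[str, ...]] = PHRASES
) -> List[str]:
    """Join known multi-word phrases into single tokens (greedy longest match)."""
    max_len = max((len(p) for p in phrases), default=1)
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                merged.append("_".join(candidate))
                i += n
                break
        else:
            # No phrase starts here; keep the word unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_phrases("hafta sonu geldi".split()))
```

The lookup matches exact surface forms only, so an inflected form such as "ilan etti" would not match "ilan etmek"; handling that would need the morphological parser.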

Issue #32 is related

See the TDK list of compound words that are written separately (Ayri Yazilan Birlesik Kelimeler): http://www.tdk.gov.tr/index.php?option=com_content&view=article&id=221:Ayri-Yazilan-Birlesik-Kelimeler&catid=50:yazm-kurallar&Itemid=132
