This repository has been archived by the owner on Dec 24, 2024. It is now read-only.

Duplication recognition in tokenization #32

Open
aliok opened this issue Dec 10, 2012 · 0 comments

aliok (Owner) commented Dec 10, 2012

Kinds of duplication (reduplication) to recognize:

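  1. abur cubur ("snacks, junk food"): neither word makes sense on its own
  2. yemek memek ("food and the like"): the second word makes no sense; it echoes the first with the initial consonant replaced by "m"
  3. iyi kötü ("good bad", i.e. "more or less"): opposites
  4. zırıl zırıl: onomatopoeia, called "sound reflection" in Turkish
  5. sıcak sıcak ("while still hot"): a repeated adjective turns into an adverb
  6. gide gide ("in the long run", lit. "going going"): a repeated verb form used as an adverb
  7. kırk elli kişi ("forty or fifty people"): two numerals give an approximate number
  8. üç beş kuruş ("a few kuruş", lit. "three five kuruş"): same as 7
  9. bata çıka ("with great difficulty", lit. "sinking and surfacing")
  10. enine boyuna ("thoroughly", lit. "widthwise and lengthwise")
  11. ev bark ("house and home")
  12. baş başa ("tête-à-tête"), daldan dala ("from branch to branch"), ucu ucuna ("just barely"): the same noun repeated with different case suffixes
  13. gelir gelmez ("as soon as he comes"), yapar yapmaz ("as soon as he does it"): adverbs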
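Several of the kinds above can be told apart from the surface words alone. The following is a minimal Python sketch, assuming already-lowercased input (Python's `str.lower()` does not apply Turkish dotted/dotless I rules) and small hand-written word lists. `classify_pair`, `m_echo`, `NUMERALS` and `LEXICALIZED` are hypothetical names, and the lists are illustrative rather than complete. Items 12 and 13 need morphology and are not covered.

```python
from typing import Optional

# Turkish vowels; an m-echo prefixes "m" to vowel-initial words.
VOWELS = set("aeıioöuü")

# Hypothetical numeral table; the values let us require an ascending
# pair such as "üç beş" or "kırk elli".
NUMERALS = {"bir": 1, "iki": 2, "üç": 3, "dört": 4, "beş": 5, "altı": 6,
            "yedi": 7, "sekiz": 8, "dokuz": 9, "on": 10, "yirmi": 20,
            "otuz": 30, "kırk": 40, "elli": 50, "altmış": 60,
            "yetmiş": 70, "seksen": 80, "doksan": 90, "yüz": 100}

# Hypothetical hand-curated fixed pairs (meaningless parts, opposites, ...).
LEXICALIZED = {("abur", "cubur"), ("iyi", "kötü"), ("bata", "çıka"),
               ("enine", "boyuna"), ("ev", "bark")}


def m_echo(word: str) -> str:
    """Build the m-reduplicated echo: yemek -> memek, elma -> melma."""
    if word[0] in VOWELS:
        return "m" + word
    return "m" + word[1:]


def classify_pair(first: str, second: str) -> Optional[str]:
    """Classify two adjacent lowercase tokens by surface form only.

    Returns None when the pair is not a duplication this heuristic knows.
    """
    if not first or not second:
        return None
    if (first, second) in LEXICALIZED:
        return "lexicalized"
    if first == second:
        return "repetition"          # zırıl zırıl, sıcak sıcak, gide gide
    if not first.startswith("m") and second == m_echo(first):
        return "m-reduplication"     # yemek memek
    if (first in NUMERALS and second in NUMERALS
            and NUMERALS[first] < NUMERALS[second]):
        return "approximate-number"  # kırk elli, üç beş
    return None
```

For example, `classify_pair("yemek", "memek")` gives `"m-reduplication"`, while `classify_pair("gelir", "gelmez")` gives `None`, since recognizing item 13 needs the parser's analyses.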

Some of these can be recognized during tokenization.
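One way the tokenization-time part could work is a pass over the token list that joins adjacent tokens into a single duplication token. This Python sketch assumes a pluggable pair test; `merge_adjacent` and `exact_repetition` are hypothetical names, not part of any existing tokenizer.

```python
from typing import Callable, List

# A predicate deciding whether two adjacent tokens form one duplication.
PairTest = Callable[[str, str], bool]


def merge_adjacent(tokens: List[str], is_pair: PairTest) -> List[str]:
    """Greedily join adjacent tokens accepted by `is_pair`, left to right.

    Each merged pair becomes one token whose halves are separated by a
    space, so later stages can still see both words.
    """
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and is_pair(tokens[i], tokens[i + 1]):
            merged.append(tokens[i] + " " + tokens[i + 1])
            i += 2  # both halves consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def exact_repetition(a: str, b: str) -> bool:
    """Simplest surface test: the same word twice (zırıl zırıl, gide gide)."""
    return a == b
```

For instance, `merge_adjacent("çocuk zırıl zırıl ağladı".split(), exact_repetition)` yields `["çocuk", "zırıl zırıl", "ağladı"]`. Because the merge is greedy, three identical words in a row become one pair plus a leftover token; whether that is desirable is a design choice.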

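Others need to be done after parsing, such as item 13: "gelir gelmez" can only be recognized once both words have been analyzed as the positive and the negative aorist (third person singular) of the same verb root, as in this rule: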

`_R+Verb+Pos+Aor+A3sg _R+Verb+Neg+Aor+A3sg %1 %2+Adverb+When`
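A Python sketch of matching this rule on already-parsed tokens. It assumes a simplified, hypothetical `root+Tag+Tag` analysis string (not necessarily the parser's actual output format) and reads `%1 %2` as the two surface words; `match_as_soon_as` and `split_parse` are illustrative names.

```python
from typing import List, Optional, Tuple

# A surface word paired with one analysis in a simplified, hypothetical
# "root+Tag+Tag" form, e.g. ("gelir", "gel+Verb+Pos+Aor+A3sg").
Token = Tuple[str, str]

POS_AOR = ["Verb", "Pos", "Aor", "A3sg"]
NEG_AOR = ["Verb", "Neg", "Aor", "A3sg"]


def split_parse(parse: str) -> Tuple[str, List[str]]:
    """Split "gel+Verb+Pos+Aor+A3sg" into ("gel", ["Verb", "Pos", "Aor", "A3sg"])."""
    root, *tags = parse.split("+")
    return root, tags


def match_as_soon_as(first: Token, second: Token) -> Optional[str]:
    """Apply _R+Verb+Pos+Aor+A3sg _R+Verb+Neg+Aor+A3sg -> %1 %2+Adverb+When.

    Returns the merged analysis, or None if the tokens do not match.
    """
    root1, tags1 = split_parse(first[1])
    root2, tags2 = split_parse(second[1])
    # Both words must share the root _R: positive aorist, then negative aorist.
    if root1 == root2 and tags1 == POS_AOR and tags2 == NEG_AOR:
        return f"{first[0]} {second[0]}+Adverb+When"
    return None
```

Here `match_as_soon_as(("gelir", "gel+Verb+Pos+Aor+A3sg"), ("gelmez", "gel+Verb+Neg+Aor+A3sg"))` returns `"gelir gelmez+Adverb+When"`. The sketch checks only one analysis per word; since a parser usually returns several ambiguous analyses, a real rule would likely need to try each combination.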
