Fix Kana Unk Handling

@polm polm released this 26 Jul 09:44
· 75 commits to master since this release

This release fixes issue #8, where hiragana or katakana words not in the
dictionary were not converted to romaji but reproduced as-is. They are now
romanized, though since they're not in the dictionary the result will
often fail to capture the original spelling.

A further consequence of this change is that unknown words in scripts
other than kana or ASCII need to be handled. By default these
characters are converted to "?" for maximum technical compatibility.
Setting the ensure_ascii property on a Cutlet to False disables this
behavior, so unknown characters pass through unchanged.

Example:

import cutlet
cut = cutlet.Cutlet()
cut.romaji('彁')
# -> ?
cut.ensure_ascii = False
cut.romaji('彁')
# -> 彁

Note that, besides unknown kanji, this affects non-Latin scripts like Cyrillic and Hangul.
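
To make the effect on other scripts concrete, here is a minimal standalone sketch of the fallback described above. It is not cutlet's code: `ascii_fallback` is a hypothetical helper written for illustration, and it assumes one "?" per non-ASCII character, which may differ from how cutlet handles unknown tokens.

```python
def ascii_fallback(text: str, ensure_ascii: bool = True) -> str:
    """Hypothetical helper: replace each non-ASCII character with '?'.

    Illustrates the idea behind ensure_ascii only; cutlet's real
    handling of unknown tokens may differ.
    """
    if not ensure_ascii:
        # Pass unknown characters through untouched.
        return text
    # str.isascii() requires Python 3.7 or later.
    return "".join(ch if ch.isascii() else "?" for ch in text)


print(ascii_fallback("彁"))                        # ?
print(ascii_fallback("Москва"))                    # ??????
print(ascii_fallback("한글", ensure_ascii=False))  # 한글
```

In this sketch ASCII text is never altered, so mixed input such as "abc 彁" keeps its Latin portion either way.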