Fix Kana Unk Handling
This release fixes the issue (#8) where hiragana or katakana words not in the
dictionary were not converted to romaji but were reproduced as-is. Such words
are now romanized, though because they are not in the dictionary, the result
will often fail to capture the original spelling.
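The following is a minimal, hypothetical Python sketch of character-by-character kana romanization. It shows why a word without a dictionary entry can miss its original spelling. It is not cutlet's implementation. The `naive_kana_romaji` function and its tiny `KANA` table are assumptions that cover only the characters used here, and the rule for the long-vowel mark ー is simplified.

```python
# Hypothetical, minimal kana table covering only the characters used below;
# a real romanizer needs the full syllabary plus rules for small kana and っ.
KANA = {"ケ": "ke", "キ": "ki", "コ": "ko", "ヒ": "hi"}
LONG_VOWEL = "ー"


def naive_kana_romaji(word: str) -> str:
    """Romanize kana one character at a time, with no dictionary lookup."""
    out = []
    for ch in word:
        if ch == LONG_VOWEL and out:
            # Lengthen the previous syllable by repeating its final vowel.
            out.append(out[-1][-1])
        else:
            # Characters missing from the table are passed through unchanged.
            out.append(KANA.get(ch, ch))
    return "".join(out)


print(naive_kana_romaji("ケーキ"))    # keeki
print(naive_kana_romaji("コーヒー"))  # koohii
```

Without a dictionary entry, a loanword like ケーキ comes out as its kana reading ("keeki") rather than a source-language spelling such as "cake".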
A further consequence of this change is that unknown words in scripts
other than kana or ASCII now need to be handled. By default, these
characters are converted to "?" for maximum technical compatibility.
Setting the ensure_ascii property on a Cutlet to False disables this
behavior, so unknown characters pass through unchanged.
Example:
import cutlet
cut = cutlet.Cutlet()
cut.romaji('彁')
# -> ?
cut.ensure_ascii = False
cut.romaji('彁')
# -> 彁
Note that besides unknown kanji, this also affects non-Latin scripts such as Cyrillic and Hangul.
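The default fallback described above can be sketched in Python as follows. This is a hypothetical illustration, not cutlet's code. `fallback_unknown` is an invented helper that assumes it receives a token the dictionary could not romanize. It replaces each non-ASCII character with "?" unless `ensure_ascii` is False.

```python
def fallback_unknown(token: str, ensure_ascii: bool = True) -> str:
    """Hypothetical fallback for a token the dictionary could not romanize.

    With ensure_ascii=True (the default), every non-ASCII character becomes
    "?"; with ensure_ascii=False, the token is returned unchanged.
    """
    if not ensure_ascii:
        return token
    return "".join(ch if ch.isascii() else "?" for ch in token)


print(fallback_unknown("彁"))                       # ?
print(fallback_unknown("Привет"))                   # ??????
print(fallback_unknown("한글", ensure_ascii=False))  # 한글
```

Replacing each character individually is a choice made for this sketch. The release note does not say how cutlet handles multi-character unknown tokens.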