Add new rule to keep whitespace between ASCII tokens

This fixes most of the odd spacing around ASCII input. Note that the rule has to run at the end of the processing pipeline, so that the earlier rules which handle ASCII-like punctuation next to Japanese text still produce the right results.
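As a rough illustration of the rule, here is a minimal standalone sketch. `Word`, `Token`, `keep_ascii_spacing`, and `render` are hypothetical stand-ins written for this example, not cutlet's actual classes or functions; the sketch assumes `white_space` holds the whitespace that preceded a node in the input, and it replaces the rest of the pipeline with a simple "no space" fallback.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    # Hypothetical stand-in for a tokenizer node. Only the two attributes
    # used by the rule are modeled here.
    surface: str
    white_space: str = ""  # assumed: whitespace that preceded this node


@dataclass
class Token:
    # Hypothetical stand-in for an output token: text plus a flag saying
    # whether a space should follow it.
    surface: str
    space: bool


def keep_ascii_spacing(words: List[Word]) -> List[Token]:
    out: List[Token] = []
    for i, word in enumerate(words):
        nw: Optional[Word] = words[i + 1] if i + 1 < len(words) else None
        # Same condition as the rule in the diff: both this word and the
        # next one are ASCII, so reuse the spacing from the input.
        if word.surface.isascii() and nw and nw.surface.isascii():
            out.append(Token(word.surface, bool(nw.white_space)))
            continue
        # Sketch-only fallback; the real pipeline applies other rules here.
        out.append(Token(word.surface, False))
    return out


def render(tokens: List[Token]) -> str:
    # Join tokens, adding a single space wherever the flag is set.
    return "".join(t.surface + (" " if t.space else "") for t in tokens)


words = [Word("foo"), Word("bar", " "), Word(".")]
print(render(keep_ascii_spacing(words)))
```

With this input the space between `foo` and `bar` survives because `bar` carried leading whitespace, while `.` stays attached to `bar` because it did not.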
polm · Dec 20, 2024 · c368f5c
1 parent 519e374
Showing 1 changed file with 7 additions and 0 deletions.
diff --git a/cutlet/cutlet.py b/cutlet/cutlet.py
@@ -248,6 +248,13 @@ def romaji_tokens(self, words, capitalize=True, title=False):
                 out.append(tok)
                 continue
 
+            # preserve spaces between ascii tokens
+            if (word.surface.isascii() and
+                nw and nw.surface.isascii()):
+                use_space = bool(nw.white_space)
+                out.append(Token(word.surface, use_space))
+                continue
+
             out.append(tok)
 
             # no space sometimes