“[uv_break]” is not handled #240

haijd · 2024-09-08T15:50:27Z

With the following text, “[uv_break]” is read aloud before and after “你说什么？”:

“见到你真高兴，弗洛梅女士。”我开口道。她的手暖暖的，不过握手的时间略微有些长了。“这难道不激动人心吗？”她深吸一口气。
“你说什么？”


haijd · 2024-09-08T16:32:21Z

I found that this happens whenever the text contains special characters.

Edit the file uilib/zh_normalization/text_normlization.py and change this line:

text = re.sub(r'[——《》【】<>{}()（）#&@“”^|…\\]', '', text)

to:

text = re.sub(r'[——《》【】<>{}()（）#&@“”^|…？・\\]', '', text)

After that change it works.
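To see what the change does, here is a minimal sketch that applies both character classes to the problem line from the sample text. The names `ORIGINAL`, `UPDATED`, and `strip_special` are hypothetical and exist only for this illustration; only the two patterns come from the project file.

```python
import re

# The two character classes quoted above, copied verbatim.
ORIGINAL = r'[——《》【】<>{}()（）#&@“”^|…\\]'
UPDATED = r'[——《》【】<>{}()（）#&@“”^|…？・\\]'


def strip_special(text, pattern):
    """Hypothetical helper: delete every character matched by the class."""
    return re.sub(pattern, '', text)


sample = '“你说什么？”'

# The original class removes the curly quotes but leaves the full-width "？".
print(strip_special(sample, ORIGINAL))  # 你说什么？

# The updated class also removes "？" (and "・" if present).
print(strip_special(sample, UPDATED))   # 你说什么
```

Note that the updated class deletes the question mark outright, so the sentence boundary is lost along with it.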

haijd · 2024-09-09T00:39:32Z

I refined the fix a bit further:


--- a/uilib/utils.py
+++ b/uilib/utils.py
@@ -143,7 +143,7 @@ def remove_brackets(text):
     text=re.sub(r'\[(uv_break|laugh|lbreak|break)\]',r' \1 ',text,re.I|re.S|re.M)

     # 使用 re.sub 替换掉 [ ] 对
-    newt=re.sub(r'\[|\]|！|：|｛|｝', '', text)
+    newt=re.sub(r'\[|\]|｛|｝', '', text)
     return    re.sub(r'\s(uv_break|laugh|lbreak|break)(?=\s|$)', r' [\1] ', newt)

--- a/uilib/zh_normalization/text_normlization.py
+++ b/uilib/zh_normalization/text_normlization.py
@@ -98,6 +98,8 @@ class TextNormalizer():
         if lang == "zh":
             #text = text.replace(" ", "")
             # 过滤掉特殊字符
+            text = re.sub(r'[—＿・：…]', '，', text)
+            text = re.sub(r'[？！]', '。', text)
             text = re.sub(r'[——《》【】<>{}()（）#&@“”^|…\\]', '', text)
         text = self.SENTENCE_SPLITOR.sub(r'\1\n', text)
         text = text.strip()
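The three substitutions added to `TextNormalizer` in the diff above can be tried in isolation. This is a sketch only: `normalize_punctuation` is a hypothetical standalone function, not part of the project, and it omits the sentence splitting and everything else the real class does.

```python
import re


def normalize_punctuation(text):
    """Hypothetical standalone copy of the three re.sub calls from the diff."""
    # Pause-like punctuation becomes a full-width comma.
    text = re.sub(r'[—＿・：…]', '，', text)
    # Sentence-ending "？" and "！" become a full-width period.
    text = re.sub(r'[？！]', '。', text)
    # Whatever special characters remain are deleted.
    text = re.sub(r'[——《》【】<>{}()（）#&@“”^|…\\]', '', text)
    return text


print(normalize_punctuation('“你说什么？”'))  # 你说什么。
print(normalize_punctuation('等等…真的！'))    # 等等，真的。
```

Order matters here: because "—" and "…" are rewritten to "，" in the first step, the final deletion pattern no longer finds them, so pauses and sentence boundaries are kept instead of being dropped.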
