"[uv_break]" is not handled #240

Open
haijd opened this issue Sep 8, 2024 · 2 comments

haijd commented Sep 8, 2024

With the following text, the literal token "[uv_break]" is read aloud before and after “你说什么?” ("What did you say?"):

“见到你真高兴,弗洛梅女士。”我开口道。她的手暖暖的,不过握手的时间略微有些长了。“这难道不激动人心吗?”她深吸一口气。
“你说什么?”

haijd commented Sep 8, 2024

I found that this happens whenever the text contains special characters.

Edit the file uilib/zh_normalization/text_normlization.py and change this line:

text = re.sub(r'[——《》【】<>{}()()#&@“”^|…\\]', '', text)

to:

text = re.sub(r'[——《》【】<>{}()()#&@“”^|…?・\\]', '', text)

With that change, "[uv_break]" is no longer read aloud.
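A minimal sketch of what the widened character class changes, assuming the two patterns are exactly the ones quoted above. The helper name `strip_special` and the constants are hypothetical and exist only for this illustration; the real code lives inside `TextNormalizer`.

```python
import re

# Character class from the original line (copied from this comment).
ORIGINAL = r'[——《》【】<>{}()()#&@“”^|…\\]'
# Widened class: additionally strips the full-width '?' and the middle dot '・'.
WIDENED = r'[——《》【】<>{}()()#&@“”^|…?・\\]'


def strip_special(text: str, pattern: str) -> str:
    """Delete every character matched by the given character class."""
    return re.sub(pattern, '', text)


sample = '“你说什么?”'
print(strip_special(sample, ORIGINAL))  # quotes removed, full-width '?' kept
print(strip_special(sample, WIDENED))   # quotes and full-width '?' removed
```

The only difference between the two calls is whether the full-width question mark survives normalization, which matches the line in the sample text where the stray "[uv_break]" was heard.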


haijd commented Sep 9, 2024

I refined the fix a bit further:


--- a/uilib/utils.py
+++ b/uilib/utils.py
@@ -143,7 +143,7 @@ def remove_brackets(text):
     text=re.sub(r'\[(uv_break|laugh|lbreak|break)\]',r' \1 ',text,re.I|re.S|re.M)

     # 使用 re.sub 替换掉 [ ] 对
-    newt=re.sub(r'\[|\]|!|:|{|}', '', text)
+    newt=re.sub(r'\[|\]|{|}', '', text)
     return    re.sub(r'\s(uv_break|laugh|lbreak|break)(?=\s|$)', r' [\1] ', newt)

--- a/uilib/zh_normalization/text_normlization.py
+++ b/uilib/zh_normalization/text_normlization.py
@@ -98,6 +98,8 @@ class TextNormalizer():
         if lang == "zh":
             #text = text.replace(" ", "")
             # 过滤掉特殊字符
+            text = re.sub(r'[—_・:…]', ',', text)
+            text = re.sub(r'[?!]', '。', text)
             text = re.sub(r'[——《》【】<>{}()()#&@“”^|…\\]', '', text)
         text = self.SENTENCE_SPLITOR.sub(r'\1\n', text)
         text = text.strip()
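The patch above can be tried in isolation with a rough standalone sketch like the one below. It is not the project's actual code: `normalize_zh_punctuation` is a hypothetical free function standing in for the patched branch of `TextNormalizer`, and `remove_brackets` is re-typed from the diff with one deliberate difference, namely that the flags are passed by keyword, because the fourth positional parameter of `re.sub` is `count`, not `flags`.

```python
import re

CONTROL = r'uv_break|laugh|lbreak|break'


def normalize_zh_punctuation(text: str) -> str:
    """Hypothetical stand-in for the patched zh branch of TextNormalizer."""
    # Pause-like marks become a full-width comma instead of being dropped.
    text = re.sub(r'[—_・:…]', ',', text)
    # Sentence-ending marks become a full-width period.
    text = re.sub(r'[?!]', '。', text)
    # Remaining special characters are deleted, as before the patch.
    return re.sub(r'[——《》【】<>{}()()#&@“”^|…\\]', '', text)


def remove_brackets(text: str) -> str:
    """Strip stray brackets while keeping control tokens such as [uv_break]."""
    # Unwrap control tokens so the bracket cleanup below cannot touch them.
    text = re.sub(rf'\[({CONTROL})\]', r' \1 ', text, flags=re.I)
    # Drop any remaining square or curly brackets.
    text = re.sub(r'\[|\]|{|}', '', text)
    # Re-wrap the bare control tokens (case-insensitive here by assumption).
    return re.sub(rf'\s({CONTROL})(?=\s|$)', r' [\1] ', text, flags=re.I)


print(normalize_zh_punctuation('“你说什么?”'))
print(remove_brackets('你好[uv_break]世界[abc]'))
```

With this ordering, a full-width question mark or exclamation mark ends up as 。 rather than being deleted outright, so the sentence boundary is preserved for the later sentence-splitting step.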
