Pinyin segmentation algorithm (拼音分割算法) #9

Open
nopdan opened this issue Jan 21, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@nopdan
Owner

nopdan commented Jan 21, 2024

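Some input methods do not support pinyin dictionaries with separators and only support user phrases, for example Microsoft's user-defined phrases and Gboard's personal dictionary on mobile.
In these formats the code is a continuous pinyin string, such as pinyinfengesuanfa.
When one of these formats is the source format, the code must be converted into a separated code: pin'yin'fen'ge'suan'fa.
Currently the original code is ignored and the program annotates the pinyin automatically, which can produce wrong readings and is slow.
We need a pinyin segmentation algorithm that splits the continuous pinyin string (pinyinfengesuanfa) using the phrase it belongs to (拼音分割算法).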
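
A minimal Python sketch of why the string alone is ambiguous, assuming a hard-coded syllable set `SYLLABLES` (a tiny illustrative subset, not the full pinyin inventory) and a hypothetical helper `all_splits`. It backtracks over the string and collects every way to cut it into known syllables.

```python
# Illustrative subset of pinyin syllables; a real tool would use the
# complete syllable inventory (about 400 toneless syllables).
SYLLABLES = {
    "pin", "yin", "fen", "ge", "suan", "fa",
    "xi", "an", "xian", "guang", "guan", "gan",
}


def all_splits(code: str, syllables=SYLLABLES) -> list[list[str]]:
    """Return every way to split `code` into syllables from `syllables`."""
    max_len = max(len(s) for s in syllables)
    results: list[list[str]] = []

    def walk(pos: int, path: list[str]) -> None:
        if pos == len(code):          # consumed the whole string
            results.append(path.copy())
            return
        # Try every syllable-length prefix of the remaining string.
        for end in range(pos + 1, min(pos + max_len, len(code)) + 1):
            piece = code[pos:end]
            if piece in syllables:
                path.append(piece)
                walk(end, path)
                path.pop()            # backtrack

    walk(0, [])
    return results
```

On `xian` this yields both `xi'an` and `xian`, and on `guangan` both `guan'gan` and `guang'an`, so a syllable inventory alone cannot choose; the phrase is needed to break the tie.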

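Ambiguous splits can be resolved with the following information:

  • The length of the phrase (xian for 西安 has two characters, so take xi'an)
  • The possible readings of each character in the phrase (guangan for 广安 could be [guang'an, guan'gan]; 广 has no reading guan, so take the first, guang'an)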
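
Both rules can be applied in one pass by walking the phrase one character at a time and letting each character consume only a prefix of the remaining string that is one of its readings. The sketch below is a minimal illustration under assumptions: `READINGS` is a hypothetical, hand-written reading table and `segment` a hypothetical helper; a real converter would load readings for every character, including polyphonic ones, from a full dictionary.

```python
from typing import Optional

# Hypothetical reading table (toneless pinyin) for the examples only.
READINGS: dict[str, list[str]] = {
    "西": ["xi"], "安": ["an"], "广": ["guang"],
    "拼": ["pin"], "音": ["yin"], "分": ["fen"],
    "割": ["ge"], "算": ["suan"], "法": ["fa"],
}


def segment(code: str, phrase: str, readings=READINGS) -> Optional[list[str]]:
    """Split `code` into exactly one syllable per character of `phrase`.

    Character i may only consume a prefix that is one of its readings, so
    the result has len(phrase) syllables (西安 -> xi'an, not xian) and each
    syllable is a valid reading (广安 -> guang'an). Returns None if no
    split fits.
    """
    def walk(i: int, pos: int) -> Optional[list[str]]:
        if i == len(phrase):
            # Success only if the whole code string was consumed too.
            return [] if pos == len(code) else None
        for r in readings.get(phrase[i], []):
            if code.startswith(r, pos):
                rest = walk(i + 1, pos + len(r))
                if rest is not None:
                    return [r] + rest
        return None  # no reading of phrase[i] fits here: backtrack

    return walk(0, 0)
```

For example, `"'".join(segment("pinyinfengesuanfa", "拼音分割算法"))` gives `pin'yin'fen'ge'suan'fa`. Because each character consumes exactly one syllable, the phrase-length rule holds automatically, and a dead end falls back to the previous character's next reading.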
@nopdan nopdan added the enhancement New feature or request label Jan 21, 2024
@nopdan
Owner Author

nopdan commented May 12, 2024

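Another approach: take the Cartesian product of the possible readings of each character and compare each combination against the pinyin string.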
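
A minimal sketch of the Cartesian-product idea using `itertools.product`, again assuming a hypothetical reading table `READINGS` and a hypothetical helper name `segment_by_product`. The polyphonic 重 (zhong/chong) shows why the product is useful: only one combination concatenates to the input string.

```python
from itertools import product
from typing import Optional

# Hypothetical reading table (toneless pinyin); 重 is polyphonic.
READINGS: dict[str, list[str]] = {
    "西": ["xi"], "安": ["an"], "广": ["guang"],
    "重": ["zhong", "chong"], "庆": ["qing"],
}


def segment_by_product(code: str, phrase: str,
                       readings=READINGS) -> Optional[list[str]]:
    """Return the first reading combination whose concatenation equals `code`."""
    # One list of candidate readings per character; an unknown character
    # gives an empty list, so product() yields nothing and we return None.
    candidates = [readings.get(ch, []) for ch in phrase]
    for combo in product(*candidates):
        if "".join(combo) == code:
            return list(combo)
    return None
```

`segment_by_product("chongqing", "重庆")` tries `zhong+qing`, then `chong+qing`, and returns `["chong", "qing"]`. The number of combinations is the product of each character's reading count, so long phrases with many polyphonic characters can generate many candidates; checking each reading as a prefix of the remaining string while building the combination would prune mismatches earlier.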
