Word segmentation error #120

Open
wencan opened this issue Oct 15, 2023 · 1 comment

wencan commented Oct 15, 2023

SnowNLP('本书由百度官方出品,百度公司CTO王海峰博士作序,张钹院士、李未院士、百度集团副总裁吴甜联袂推荐。').words
Output:
['本书', '由', '百度', '官方', '出品', ',', '百度', '公司', 'CTO', '王', '海峰', '博士', '作', '序', ',', '张', '钹', '院士', '、', '李', '未', '院士', '、', '百度', '集团', '副', '总裁', '吴', '甜', '联袂', '推荐', '。']
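To make the problem easier to see, here is a minimal sketch that checks which terms fail to come out as a single token. The token list is copied from the output above. The `expected_terms` list and the helper name `split_terms` are assumptions made for illustration, not part of SnowNLP.

```python
def split_terms(tokens, expected_terms):
    """Return the expected terms that do not appear as one whole token."""
    token_set = set(tokens)
    return [term for term in expected_terms if term not in token_set]


# Token list copied from the reported output.
tokens = ['本书', '由', '百度', '官方', '出品', ',', '百度', '公司', 'CTO',
          '王', '海峰', '博士', '作', '序', ',', '张', '钹', '院士', '、',
          '李', '未', '院士', '、', '百度', '集团', '副', '总裁', '吴', '甜',
          '联袂', '推荐', '。']

# Assumed expectation: each personal name should be one token.
# '百度' is included as a term that is already segmented correctly.
expected_terms = ['百度', '王海峰', '张钹', '李未', '吴甜']

missing = split_terms(tokens, expected_terms)
print(missing)
```

With this input the check flags all four personal names as split, while '百度' passes.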

lanchengkai commented Jan 16, 2024

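I ran into a similar problem: the corpus contains the corresponding words, yet the segmenter produces words that do not exist in it. For example, “我为何愿意承担经营风险” outputs ["我为", "何愿", "意", "承担", "经营风险"], and the three words “我为”, “何愿”, and “经营风险” do not exist in the corpus.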
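A rough way to reproduce this check is to compare the output tokens against the corpus vocabulary. This is only a sketch: the `vocabulary` set below is a hypothetical stand-in for the real corpus word list, and `out_of_vocabulary` is an illustrative helper, not a SnowNLP function.

```python
def out_of_vocabulary(tokens, vocabulary):
    """Return the tokens that are absent from the vocabulary, in order."""
    return [token for token in tokens if token not in vocabulary]


# Tokens copied from the example in this comment.
tokens = ["我为", "何愿", "意", "承担", "经营风险"]

# Hypothetical vocabulary; replace with the words from the actual corpus.
vocabulary = {"我", "为何", "愿意", "意", "承担", "经营", "风险"}

oov = out_of_vocabulary(tokens, vocabulary)
print(oov)
```

Under this assumed vocabulary the three tokens named above are reported as missing, even though every word of the intended segmentation is present.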
