Incorrect segmentation results after loading a customized incremental segmentation model? #93

logic-tao · 2017-06-29T16:12:34Z

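Could someone tell me what is going wrong here? Details below.
Environment:
Ubuntu 16.04; Python 3.5;
pyltp 0.1.9.1; otcws 3.4.0; base model 3.4.0.

Steps:
(1) Starting from the base model cws.model, with test.txt as the training set and test_copy.txt as the development set, I built the incremental model add.model.
test.txt and test_copy.txt are identical: words are separated by a single space, and the whole long text is on one line.

(2) Using the base model cws.model together with the incremental model add.model, I segmented the original UTF-8 text. It had no effect: only a handful of words were split out.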
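
Since step (1) mentions that the whole corpus sits on a single line, one thing that may be worth ruling out is the layout of the training file. The sketch below assumes the trainer expects one sentence per line with words separated by single spaces; that is an assumption and is not confirmed anywhere in this thread. The helper name check_training_text is hypothetical and is not part of pyltp or otcws.

```python
def check_training_text(text):
    """Summarize a space-delimited word segmentation corpus held in a string.

    Assumption (not confirmed in this thread): the trainer wants one
    sentence per line, with words separated by exactly one space.
    """
    sentences = 0
    tokens = 0
    irregular_spacing = []  # 1-based numbers of lines with odd whitespace
    for number, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # ignore blank lines
        sentences += 1
        words = line.split()  # splits on any whitespace, including U+3000
        tokens += len(words)
        # A clean line is identical to its words re-joined by single spaces.
        if line != " ".join(words):
            irregular_spacing.append(number)
    return {
        "sentences": sentences,
        "tokens": tokens,
        "irregular_spacing": irregular_spacing,
    }


if __name__ == "__main__":
    sample = "我 爱 北京\n天安门  很 大\n"
    print(check_training_text(sample))
```

If "sentences" comes back as 1 for a long document, the entire corpus is being presented as a single training instance, which could be a reason to try splitting it into one sentence per line before retraining.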

markwwen · 2017-07-04T04:13:03Z

+1

liu946 · 2017-07-04T06:36:00Z

pyltp will be updated soon. It does not support 3.4.0 models yet. For now, please use a model version prior to 3.3.2.

Tracy6465 · 2017-08-16T09:34:55Z

After training an incremental model the way you describe, it no longer segments at all; every paragraph comes back whole.

hitweijinlong · 2017-12-12T09:57:21Z

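With the 3.4.0 model and otcws 3.4.0, I ran otcws customized-learn with --baseline-model set to cws.model. When the resulting incremental model is loaded through CustomizedSegmentor and a sentence is passed in, the whole sentence is returned with no segmentation at all. If I instead train my own model file with otcws learn and --dump-details true, and then train an incremental model on top of it with otcws customized-learn, everything works normally. Does this mean --baseline-model cannot be set to cws.model? Is there a fix that would let incremental models built on it work?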

AacidChan · 2018-04-10T04:54:54Z

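I trained an incremental model with otcws customized training. Prediction with otcws customized-test works fine, but after loading it in pyltp with CustomizedSegmentor().load_with_lexicon(base_path, incremental_path, lexicon_path), almost every word in the output is a single character. @liu946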
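
For anyone trying to reproduce the symptoms reported in this thread, a minimal sketch follows. It loads a base model plus an incremental model through pyltp's CustomizedSegmentor and labels the output as one of the two failure modes described above (whole sentence returned, or nearly everything split into single characters). The function names, the path arguments, and the three labels are hypothetical and chosen for illustration; pyltp is imported lazily so the classifier can be used without it installed.

```python
def describe_segmentation(sentence, words):
    """Label a segmentation result with one of three illustrative tags."""
    words = list(words)
    # Failure mode 1: the sentence comes back as a single unsplit token.
    if len(words) == 1 and words[0] == sentence:
        return "unsegmented"
    # Failure mode 2: every token is exactly one character long.
    if len(words) > 1 and all(len(w) == 1 for w in words):
        return "single-characters"
    return "segmented"


def segment_with_customized_model(base_path, incremental_path, sentence,
                                  lexicon_path=None):
    """Segment one sentence with a base model plus an incremental model.

    All paths are placeholders supplied by the caller.
    """
    from pyltp import CustomizedSegmentor  # requires pyltp to be installed

    segmentor = CustomizedSegmentor()
    if lexicon_path is None:
        segmentor.load(base_path, incremental_path)
    else:
        segmentor.load_with_lexicon(base_path, incremental_path, lexicon_path)
    try:
        return list(segmentor.segment(sentence))
    finally:
        segmentor.release()  # free the native model resources


if __name__ == "__main__":
    print(describe_segmentation("我爱北京", ["我爱北京"]))
    print(describe_segmentation("我爱北京", ["我", "爱", "北", "京"]))
    print(describe_segmentation("我爱北京", ["我", "爱", "北京"]))
```

Running the same sentences through otcws customized-test and through this function, then comparing the labels, may help show whether the problem lies in the trained model or in the version of the pyltp binding that loads it.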

liu946 added the duplicate label Jan 13, 2018
