Hi hightman, I get an error when adding custom phrases with PHP #56
Comments
Got it: passing the SCWS_XDICT_TXT parameter to add_dict fixes it.
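For readers hitting the same error, the fix described above might look like the following minimal sketch. It assumes the scws PHP extension is installed. `open_scws_with_user_dict` is a hypothetical helper name, the paths are the ones from the script in the issue body, and loading the main dictionary explicitly with `set_dict` before `add_dict` is an assumption borrowed from the commented-out lines of that script.

```php
<?php
// Hypothetical helper (not part of SCWS): opens a segmenter and loads a
// plain-text user dictionary on top of the main XDB dictionary.
function open_scws_with_user_dict(string $mainDict, string $userDict)
{
    $so = scws_new();
    $so->set_charset('utf8');
    // Assumption: load the main dictionary explicitly before adding more.
    $so->set_dict($mainDict);
    // Without a mode, add_dict expects the XDB format; a .txt dictionary
    // needs SCWS_XDICT_TXT as the second argument.
    $so->add_dict($userDict, SCWS_XDICT_TXT);
    return $so;
}

// Only run the demo when the extension is actually available.
if (extension_loaded('scws')) {
    $so = open_scws_with_user_dict(
        ini_get('scws.default.fpath') . '/dict.utf8.xdb',
        '/usr/local/scws/etc/dict.user.txt'
    );
    $so->send_text('老司机遇上新能源');
    while ($words = $so->get_result()) {
        foreach ($words as $w) {
            echo $w['word'], "\n";
        }
    }
    $so->close();
}
```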
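One more question: how can I remove modal particles and other words that are never useful? In the segmentation results for this article, how do I exclude words such as 如果 (if), 自己 (oneself), 不仅 (not only), 也许 (perhaps), and 他们 (they)?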
Filter them out yourself based on part of speech (the attr field).
Best Regards
hightman/海鳗
**********************************
WeChat/Weibo: hightman
Github: https://github.com/hightman
On October 18, 2018, at 10:58 AM, nottellyou wrote:
One more question: how can I remove modal particles and other words that are never useful?
Array
(
[word] => 收入
[times] => 4
[weight] => 19.559999465942
[attr] => n
)
Array
(
[word] => 可以
[times] => 4
[weight] => 18.680000305176
[attr] => v
)
Array
(
[word] => 返利
[times] => 2
[weight] => 16.979999542236
[attr] => v
)
Array
(
[word] => 不仅
[times] => 3
[weight] => 14.849999427795
[attr] => c
)
Array
(
[word] => 也许
[times] => 3
[weight] => 14.819999694824
[attr] => d
)
Array
(
[word] => 他们
[times] => 3
[weight] => 14.760000228882
[attr] => r
)
Array
(
[word] => 拥有
[times] => 3
[weight] => 14.700000762939
[attr] => v
)
Array
(
[word] => 优惠
[times] => 3
[weight] => 14.549999237061
[attr] => vn
)
Array
(
[word] => 如果
[times] => 3
[weight] => 14.460000991821
[attr] => c
)
Array
(
[word] => 财富
[times] => 3
[weight] => 14.400000572205
[attr] => n
)
Array
(
[word] => 消费
[times] => 3
[weight] => 14.130000114441
[attr] => vn
)
Array
(
[word] => 自己
[times] => 3
[weight] => 13.650000572205
[attr] => r
)
In the segmentation results for this article, how do I exclude words such as 如果, 自己, 不仅, 也许, and 他们?
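The reply's suggestion to filter by part of speech can be done in plain PHP after the results come back. The sketch below is only an illustration: `exclude_by_attr` is a hypothetical helper, the sample rows are copied from the output quoted above, and the choice of c, d, and r as the tags to drop is simply read off the attr values of the unwanted words in that output.

```php
<?php
// Hypothetical helper: drops every result row whose 'attr' value is in
// the excluded list. Works on arrays shaped like the get_tops() output.
function exclude_by_attr(array $tops, array $excludedAttrs): array
{
    return array_values(array_filter(
        $tops,
        function (array $item) use ($excludedAttrs): bool {
            return !in_array($item['attr'], $excludedAttrs, true);
        }
    ));
}

// Sample rows taken from the output quoted in this thread.
$tops = [
    ['word' => '收入', 'times' => 4, 'weight' => 19.559999465942, 'attr' => 'n'],
    ['word' => '不仅', 'times' => 3, 'weight' => 14.849999427795, 'attr' => 'c'],
    ['word' => '也许', 'times' => 3, 'weight' => 14.819999694824, 'attr' => 'd'],
    ['word' => '他们', 'times' => 3, 'weight' => 14.760000228882, 'attr' => 'r'],
    ['word' => '财富', 'times' => 3, 'weight' => 14.400000572205, 'attr' => 'n'],
];

// Drop conjunctions (c), adverbs (d) and pronouns (r).
$kept = exclude_by_attr($tops, ['c', 'd', 'r']);
echo implode(', ', array_column($kept, 'word')), "\n";
```

Filtering after the fact means the requested limit no longer guarantees that many useful rows, so it may be worth asking get_tops for more rows than are needed.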
I added a part-of-speech filter: $tmp = $so->get_tops(100, '~v,~d,~y,~e,~r,~a'); but it had no effect.
Use ~v,d,y,e,r,a instead of putting ~ in front of each one.
On October 18, 2018, at 12:09 PM, nottellyou wrote:
I added a part-of-speech filter: $tmp = $so->get_tops(100, '~v,~d,~y,~e,~r,~a'); but it had no effect.
Array
(
[word] => 不仅
[times] => 3
[weight] => 14.849999427795
[attr] => c
)
Array
(
[word] => 也许
[times] => 3
[weight] => 14.819999694824
[attr] => d
)
Array
(
[word] => 他们
[times] => 3
[weight] => 14.760000228882
[attr] => r
)
Array
(
[word] => 如果
[times] => 3
[weight] => 14.460000991821
[attr] => c
)
Array
(
[word] => 财富
[times] => 3
[weight] => 14.400000572205
[attr] => n
)
Array
(
[word] => 消费
[times] => 3
[weight] => 14.130000114441
[attr] => vn
)
Array
(
[word] => 自己
[times] => 3
[weight] => 13.650000572205
[attr] => r
)
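As a rough sketch of the corrected filter string, the snippet below builds the exclusion argument with a single leading ~, as the reply describes. `build_exclude_xattr` is a hypothetical helper, not part of SCWS, and the segmentation part only runs when the scws extension is loaded. Note that 不仅 and 如果 are tagged c in the output above, so c would presumably also have to be added to the list to drop them.

```php
<?php
// Hypothetical helper: joins attr codes into one exclusion string with a
// single leading "~", e.g. ['v', 'd'] becomes "~v,d".
function build_exclude_xattr(array $attrs): string
{
    if (count($attrs) === 0) {
        throw new InvalidArgumentException('At least one attr code is required.');
    }
    return '~' . implode(',', $attrs);
}

// The list suggested in the reply.
$xattr = build_exclude_xattr(['v', 'd', 'y', 'e', 'r', 'a']);
echo $xattr, "\n";

// Only run the segmentation demo when the extension is available.
if (extension_loaded('scws')) {
    $so = scws_new();
    $so->set_charset('utf8');
    $so->set_ignore(true);
    $so->send_text('我是一个中国人,我会C++语言,我也有很多T恤衣服');
    $tops = $so->get_tops(100, $xattr);
    if (is_array($tops)) {
        foreach ($tops as $item) {
            print_r($item);
        }
    }
    $so->close();
}
```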
692 $so = scws_new();
693 $so->set_charset('utf8');
694 // set_dict and set_rule are not called here; the extension automatically tries the dictionary and rule files under the path configured in php.ini
695 //$dictPath = ini_get('scws.default.fpath').'/dict.utf8.xdb';
696 //$so->set_dict($dictPath); // set the dictionary
697
698 //$so->set_dict('/usr/local/scws/etc/dict.utf8.xdb');
699 $so->add_dict('/usr/local/scws/etc/dict.user.txt');
700 //$so->set_rule('/usr/local/scws/etc/rules.utf8.ini');
701
702 $so->set_duality(true); // whether to automatically group stray characters into two-character words
703 $so->set_ignore(true); // whether to drop certain special punctuation marks from the returned results
704 $so->set_multi(1); // bitwise OR of 1 | 2 | 4 | 8, meaning: short words | duality | main single characters | all single characters
705
706 $so->send_text("我是一个中国人,我会C++语言,我也有很多T恤衣服,我的衣服比我还重老司机遇上新能源遇上新能源这个分词怎么分");
707 echo '<pre>';
708 //$tmp = $so->get_result();
709 //$tmp = $so->get_tops(6, '~V');
710 $tmp = $so->get_tops(7);
711 foreach($tmp as $v)
712 {
713     print_r($v);
714 }
715 $so->close();
It always reports an error at line 699, $so->add_dict('/usr/local/scws/etc/dict.user.txt');. I want to add some custom phrases, such as 老司机.
What is going wrong here?
Thanks