Hello hightman, I get an error when adding custom words from PHP. What am I doing wrong? #56

Open
nottellyou opened this issue Oct 18, 2018 · 6 comments

Comments

@nottellyou

nottellyou commented Oct 18, 2018

692 $so = scws_new();
693 $so->set_charset('utf8');
694 // set_dict and set_rule are not called here; the system will automatically try the dictionary and rule files under the path specified in the ini
695 //$dictPath = ini_get('scws.default.fpath').'/dict.utf8.xdb';
696 //$so->set_dict($dictPath); // set the dictionary
697
698 //$so->set_dict('/usr/local/scws/etc/dict.utf8.xdb');
699 $so->add_dict('/usr/local/scws/etc/dict.user.txt');
700 //$so->set_rule('/usr/local/scws/etc/rules.utf8.ini');
701
702 $so->set_duality(true); // whether to merge stray single characters automatically using two-character segmentation
703 $so->set_ignore(true); // whether to drop certain punctuation and similar symbols from the returned results
704 $so->set_multi(1); // bitwise OR of 1 | 2 | 4 | 8, meaning: short words | bigrams | main single characters | all single characters
705
706 $so->send_text("我是一个中国人,我会C++语言,我也有很多T恤衣服,我的衣服比我还重老司机遇上新能源遇上新能源这个分词怎么分");
707 echo '<pre>';
708 //$tmp = $so->get_result();
709 //$tmp = $so->get_tops(6, '~V');
710 $tmp = $so->get_tops(7);
711 foreach($tmp as $v)
712 {
713     print_r($v);
714 }
715 $so->close();

It always fails on line 699, $so->add_dict('/usr/local/scws/etc/dict.user.txt');. I want to add some custom words, for example 老司机.

Where is the problem?

Thanks

@nottellyou
Author

Got it: adding the SCWS_XDICT_TXT parameter fixes it.
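For anyone landing here with the same error, the fix might look roughly like the sketch below. It assumes the documented `add_dict(path, mode)` signature of the SCWS PHP extension, where the mode defaults to the XDB format, so a plain-text dictionary has to be flagged explicitly. The `add_text_dict` helper and its readability check are hypothetical additions for illustration, not part of SCWS.

```php
<?php
// Hypothetical helper (not part of SCWS): attach a plain-text user dictionary.
function add_text_dict($so, string $path): bool
{
    if (!is_readable($path)) {
        return false; // file missing or unreadable, nothing to load
    }
    // SCWS_XDICT_TXT marks the file as plain text; without it add_dict()
    // treats the file as an XDB dictionary.
    return (bool) $so->add_dict($path, SCWS_XDICT_TXT);
}

// Usage, guarded so the script still runs where the extension is absent.
if (extension_loaded('scws')) {
    $so = scws_new();
    $so->set_charset('utf8');
    add_text_dict($so, '/usr/local/scws/etc/dict.user.txt'); // path from the report above
    $so->send_text('老司机遇上新能源');
    print_r($so->get_result());
    $so->close();
}
```

This is the same call as line 699 with the mode argument added; everything else in the original script can stay as it is.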

@nottellyou
Author

One more question: how can I remove modal particles and certain other words that I will never need?

Array
(
[word] => 收入
[times] => 4
[weight] => 19.559999465942
[attr] => n
)
Array
(
[word] => 可以
[times] => 4
[weight] => 18.680000305176
[attr] => v
)
Array
(
[word] => 返利
[times] => 2
[weight] => 16.979999542236
[attr] => v
)
Array
(
[word] => 不仅
[times] => 3
[weight] => 14.849999427795
[attr] => c
)
Array
(
[word] => 也许
[times] => 3
[weight] => 14.819999694824
[attr] => d
)
Array
(
[word] => 他们
[times] => 3
[weight] => 14.760000228882
[attr] => r
)
Array
(
[word] => 拥有
[times] => 3
[weight] => 14.700000762939
[attr] => v
)
Array
(
[word] => 优惠
[times] => 3
[weight] => 14.549999237061
[attr] => vn
)
Array
(
[word] => 如果
[times] => 3
[weight] => 14.460000991821
[attr] => c
)
Array
(
[word] => 财富
[times] => 3
[weight] => 14.400000572205
[attr] => n
)
Array
(
[word] => 消费
[times] => 3
[weight] => 14.130000114441
[attr] => vn
)
Array
(
[word] => 自己
[times] => 3
[weight] => 13.650000572205
[attr] => r
)

For example, how do I exclude 如果, 自己, 不仅, 也许, 他们 and the like from the segmentation result of this article?
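One option that does not depend on any SCWS setting is to post-filter the array returned by `get_tops()` in plain PHP. The sketch below assumes each entry has the `word` and `attr` keys shown in the output above; `filter_tops` and both blocklists are hypothetical names introduced for illustration, not part of SCWS.

```php
<?php
// Hypothetical helper, not part of SCWS: drop entries whose part-of-speech
// tag or word text is on a caller-supplied blocklist.
function filter_tops(array $tops, array $dropAttrs, array $stopWords = []): array
{
    $kept = [];
    foreach ($tops as $item) {
        if (in_array($item['attr'], $dropAttrs, true)) {
            continue; // unwanted part of speech, e.g. conjunction or pronoun
        }
        if (in_array($item['word'], $stopWords, true)) {
            continue; // explicitly blocked word
        }
        $kept[] = $item;
    }
    return $kept;
}

// Sample rows copied from the output above (weights rounded).
$tops = [
    ['word' => '收入', 'times' => 4, 'weight' => 19.56, 'attr' => 'n'],
    ['word' => '不仅', 'times' => 3, 'weight' => 14.85, 'attr' => 'c'],
    ['word' => '也许', 'times' => 3, 'weight' => 14.82, 'attr' => 'd'],
    ['word' => '他们', 'times' => 3, 'weight' => 14.76, 'attr' => 'r'],
    ['word' => '财富', 'times' => 3, 'weight' => 14.40, 'attr' => 'n'],
];

// Drop conjunctions (c), adverbs (d) and pronouns (r).
$filtered = filter_tops($tops, ['c', 'd', 'r']);
print_r(array_column($filtered, 'word'));
```

With the sample rows, only 收入 and 财富 remain. The word blocklist is there for cases where a tag is too coarse, for instance keeping most verbs but dropping 可以.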

@hightman
Owner

hightman commented Oct 18, 2018 via email

@nottellyou
Author

nottellyou commented Oct 18, 2018

I added a part-of-speech filter: $tmp = $so->get_tops(100, '~v,~d,~y,~e,~r,~a'); but it did not work.
Array
(
[word] => 不仅
[times] => 3
[weight] => 14.849999427795
[attr] => c
)
Array
(
[word] => 也许
[times] => 3
[weight] => 14.819999694824
[attr] => d
)
Array
(
[word] => 他们
[times] => 3
[weight] => 14.760000228882
[attr] => r
)
Array
(
[word] => 如果
[times] => 3
[weight] => 14.460000991821
[attr] => c
)
Array
(
[word] => 财富
[times] => 3
[weight] => 14.400000572205
[attr] => n
)
Array
(
[word] => 消费
[times] => 3
[weight] => 14.130000114441
[attr] => vn
)
Array
(
[word] => 自己
[times] => 3
[weight] => 13.650000572205
[attr] => r
)
Could someone point out which setting is wrong?
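A possible explanation, offered as an assumption because the maintainer's replies are not preserved in this thread: as I read the SCWS PHP documentation, the `xattr` argument of `get_tops()` is a comma-separated list of tags, and a single `~` at the very start negates the whole list. Under that reading, `'~v,~d,...'` would exclude only `v`, and the list above also has no `c`, the tag carried by 不仅 and 如果. The `exclude_attrs` helper below is hypothetical and only builds the string.

```php
<?php
// Hypothetical helper, not part of SCWS: build an exclusion string for get_tops().
function exclude_attrs(array $attrs): string
{
    // One leading "~" negates the whole comma-separated list.
    return '~' . implode(',', $attrs);
}

$xattr = exclude_attrs(['v', 'd', 'y', 'e', 'r', 'a', 'c']);
echo $xattr, "\n"; // ~v,d,y,e,r,a,c

// Usage, guarded so the script still runs where the extension is absent.
if (extension_loaded('scws')) {
    $so = scws_new();
    $so->set_charset('utf8');
    $so->send_text("我是一个中国人,我会C++语言");
    print_r($so->get_tops(100, $xattr));
    $so->close();
}
```

If the filter still lets unwanted words through, checking the `attr` value printed for those words shows which tag needs to be added to the list.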

@hightman
Owner

hightman commented Oct 18, 2018 via email

@nottellyou
Author

Are the SCWS part-of-speech tags the same as the ones listed at https://blog.csdn.net/leiting_imecas/article/details/68484811?utm_source=blogxgwz1 ?
