Question: how is the IDF part generated? #2

Open
babyandy0111 opened this issue Jun 11, 2018 · 3 comments

Comments

@babyandy0111

Hello, I don't have much experience in this area. How is the IDF file generated?

@gaussic
Owner

gaussic commented Jun 12, 2018

The IDF file is generated by the gen_idf.py script.

For the specific algorithm, see the tf-idf article on Wikipedia.
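
For readers unfamiliar with the formula, here is a minimal sketch of the textbook definition idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. This is not necessarily what gen_idf.py does: the helper name `compute_idf` and the toy corpus are hypothetical, and the documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def compute_idf(docs):
    """Return {term: idf} for a list of tokenized documents (hypothetical helper)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    # Every term seen has df >= 1, so the division is safe.
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Toy corpus, assumed to be pre-segmented (illustrative data only).
docs = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
idf = compute_idf(docs)

for term in sorted(idf):
    print(term, idf[term])
```

A term that appears in every document gets an IDF of 0, while rarer terms score higher. Real implementations often add smoothing to this formula.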

@babyandy0111
Author

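Hi @gaussic
I used the gen_idf.py script to generate the IDF file, but its format differs from the IDF file originally provided.
It shows content that looks like the following encoding: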
0120 312e 300a 0020 312e 300a 0320 312e
300a 0220 312e 300a 0420 312e 300a 0820
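
As a quick check, the dump above can be decoded back into bytes. The sketch below assumes the hex digits list the bytes in file order (some hex dump tools swap each byte pair, in which case the result would differ). Under that assumption the content is ASCII text: each line is a control character, a space, and `1.0`, which suggests a text file whose terms happen to be control characters rather than a different file format.

```python
# Hex dump copied from the comment above; assumed to be in file byte order.
hex_dump = """
0120 312e 300a 0020 312e 300a 0320 312e
300a 0220 312e 300a 0420 312e 300a 0820
"""

# Strip all whitespace, then convert the hex digits to raw bytes.
raw = bytes.fromhex("".join(hex_dump.split()))

# Split on newlines to recover the individual lines of the file.
lines = raw.split(b"\n")
for line in lines:
    print(repr(line))
```

The last entry is cut off because the dump itself is truncated.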

In segment.py I added:
jieba.set_dictionary('./data/dict.txt.big') # downloaded from jieba
jieba.load_userdict('./data/keyword.txt') # compiled ad hoc
jieba.analyse.set_stop_words('./data/stop_words.txt') # downloaded from jieba

Is this normal?

@gaussic
Copy link
Owner

gaussic commented Jun 13, 2018

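Hello, regarding your question, please describe your runtime environment:

  • Operating system
  • Python version
  • File encoding
  • Any other descriptive information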
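
Most of the items above can be collected with the Python standard library. The following is a minimal sketch; the function name `describe_environment` and the dictionary keys are illustrative, and the reported encoding is the interpreter's default for text files, not necessarily the encoding of any particular data file.

```python
import locale
import platform
import sys


def describe_environment():
    """Collect basic environment details for a bug report (hypothetical helper)."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "default_str_encoding": sys.getdefaultencoding(),
        # Encoding that open() uses when none is passed explicitly.
        "preferred_file_encoding": locale.getpreferredencoding(False),
    }


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

If the preferred file encoding is not UTF-8, passing `encoding="utf-8"` explicitly to `open()` when reading or writing the corpus and IDF files is one way to rule out encoding mismatches.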
