Problems you may run into when downloading with fetch_20newsgroups #40

Open
emperor239 opened this issue Dec 3, 2024 · 0 comments

Comments

@emperor239

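fetch_20newsgroups downloads the dataset extremely slowly, so here is a workaround.
1. Go to http://qwone.com/~jason/20Newsgroups/, find the Data section, and download 20news-bydate.tar.gz.
2. Once the download finishes, put the file in the directory C:\Users\<string of digits>\scikit_learn_data\20news_home
3. In the directory C:\Users\<string of digits>\AppData\Local\Programs\Python\Python37\Lib\site-packages\sklearn\datasets, find _twenty_newsgroups.py (or twenty_newsgroups.py) and open it.
4. Comment out the following two lines, which are the code that downloads the data:

logger.info("Downloading dataset from %s (14 MB)", ARCHIVE.url)

archive_path = _fetch_remote(ARCHIVE, dirname=target_dir)

Then add
archive_path = os.path.join(target_dir, r'20news-bydate.tar.gz')
in their place and save the file.
5. Run your program and wait. The 20news-bydate.tar.gz file is extracted automatically and then deleted, and a 20news-bydate_py3.pkz file is generated at the end.
After that, the data is available.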
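
Step 2 can also be scripted. The following is only a minimal sketch: the helper names `default_data_home` and `stage_archive` are made up for illustration and are not part of scikit-learn. It assumes the data directory is the `scikit_learn_data` folder under the user's home directory (or whatever the `SCIKIT_LEARN_DATA` environment variable points to), which matches the path used in step 2.

```python
import os
import shutil
from pathlib import Path

ARCHIVE_NAME = "20news-bydate.tar.gz"  # file name from step 1


def default_data_home() -> Path:
    """Assumed data directory: SCIKIT_LEARN_DATA if set, else ~/scikit_learn_data."""
    return Path(os.environ.get("SCIKIT_LEARN_DATA", Path.home() / "scikit_learn_data"))


def stage_archive(downloaded: Path, data_home: Path) -> Path:
    """Copy the manually downloaded archive into <data_home>/20news_home (step 2)."""
    target_dir = Path(data_home) / "20news_home"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    destination = target_dir / ARCHIVE_NAME
    shutil.copyfile(downloaded, destination)  # the original download is left in place
    return destination
```

For example, `stage_archive(Path("Downloads/20news-bydate.tar.gz"), default_data_home())` would copy the file into place, where `Downloads/20news-bydate.tar.gz` is a placeholder for wherever the browser saved it. Steps 3 to 5 still apply as written.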
