People's Daily (《人民日报》) crawler #2153
Replies: 5 comments 15 replies
-
@banned-historical-archives Parsing the issues from 1946 through October 1976 produced 1.44 GB of data, more than 500,000 articles. How should this be handled?
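For a rough sense of scale, the figures above imply an average article size. The sketch below is only back-of-the-envelope arithmetic: it assumes "1.44 GB" means decimal gigabytes and uses 500,000 as a lower bound on the article count, so the result is an upper bound on the average. The function name is hypothetical.

```python
def average_article_bytes(total_bytes: int, article_count: int) -> float:
    """Return the mean size of one article in bytes."""
    if article_count <= 0:
        raise ValueError("article_count must be positive")
    return total_bytes / article_count


# Assumption: 1.44 GB is decimal (1 GB = 10**9 bytes).
TOTAL_BYTES = 1_440_000_000
# The thread says "more than 500,000", so this is a lower bound on the count.
MIN_ARTICLES = 500_000

upper_bound = average_article_bytes(TOTAL_BYTES, MIN_ARTICLES)
print(f"at most about {upper_bound:.0f} bytes per article on average")
```

If the text is stored as UTF-8, where most Chinese characters take 3 bytes, that is on the order of a thousand characters per article at most.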
1 reply
-
With this much data, the build would certainly take more than 6 hours, so People's Daily can only be parsed in a local deployment.
5 replies
-
@banned-historical-archives I found some speeches by the four (四人) that appear in People's Daily but are not yet included in the archive.
1 reply
-
@CiangCing14
Requesting help building a crawler to parse People's Daily (《人民日报》) data.
There are two data sources:
https://www.laoziliao.net/rmrb/
https://cn.govopendata.com/renminribao/
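Given the data quality of these sources, each parsed article only needs three fields: title, date, and content. An author field is not needed (it is hard to parse).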
@banned-historical-archives
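The three-field record described above could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the names `Article`, `make_article`, and `to_json_line` are hypothetical, and storing `date` as an ISO 8601 string and emitting one JSON object per line are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """One parsed article; `author` is deliberately left out."""
    title: str
    date: str  # assumed format: ISO 8601, e.g. "1946-05-15"
    content: str


def make_article(title: str, day: date, content: str) -> Article:
    """Normalize whitespace and reject records missing a title or body."""
    title = title.strip()
    content = content.strip()
    if not title or not content:
        raise ValueError("title and content must be non-empty")
    return Article(title=title, date=day.isoformat(), content=content)


def to_json_line(article: Article) -> str:
    """Serialize one article as a single JSON line, keeping Chinese text readable."""
    return json.dumps(asdict(article), ensure_ascii=False)
```

For example, `to_json_line(make_article("标题", date(1946, 5, 15), "正文"))` yields one line containing exactly the keys `title`, `date`, and `content`.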