Part of the PersonalDialog dataset is now available through the Hugging Face datasets library: https://huggingface.co/datasets/silver/personal_dialog

    from datasets import load_dataset
    dataset = load_dataset("silver/personal_dialog")

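Once loaded, it can help to check which splits and columns are actually present before relying on any field names. The following is a minimal sketch: `summarize_splits` and `main` are hypothetical helper names, not part of this project, and the sketch deliberately assumes nothing about the dataset's schema. Depending on the installed `datasets` version, the `load_dataset` call may need additional arguments, and it requires network access on first use.

```python
def summarize_splits(dataset_dict):
    """Return {split_name: (num_rows, column_names)} for a DatasetDict-like mapping."""
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset_dict.items()
    }


def main():
    # Imported here so the helper above can be used without the library installed.
    from datasets import load_dataset

    # Downloads the data on first use; requires network access.
    dataset = load_dataset("silver/personal_dialog")
    for name, (rows, columns) in summarize_splits(dataset).items():
        print(f"{name}: {rows} rows, columns={columns}")
```

Calling `main()` prints one line per split, which is a quick way to confirm the structure of the released data.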
This project contains the code used to construct the PersonalDialog dataset introduced in the paper Personalized Dialogue Generation with Diversified Traits.
The codebase was forked from another repo on 2018-01-08 and modified from there. The last modification to this repo was made on 2018-04-21.
Commits made to the original repo after January 2018 were NOT merged into this repo. However, you can still refer to the original project and its wiki to set up your spider.
- Add code to crawl dialogs on Weibo.
- Add code for using the proxy pool.
- Fix some problems in the crawling process, such as the handling of emoticons and emoji.
- For other information about the PersonalDialog dataset, please contact zhengyinhe1 at 163 dot com.
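
The proxy and emoji items in the list above can be illustrated with a small standalone sketch. This is not the code from this repository: `ProxyRotator` and `clean_text` are hypothetical names, the proxy addresses are placeholders, and the sketch assumes that Weibo emoticons show up in plain text as short bracketed names such as `[微笑]`. The emoji ranges are a partial selection, not a complete list.

```python
import itertools
import re

# Assumption: Weibo emoticons appear in plain text as short bracketed names, e.g. "[微笑]".
BRACKET_EMOTICON = re.compile(r"\[[^\[\]\s]{1,8}\]")

# A partial set of Unicode ranges that contain emoji; not exhaustive.
EMOJI = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)


def clean_text(text):
    """Remove bracketed emoticons and emoji, then collapse whitespace."""
    text = BRACKET_EMOTICON.sub("", text)
    text = EMOJI.sub("", text)
    return " ".join(text.split())


class ProxyRotator:
    """Cycle through proxies round-robin, skipping ones marked as failed."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._failed = set()
        self._cycle = itertools.cycle(self._proxies)

    def mark_failed(self, proxy):
        """Exclude a proxy from future rotation."""
        self._failed.add(proxy)

    def next_proxy(self):
        """Return the next usable proxy, or raise if none remain."""
        # One full pass over the pool is enough to find any usable proxy.
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._failed:
                return proxy
        raise RuntimeError("no working proxies left")
```

A crawler would typically call `next_proxy()` before each request and `mark_failed()` when a request through that proxy errors out. Note that the bracket pattern also removes legitimate short bracketed text, so a stricter approach would match against a known list of emoticon names.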