Preprocessing of the TUAB and TUEV datasets #35

Open
bukun46 opened this issue Aug 3, 2024 · 3 comments

Comments

@bukun46

bukun46 commented Aug 3, 2024

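Hello, and thank you for your excellent work! I have recently been reproducing your experiments and ran into a problem when converting the raw TUAB and TUEV data into an h5 dataset. The code you provide reads .cnt files, whereas make_TUAB.py and make_TUEV.py save their processed output in .pkl format. How should these files be converted into an h5 dataset? My understanding is that the pretraining model takes all of its input in h5 dataset format. Thank you in advance for your patient reply!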

@935963004
Owner

make_TUAB.py and make_TUEV.py are for processing the downstream-task datasets, not the pretraining data. make_h5dataset_for_pretrain.py is the script that processes the pretraining data.
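
Since the downstream .pkl files are consumed as they are, no h5 conversion is needed for them. As a rough illustration, here is a minimal sketch of reading one such sample back with the standard library. The key names `"X"` and `"y"`, the helper names `load_sample` and `list_samples`, and the file name `sample_0.pkl` are assumptions made for this example and are not confirmed in this thread; check what make_TUAB.py and make_TUEV.py actually write before relying on it.

```python
import pickle
import tempfile
from pathlib import Path


def load_sample(path):
    """Load one pickled sample and return (signal, label)."""
    with open(path, "rb") as f:
        sample = pickle.load(f)
    # "X" and "y" are assumed key names, not confirmed in this thread.
    return sample["X"], sample["y"]


def list_samples(root):
    """Return all .pkl files under root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.pkl"))


# Self-contained demo: write a fake 2-channel, 4-sample recording,
# then read it back the same way a downstream dataset class might.
demo_dir = Path(tempfile.mkdtemp())
fake = {"X": [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]], "y": 1}
with open(demo_dir / "sample_0.pkl", "wb") as f:
    pickle.dump(fake, f)

files = list_samples(demo_dir)
signal, label = load_sample(files[0])
```

In real use the signal would normally be a NumPy array rather than nested lists; plain lists are used here only to keep the sketch dependency-free.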

@bukun46
Author

bukun46 commented Aug 12, 2024

Thank you for your patient answer. Are there any plans to release the pretraining dataset?

@935963004
Owner

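Most of the pretraining data we use comes from public datasets, which you can download from each dataset's official website. We have no right to redistribute other people's datasets on our own, and doing so would carry a risk of legal action. We hope you understand.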
