
How does stage 2 train on audio and text input at the same time? #123

Open

vra opened this issue Nov 25, 2024 · 1 comment

vra commented Nov 25, 2024

Hi @mini-omni, the paper says that stage 2 trains on audio/text input ("Stage 2 uses TextQA and AudioQA for audio/text input and text response training").

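I'd like to ask:

Do the audio and text inputs come from the same dataset, or does the audio input come from dataset A and the text input from dataset B?
How are the two input types organized during training? Does each batch contain half audio input and half text input?
Are the two kinds of data implemented in a single DataModule, or in separate ones?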

Thanks in advance!


Contributor

mini-omni commented Dec 4, 2024

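The wording in the paper is somewhat inaccurate. In our experiments, stage 2 was mainly trained on ASR data and text QA data; see Table 1 for details.
The sample types within a batch are random, and the overall ratio is roughly 1:1.
We sample randomly from both within the same dataloader.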
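The answer above (one dataloader, random sample types per batch, roughly 1:1 overall) can be illustrated with a minimal pure-Python sketch. This is not mini-omni's actual data pipeline: the function `mixed_batches`, the tags `"asr"` and `"text_qa"`, and the choice to draw each sample's type independently with probability 0.5 are all assumptions made for illustration. Concatenating two equal-sized datasets and shuffling would be another way to get the same rough 1:1 mix.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical sample tags (illustrative only, not from mini-omni):
# "asr"     = audio input (speech -> transcript)
# "text_qa" = text input (question -> text answer)
Sample = Tuple[str, object]


def mixed_batches(
    asr_data: Sequence[object],
    text_qa_data: Sequence[object],
    batch_size: int,
    num_batches: int,
    asr_prob: float = 0.5,  # assumed value; 0.5 approximates the 1:1 overall ratio
    seed: int = 0,
) -> Iterator[List[Sample]]:
    """Yield batches whose per-sample task type is drawn at random.

    Each slot in a batch independently comes from the ASR pool with
    probability `asr_prob`, otherwise from the text QA pool. Individual
    batches are therefore usually mixed (not exactly half/half), while the
    overall proportion converges to about asr_prob : (1 - asr_prob).
    """
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    for _ in range(num_batches):
        batch: List[Sample] = []
        for _ in range(batch_size):
            if rng.random() < asr_prob:
                batch.append(("asr", rng.choice(asr_data)))
            else:
                batch.append(("text_qa", rng.choice(text_qa_data)))
        yield batch


# Usage with toy placeholder data.
asr = [f"utt{i}" for i in range(100)]
qa = [f"qa{i}" for i in range(100)]
batches = list(mixed_batches(asr, qa, batch_size=8, num_batches=500))
n_asr = sum(tag == "asr" for b in batches for tag, _ in b)
frac = n_asr / (8 * 500)
print(f"fraction of ASR samples: {frac:.3f}")  # roughly 0.5
```

Drawing the type per sample, rather than per batch, matches the description that the types within a batch are random, and it avoids maintaining two separate loaders.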
