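Does this model use Whisper to convert speech into audio tokens, chat with the LLM using those tokens to get the answer as text tokens, and finally convert the answer's text tokens into audio with SNAC? #116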

Open
ssdutliuhaibo opened this issue Oct 30, 2024 · 3 comments

Comments

@ssdutliuhaibo

No description provided.

@mini-omni
Contributor

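During response generation, text and audio are produced in sync: the model generates text tokens and audio tokens (SNAC tokens) at the same time, and the audio tokens are then passed through the SNAC decoder to produce the audio waveform (WAV).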
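The flow described above can be sketched roughly as follows. This is a toy illustration only, not the project's actual code: `toy_step` stands in for one forward pass of the model, `toy_audio_decoder` stands in for the SNAC decoder, and the stream count and vocabulary sizes are assumptions chosen for the example. It shows only the shape of the loop: each step yields one text token plus one token per audio stream, and the collected audio tokens are decoded into a waveform afterwards.

```python
import random
from typing import List, Tuple

# All constants below are illustrative assumptions, not values from the project.
NUM_AUDIO_STREAMS = 7   # assumed number of parallel audio-token streams
TEXT_VOCAB = 100        # assumed toy text vocabulary size
AUDIO_VOCAB = 4096      # assumed toy audio codebook size
TEXT_EOS = 0            # assumed end-of-text token id


def toy_step(rng: random.Random, step: int, max_text_len: int) -> Tuple[int, List[int]]:
    """Hypothetical stand-in for one model forward pass.

    Returns one text token and one token per audio stream for the same step.
    """
    text_token = TEXT_EOS if step >= max_text_len else rng.randrange(1, TEXT_VOCAB)
    audio_tokens = [rng.randrange(AUDIO_VOCAB) for _ in range(NUM_AUDIO_STREAMS)]
    return text_token, audio_tokens


def generate(num_steps: int, max_text_len: int, seed: int = 0) -> Tuple[List[int], List[List[int]]]:
    """Collect text tokens and audio-token frames produced at the same steps."""
    rng = random.Random(seed)
    text_tokens: List[int] = []
    audio_frames: List[List[int]] = []
    for step in range(num_steps):
        text_token, audio_tokens = toy_step(rng, step, max_text_len)
        if text_token != TEXT_EOS:
            text_tokens.append(text_token)   # text may finish before the audio does
        audio_frames.append(audio_tokens)    # audio tokens are emitted at every step
    return text_tokens, audio_frames


def toy_audio_decoder(audio_frames: List[List[int]], samples_per_frame: int = 4) -> List[float]:
    """Placeholder for the SNAC decoder: maps each token frame to fake samples in [-1, 1)."""
    wav: List[float] = []
    for frame in audio_frames:
        level = sum(frame) / (len(frame) * AUDIO_VOCAB)  # value in [0, 1)
        wav.extend([2.0 * level - 1.0] * samples_per_frame)
    return wav


if __name__ == "__main__":
    text, frames = generate(num_steps=10, max_text_len=6)
    wav = toy_audio_decoder(frames)
    print(len(text), len(frames), len(wav))
```

In a real pipeline the placeholder decoder would be replaced by the actual SNAC decoder, and the text tokens would be detokenized in parallel for display.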

@Jasper-sudo-Sun

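> During response generation, text and audio are produced in sync: the model generates text tokens and audio tokens (SNAC tokens) at the same time, and the audio tokens are then passed through the SNAC decoder to produce the audio waveform (WAV).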

Is training done the same way?

@mini-omni
Contributor

@Jasper-sudo-Sun Yes. During training the model also generates text tokens and audio tokens at the same time.

3 participants