[QUESTION] reaching out for a contact, and to share some details about bark re: the tokens it uses #223

Open
Alignment-Lab-AI opened this issue Apr 1, 2024 · 1 comment
Labels
user question A question is asked by a user.

@Alignment-Lab-AI

IIRC, Bark is an encoder-decoder model based on T5, and it uses wav2vec-BERT as the encoder.

I don't recall where I learned that, but I feel like I validated it at some point. wav2vec-BERT uses tokens that represent phonemes; those are converted to EnCodec codebooks, which produce the final audio.

I'd love to discuss some of our current work with TTS, STT, and STS if you have the bandwidth! My email is [email protected]

More context about our org is at https://AlignmentLab.ai

thanks!
Austin

@Alignment-Lab-AI Alignment-Lab-AI added the user question A question is asked by a user. label Apr 1, 2024
@gitmylo
Owner

gitmylo commented Apr 1, 2024

When I was figuring out how Bark works, I wrote down my observations here

I came up with a few theoretical methods for voice cloning that would work if my observations were correct. They were, and I published the code.
The source code for "Method 2" can be found here
The source code for "Method 3" can probably be found in the commit history of audio-webui, but you might need to go quite far back. This method's outputs are far less convincing and much lower in quality than Method 2's, so it might not be as interesting, but it could still help explain exactly how Bark's three-step process works.
