OpenVoice emotion style transfer? #322

Answered by eginhard
DrewThomasson asked this question in Q&A
There is a lot of marketing speak in that repo, so unfortunately it is not immediately obvious how it works. There are actually two separate components:

  1. "Base speaker TTS"
  2. "Tone color converter"

The first can be any TTS system that produces audio output. They mostly use their own MeloTTS, which is based on VITS.

The second is just a separate voice conversion (VC) model that takes the TTS output and a reference audio clip of the target speaker and returns the converted speech. But I guess "tone color converter" sounds fancier... These are the actual OpenVoice VC models (v1 and v2) that were added to Coqui, so you can use them with any Coqui TTS model.
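To make the two steps concrete, here is a minimal sketch using the Coqui TTS Python API. It assumes the `coqui-tts` package is installed and that the model names below match what your installed version lists (check with `tts --list_models`). The base model is an arbitrary single-speaker choice for illustration, and the function name and file paths are hypothetical.

```python
"""Two-step sketch: base speaker TTS, then OpenVoice voice conversion."""

# Assumed model names; verify them against `tts --list_models`.
# The base TTS model is an arbitrary single-speaker example.
BASE_TTS_MODEL = "tts_models/en/ljspeech/vits"
OPENVOICE_VC_MODEL = (
    "voice_conversion_models/multilingual/multi-dataset/openvoice_v2"
)


def synthesize_with_reference_voice(
    text, reference_wav, output_wav, base_wav="base_speaker.wav"
):
    """Speak `text` with a base TTS voice, then convert it to the
    voice heard in `reference_wav`. Returns the output file path."""
    # Imported lazily so this file loads even without coqui-tts installed.
    from TTS.api import TTS

    # Step 1: "base speaker TTS". Any model that writes audio would do.
    TTS(BASE_TTS_MODEL).tts_to_file(text=text, file_path=base_wav)

    # Step 2: "tone color converter", i.e. plain voice conversion from
    # the base speaker's audio to the reference speaker's voice.
    TTS(OPENVOICE_VC_MODEL).voice_conversion_to_file(
        source_wav=base_wav,
        target_wav=reference_wav,
        file_path=output_wav,
    )
    return output_wav
```

Calling `synthesize_with_reference_voice("Hello there.", "my_reference.wav", "converted.wav")` would download both models on first use and write the converted speech to `converted.wav`. Because the two steps only communicate through an audio file, the base model can be swapped for any other TTS model without touching the conversion step.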

In that notebook they use a single-speaker TTS model that…
