How can I synthesize my own text to speech? #11

ghost · 2019-11-26T04:04:58Z

No description provided.

binarythinktank · 2019-11-27T06:56:15Z

I was wondering this too. I've successfully trained it and got good samples, but how do I do TTS using this output?

ViktorIgeland · 2019-11-29T17:32:00Z

Neural text-to-speech is usually done in two steps: feature prediction and voice synthesis.
First, a feature predictor transforms the text into acoustic features, e.g. a mel-spectrogram. A synthesizer (vocoder) then generates audio from these features.

MelGAN is a synthesizer, so to go from text to speech you need to combine it with a model that converts text into mel-spectrograms. One such model is Tacotron2; have a look at https://github.com/NVIDIA/tacotron2
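
To make the two-step structure concrete, here is a minimal sketch of how the pieces fit together. Everything in it is hypothetical: `make_tts`, `toy_predictor` and `toy_vocoder` are illustrative stand-ins, not functions from this repo or from Tacotron2, and the 80 mel bins and hop size of 256 are assumed values that you would replace with whatever your models were trained on.

```python
from typing import Callable, List

Mel = List[List[float]]   # frames x mel bins
Wave = List[float]        # raw audio samples


def make_tts(predict_mel: Callable[[str], Mel],
             vocode: Callable[[Mel], Wave]) -> Callable[[str], Wave]:
    """Chain a feature predictor and a synthesizer into one TTS function."""
    def tts(text: str) -> Wave:
        mel = predict_mel(text)   # step 1: text -> mel-spectrogram
        return vocode(mel)        # step 2: mel-spectrogram -> audio
    return tts


# Toy stand-ins so the sketch runs without any trained model.
N_MELS = 80   # assumed number of mel bins
HOP = 256     # assumed audio samples generated per mel frame


def toy_predictor(text: str) -> Mel:
    # Hypothetical: one silent frame per character (a real model such as
    # Tacotron2 predicts a variable number of frames).
    return [[0.0] * N_MELS for _ in text]


def toy_vocoder(mel: Mel) -> Wave:
    # Hypothetical: a real vocoder such as MelGAN upsamples each frame
    # into HOP audio samples; here we only reproduce the output length.
    return [0.0] * (len(mel) * HOP)


tts = make_tts(toy_predictor, toy_vocoder)
audio = tts("hello")
print(len(audio))  # 5 frames * 256 samples = 1280
```

In a real setup you would swap the two toy functions for wrappers around your trained feature predictor and your trained MelGAN generator. The only contract between them is the mel-spectrogram itself, so both sides have to agree on its shape and scaling.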

Wenqikry · 2019-12-10T10:47:06Z

Hi @ViktorIgeland,
In Tacotron2 + MelGAN, the way Tacotron2 extracts the mel spectrogram is different from the way MelGAN does. Will that affect the results?

ViktorIgeland · 2019-12-11T18:43:42Z

Hi @Wenqikry,
Yes, if your models are trained on different types of spectrograms it will have an impact on the results. If you don't need the speed of MelGAN you can try using Nvidia's WaveGlow, as it's trained on the same spectrogram as their Tacotron2.
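
As a rough illustration of why this matters, the sketch below compresses the same mel magnitude under two conventions: natural log with a floor, and a dB scale normalized to a symmetric range. The function names and default constants (`clip=1e-5`, `min_level_db=-100`, `ref_level_db=20`, `max_abs=4`) are assumptions chosen for illustration, so check the hparams of the repos you actually use. It also only covers the amplitude compression; sample rate, FFT size, hop length and mel filterbank settings have to match as well.

```python
import math


def log_mel_natural(mag: float, clip: float = 1e-5) -> float:
    """Natural-log compression with a floor on the magnitude."""
    return math.log(max(mag, clip))


def log_mel_db_normalized(mag: float, min_level_db: float = -100.0,
                          ref_level_db: float = 20.0,
                          max_abs: float = 4.0) -> float:
    """dB scale, then symmetric normalization to [-max_abs, max_abs]."""
    floor = 10 ** (min_level_db / 20)
    db = 20.0 * math.log10(max(mag, floor)) - ref_level_db
    scaled = 2 * max_abs * ((db - min_level_db) / -min_level_db) - max_abs
    return min(max(scaled, -max_abs), max_abs)


def db_normalized_to_natural(value: float, min_level_db: float = -100.0,
                             ref_level_db: float = 20.0,
                             max_abs: float = 4.0,
                             clip: float = 1e-5) -> float:
    """Undo the dB normalization and re-express the value as natural log."""
    db = (value + max_abs) * -min_level_db / (2 * max_abs) + min_level_db
    mag = 10 ** ((db + ref_level_db) / 20)
    return math.log(max(mag, clip))


mag = 0.5
a = log_mel_natural(mag)           # what one model might expect
b = log_mel_db_normalized(mag)     # what the other model might produce
converted = db_normalized_to_natural(b)

print(a, b, converted)
```

The same magnitude ends up as very different numbers under the two conventions, so a vocoder trained on one will receive out-of-range input from a predictor trained on the other. Converting between them, as in `db_normalized_to_natural`, is one option when clipping has not discarded information; retraining one model on the other's features is the other.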

Wenqikry · 2019-12-12T00:22:05Z

@ViktorIgeland,
Okay, thanks, I will try it.

casper-hansen · 2019-12-31T22:57:11Z

> Hi @Wenqikry,
> Yes, if your models are trained on different types of spectrograms it will have an impact on the results. If you don't need the speed of MelGAN you can try using Nvidia's WaveGlow, as it's trained on the same spectrogram as their Tacotron2.

So how can we use MelNet with the same performance, i.e. how can we reproduce the results of the paper? Do you know if this is possible? And can it then be extended to custom audio files?

Do we have any information on how these mel-scale spectrograms are generated? Something we can reproduce and use in MelNet.

casper-hansen · 2020-01-03T22:07:45Z

@Wenqikry did you figure out a good way to produce mel spectrograms?

Wenqikry · 2020-01-04T01:54:27Z

@casperbh96 Sorry, I haven't found one yet.

Liujingxiu23 · 2020-04-17T08:36:38Z

@Wenqikry Have you tried training on log-mels with https://github.com/Rayhane-mamah/Tacotron-2 or https://github.com/NVIDIA/tacotron2 and combining it with MelGAN?
Do you have any experience with this?
I used https://github.com/Rayhane-mamah/Tacotron-2, changed the features to log-mel as in this repo, and did not do clip_out, but the result is very bad. I cannot find anything wrong...

Wenqikry · 2020-04-17T11:38:42Z

@Liujingxiu23 Sorry, I haven't tried...

Mariaa98 · 2020-05-08T19:23:47Z

I trained the model successfully on a dataset,
and now I want to give it a mel spectrogram as input to synthesize the audio.
I looked at the log folder and found many .pt files.

Can anyone help?

binarythinktank · 2020-05-09T02:27:47Z

@Mariaa98 If you figure it out, let me know. I have tried with 3 different data scientists and none of them could get a functional TTS script from this. We ended up going with a different model.

BuaaAlban · 2020-06-18T07:16:04Z

> @Wenqikry Have you tried training on log-mels with https://github.com/Rayhane-mamah/Tacotron-2 or https://github.com/NVIDIA/tacotron2 and combining it with MelGAN?
> Do you have any experience with this?
> I used https://github.com/Rayhane-mamah/Tacotron-2, changed the features to log-mel as in this repo, and did not do clip_out, but the result is very bad. I cannot find anything wrong...

I have got some results with Tacotron2 and MelGAN. I can make out what the wav says, but it's not as good as the demos.
