>>> carlfm01
[October 2, 2019, 4:09am]
Hello, just to share my results. I'm stopping at 47k steps for Tacotron 2:
The gaps seem normal for my data and are not affecting performance.
As a reference for others:
[image attachment: (931).png, 1409×625, 82.1 KB]
Final audios (feature-23 is a tongue twister):
47k.zip (1.0 MB)
Experiment with the new LPCNet model:
speech.wav is with the old model (male); the female voice audio was generated from real speech features, at 600k steps with 14h of voice:
test.zip (1.1 MB)
It was a surprise for me to see the male-voice model generate a female voice.
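For anyone who wants to reproduce the "real speech features" check, here is a minimal sketch of LPCNet copy synthesis, assuming the command-line tools from the xiph/LPCNet repository (`dump_data`, `test_lpcnet.py`); the file names are placeholders, not from this experiment:

```python
import subprocess

# Hypothetical paths; LPCNet expects 16 kHz, 16-bit mono headerless PCM.
REAL_SPEECH = "real_speech.s16"
FEATURES = "features.f32"    # cepstral + pitch features
OUTPUT = "resynth.s16"

# 1. Extract acoustic features from the real recording
#    (dump_data is built from the LPCNet C sources).
subprocess.run(["./dump_data", "-test", REAL_SPEECH, FEATURES], check=True)

# 2. Drive the trained LPCNet model with the real features.
subprocess.run(["python", "./src/test_lpcnet.py", FEATURES, OUTPUT], check=True)
```

If the output sounds clean here, any remaining artifacts in the full pipeline come from Tacotron 2's predicted features rather than from the vocoder.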
Now about training speed:
My first model took 3h/epoch with 50h of data using a V100 (trained for 10 epochs).
The new female model with 14h of speech took 30min/epoch:
> Epoch 1 333333/333333 [==============================] - 1858s 6ms/step - loss: 3.2461
> Epoch 2 9536/333333 [..............................] - ETA: 29:54 - loss: 3.2475
It uses CuDNNGRU, so it is really fast to train. Yes, the V100 is pretty fast, but most of the speed comes from the optimized cuDNN kernels.
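For context, here is a minimal, hypothetical sketch (Keras 2.x era, which LPCNet used at the time) of how the cuDNN-fused layer is a drop-in swap for the plain `GRU`; the layer sizes are illustrative, not LPCNet's exact network:

```python
from keras.models import Sequential
from keras.layers import GRU, CuDNNGRU, Dense

def build_rnn(features=55, use_cudnn=True):
    # CuDNNGRU fuses the whole recurrence into optimized cuDNN kernels,
    # so it only runs on NVIDIA GPUs but trains several times faster
    # than the generic GRU implementation.
    Recurrent = CuDNNGRU if use_cudnn else GRU
    return Sequential([
        Recurrent(384, return_sequences=True, input_shape=(None, features)),
        Dense(256, activation="softmax"),  # e.g. 8-bit mu-law sample classes
    ])

model = build_rnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Since the fused layer skips masking and custom activations, it trades flexibility for raw throughput, which is why most of the speedup comes from cuDNN rather than from the GPU itself.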
Of course I'll share the models, as always.
[This is an archived TTS discussion thread from discourse.mozilla.org/t/final-results-lpcnet-tacotron2-spanish]