Replies: 7 comments
>>> vcjobacc
[July 8, 2019, 8:37pm]
Hello all. I'm training a model for Russian, and I just can't make it good enough even for simple phrases. This is actually my second attempt: the first dataset had around 24 hours of audiobook speech, but the reader mimics different voices from time to time, so I replaced it with 6 hours of clean recordings (the set grows each day; I know 6 hours is not a sufficient amount yet). Now the data is perfect: a single speaker, monotonic speech, clip durations from 1 to 11 seconds, no long silences or other obvious causes of bad learning.

On the first attempt the model could barely say a few of the test phrases by step 200k (batch size 32). Now I have trained for 190,000 steps (batch size 8), and it still can say nothing: the test audios sound like a different language to me, and I cannot understand the sentences. The model is not overfitted.

My concern is the eSpeak transcription. Could it be the reason for such bad model behavior? Just look at the transcription it provides during training. For the sentence 'я тебя не понимаю' ('I don't understand you'), the test output in the Jupyter notebook is:

`^(en)aɪə tɛbiə niː pɒnɪmeɪuː(ru)~`
Why is there an (en) flag? Shouldn't it use only the (ru) one? And yes, I sometimes see the message 'full dictionary is not installed for Ru' (eSpeak has no full dictionary for the Russian language).
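For reference, here is roughly how I inspect what eSpeak produces outside of the training pipeline (a minimal sketch; it assumes espeak-ng is installed and on PATH, while the training pipeline, as far as I can tell, goes through the phonemizer package rather than calling the binary directly):

```python
# Minimal sketch: inspect the IPA transcription eSpeak produces for a
# Russian sentence. Assumes espeak-ng is installed and on PATH.
import subprocess

def espeak_ipa(text, voice="ru"):
    """Return eSpeak's IPA transcription of `text` for the given voice."""
    result = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", voice, text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print(espeak_ipa("я тебя не понимаю"))
# A language-switch flag such as (en)...(ru) in the output means eSpeak
# fell back to its English rules for part of the sentence, so the model
# is being trained on English-flavoured phonemes for Russian text.
```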
My thinking is that since eSpeak is used for both training and testing, it produces the same transcription in both cases, so it shouldn't matter whether the transcription is correct as long as it is stable. Still, I worry about it a lot, because I don't know of any other reason that could cause such bad training.
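To gauge how widespread the fallback actually is, I can count how many training sentences trigger a language switch at all (a sketch that reuses `espeak_ipa` from above and assumes an LJSpeech-style metadata.csv with pipe-separated fields, which is what I use):

```python
# Count how many training sentences make eSpeak switch language.
# Assumes an LJSpeech-style metadata.csv: "file_id|text|normalized_text".
flagged = 0
total = 0
with open("metadata.csv", encoding="utf-8") as f:
    for line in f:
        text = line.rstrip("\n").split("|")[-1]
        total += 1
        if "(en)" in espeak_ipa(text):
            flagged += 1
print(f"{flagged}/{total} sentences fall back to English rules")
```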
I tried LJSpeech, and that model became understandable after around 30,000 steps...
I need your advice, guys. The parameters seem right for the dataset; maybe there is something else I can try? Thanks a lot for any suggestions.
[This is an archived TTS discussion thread from discourse.mozilla.org/t/can-espeak-transcription-cause-training-testing-difficulties]