Can espeak transcription cause training testing difficulties #61

JRMeyer · 2021-03-07T08:01:22Z

JRMeyer
Mar 7, 2021
Maintainer

>>> vcjobacc
[July 8, 2019, 8:37pm]

## Hello all. I'm training a model for Russian. And I just can't make it good enough even for a simple phrases. Actually I do it for a second time. The first dataset had around 24 hours of audio book speech, which I replaced now with 6 hours (for now, it is growing each day, I know, 6 hours is not sufficient amount), as the reader tries to mimic different voices from time to time. Now it is perfect, single speaker, monotonic speech. Duration is from 1 to 11 sec. No long silences and other obvious reasons for the bad learning. My question is, when I tried for the first time, the model barely could say few phrases among a number of them by 200k step (batch size 32). Now, I learned the model for 190 000 steps (batch size 8), but it still can say nothing. TestAudios sound like a different language to me, I can not understand the sentences. The model is not overfitted. My concern is, the eSpeak transcription. May it be the reason for such a bad model behavior? Just take a look at transcription it provides while training: sentence = 'я тебя не понимаю' output on JupyterNotebook while testing: slash ^(en)aɪə tɛbiə niː pɒnɪmeɪuː(ru) slash ~

Why there is (en) ? Shouldn't it use only (ru) note? slash
and yeah, I see sometimes phrase 'full dictionary is not installed for
Ru' (eSpeak has no full dictionary for Russian language). slash
I think, if the eSpeak is used for both training and testing, it
provides in both cases the same transcription, so it seems like it
doesn't matter if transcription is correct as long as it is stable. But
though, I am worry a lot about it. I don't know other reasons that could
cause such a bad training. slash
I tried LJSpeech, and the model became good enough for understanding
after around 30 000 steps... slash
Need your advices, guys. Parameters seem to be good for dataset, maybe
be there is anything else I can try? slash
Thank you a lot for any suggestion.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/can-espeak-transcription-cause-training-testing-difficulties]

JRMeyer · 2021-03-07T08:01:24Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> erogol
[July 9, 2019, 4:39pm]

try not to use phonemes and see if it is any different

[Archived Post]

0 replies

JRMeyer · 2021-03-07T08:01:27Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> geneing
[July 10, 2019, 5:41pm]

Since Russian language is almost perfectly phonetic, there should be
little need for using phonetic transcriptions. The only difficult point
is word stress.

[Archived Post]

0 replies

JRMeyer · 2021-03-07T08:01:29Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> vcjobacc
[July 15, 2019, 3:10pm]

Thank you for the advice! slash
I actually figured out the reason why eSpeak produces such a weird on my
opinion transcription: I didn't fix the symbols.py and cleaners.py.
So my text was forced to be ascii. Now I made my own method for throwing
out all Latin symbols (as in Russian there are only Cyrillic symbols),
lower case it and so on. So now eSpeak produces the way more reasonable
transcription. slash
The only consern left is attention alignments... Here below is the pic
of 11k iteration (I just started training from scratch on slash ~10.5 Hours
ds, I decided not to trim silence as I have it really short, batch size
is 8 for training and 4 for testing). slash

slash
I drew the red dotted line as it is supposed to be (as I suppose), so
now I have it wrong. Will it most likely learn attention better? Or what
could be the reason? I saw some info about stop_token, that I could
tweak it a bit. But I think it would take a lot of time... slash
Should I leave it as it is and wait until the magic happened? Or I
should stop here and reduce stop_token(increase?). slash
Thank you so much! slash
Have a good one!

[Archived Post]

0 replies

JRMeyer · 2021-03-07T08:01:32Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> erogol
[July 16, 2019, 9:04am]

As far as I can see,you
use use_forward_attn=True. If this is the case, let it to train more.
If nothing changes, then try to
use_forward_attn=false and location_attn=True. One of these should
work. If not, let us know again so we can discuss further.

[Archived Post]

0 replies

JRMeyer · 2021-03-07T08:01:35Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> vcjobacc
[July 27, 2019, 9:58am]

Thank you. I've tried to wait more with forward_attention = true and
with location_att = true instead. Unfortunatly it didn't work. slash
Now I have extendet the dataset (its slash ~ 16 hours), as well as cleaning
it more, so I am preatty sure its cleaness is around 97% or so, with no
serious error. slash
The thing is, besides the attention, it couldn't produce any word
properly, not even anyhow close... slash
What I suspect is that the most of the sentences in the dataset are
ended in the middle, so according to the intonation, while the sentence
seems to be continued, it is ended (that's how I've tried to insure I
have no files longer than 10 secs). slash
If that's the serious misstake, what duration is acceptable for files?
Should I always use only the sentences with correct intonation? slash
Thank you a lot! slash
p.s. I attached some plots just in case it is needed.

[Archived Post]

0 replies

JRMeyer · 2021-03-07T08:01:37Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> vcjobacc
[July 27, 2019, 8:06pm]

Looks like that was my serious mistake to split sentences in the middle.
I am training now with another ds, where almost all sentences are ended
on the right spot, and by 20000 steps it is already started to align
fine! And the sound it produces sounds promising! Thank you for all
advices, hope now it will work!

[Archived Post]

0 replies

JRMeyer · 2021-03-07T08:01:40Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> snake
[March 21, 2020, 6:48am]

Have you succeeded? I'm going to train a model for russian as well.
Going to use common voice dataset. Any tips for config.json?

[Archived Post]

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can espeak transcription cause training testing difficulties #61

{{title}}

Replies: 7 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Can espeak transcription cause training testing difficulties #61

JRMeyer Mar 7, 2021 Maintainer

Replies: 7 comments

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer

JRMeyer
Mar 7, 2021
Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer Author