Train new language #18
You can certainly do that. In the end, the models in this repo learn a mapping from token ids to mel frames, independent of the language. To train on a new dataset, you will have to write a data loader that maps your text to token ids, as is done for Modern Standard Arabic in this repo. The Arabic Speech Corpus has around 2 hours of audio, and I sampled 30-60 minutes per speaker for the multi-speaker model. In my experience it is usually better to have 10+ hours for the prosody, but that also depends on the quality of the audio files. So far, I have only trained on diacritized text. I assume these models could also learn the diacritization, but I haven't tried it yet since I don't know of a good-quality dataset for that. Of course, it is possible to train a model on diacritized text, synthesize audio files from diacritized text, remove the diacritics, and train on that.
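A minimal sketch of the text-to-token-ids step described above, assuming a simple character-level vocabulary built from the training transcripts. The names `build_vocab`, `text_to_ids` and `strip_diacritics` are hypothetical and not taken from this repo, whose own Arabic loader may tokenize differently. The diacritic range U+064B..U+0652 (fathatan through sukun) covers the common short-vowel marks, tanween, shadda and sukun; stripping them corresponds to the "remove the diacritics and train on that" idea.

```python
# Hypothetical padding symbol; id 0 is reserved for it.
PAD = "_"

# Arabic diacritics U+064B..U+0652: fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def build_vocab(texts):
    """Assign an integer id to every character seen in the corpus (0 = padding)."""
    symbols = sorted({ch for t in texts for ch in t} - {PAD})
    vocab = {PAD: 0}
    for i, s in enumerate(symbols, start=1):
        vocab[s] = i
    return vocab


def text_to_ids(text, vocab):
    """Map each character to its id, silently skipping unknown characters."""
    return [vocab[ch] for ch in text if ch in vocab]


def strip_diacritics(text):
    """Remove Arabic short-vowel marks, tanween, shadda and sukun."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


if __name__ == "__main__":
    word = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, fully diacritized
    vocab = build_vocab([word])
    print(text_to_ids(word, vocab))
    print(strip_diacritics(word))
```

A character-level vocabulary keeps the diacritics as separate tokens, so the same loader works for both diacritized and undiacritized transcripts; a real setup would build the vocabulary once over the whole corpus and save it alongside the model.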
Thanks a lot for the comment!
Let me know if that process sounds good:
From here I'm not sure: how do I convert the cleaned, vowelized text into token IDs? Can I use the repo's source code for training, or is it too different when using a different language? Also, did you train it from scratch or use a pretrained English model?
Can I use this repo to train a new TTS model in another language?
How many hours of audio + transcripts do I need?
Does the text need to have diacritical signs?