Releases: coqui-ai/TTS
v0.0.15.1
v0.0.15
🐸 v0.0.15
🐞Bug Fixes
- Fix tb_logger init for rank > 0 processes in distributed training.
💾 Code updates
- Refactoring and optimization in the speaker encoder module. (:crown: @Edresson )
- Replacing `unidecode` with `anyascii`.
- Japanese text to phoneme conversion. (:crown: @kaiidams)
- Japanese `tts` recipe to train Tacotron2-DDC on the Kokoro dataset. (:crown: @kaiidams)
🚶♀️ Operational Updates
- Start using `pylint == 2.8.3`.
- Reorg `tests` files.
- Upload to PyPI automatically on release.
- Move the `VERSION` file under the `TTS` folder.
🏅 Model implementations
- New Speaker Encoder implementation based on https://arxiv.org/abs/2009.14153 (:crown: @Edresson )
🚀 New Pre-Trained Model Releases
- Japanese Tacotron model (:crown: @kaiidams)
💡 All the models below are available through the `tts` or `tts-server` CLI endpoints, as explained here.
v0.0.14
🐸 v0.0.14
🐞Bug Fixes
- Remove breaking line from Tacotron models. (👑 @a-froghyar)
💾 Code updates
- BREAKING: Coqpit integration for config management, and the first 🐸TTS recipe, for LJSpeech. Check #476.
Every model is now tied to a Python class that defines its configuration scheme. It provides a better interface and makes the default values, expected value types, and mandatory fields explicit.
Specific model configs are defined under `TTS/tts/configs` and `TTS/vocoder/configs`. `TTS/config/shared_configs.py` hosts configs shared by all 🐸TTS models. Configs shared by `tts` models are hosted under `TTS/tts/configs/shared_configs.py`, and those shared by `vocoder` models under `TTS/vocoder/configs/shared_config.py`.
For example, `TacotronConfig` follows the inheritance chain `BaseTrainingConfig -> BaseTTSConfig -> TacotronConfig` (a rough sketch of this pattern follows at the end of this section).
- BREAKING: Remove `phonemizer` support due to a license conflict.
This effectively deprecates support for all models that use phonemes as input. Feel free to suggest drop-in alternatives if you are affected by this change.
- Start hosting 👩🍳 recipes under 🐸 TTS. The first recipe is for Tacotron2-DDC with the LJSpeech dataset, under `TTS/recipes/`.
Please check here for more details.
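To illustrate the inheritance pattern above, here is a minimal, hypothetical sketch of a Coqpit-style dataclass config. The class names, fields, and defaults below are illustrative only and are not the exact 🐸TTS definitions; see `TTS/config/shared_configs.py` and `TTS/tts/configs` for the real classes.

```python
# Illustrative sketch of the dataclass-based config hierarchy
# (hypothetical names and fields, not the actual 🐸TTS classes).
from dataclasses import dataclass

from coqpit import Coqpit  # the config library 🐸TTS integrates


@dataclass
class MyBaseTrainingConfig(Coqpit):
    # fields shared by every model: batch size, epochs, output paths, ...
    batch_size: int = 32
    epochs: int = 1000
    output_path: str = ""


@dataclass
class MyBaseTTSConfig(MyBaseTrainingConfig):
    # fields shared by all tts models: audio and text processing settings.
    sample_rate: int = 22050
    use_phonemes: bool = False


@dataclass
class MyTacotronConfig(MyBaseTTSConfig):
    # model-specific fields with their defaults and expected types.
    r: int = 2                    # decoder reduction factor
    prenet_dropout: bool = True


config = MyTacotronConfig(batch_size=16)
# Coqpit adds dict/JSON (de)serialization and type checking on top of the
# plain dataclass fields, so the config can round-trip through config.json.
```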
v0.0.13
🐸 v0.0.13
🐞Bug Fixes
💾 Code updates
- `SpeakerManager` class for handling multi-speaker model management and interfacing with the `speaker.json` file.
- Enabling multi-speaker models with the `tts` and `tts-server` endpoints. (:crown: @kirianguiller)
- Allow choosing a different `noise scale` for Glow-TTS at inference (see the sketch after this list).
- Glow-TTS updates to import SC-Glow models.
- Fixing Windows support. (:crown: @WeberJulian)
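For context, the noise scale controls the standard deviation of the latent noise sampled during flow-based inference: lower values give flatter, more stable prosody, higher values give more variation. Below is a minimal, generic sketch of the idea (an illustration, not the actual Glow-TTS code):

```python
# Generic sketch of a noise-scale parameter in flow-based TTS inference.
# Not the actual Glow-TTS implementation.
import torch


def sample_latent(mean: torch.Tensor, log_std: torch.Tensor, noise_scale: float = 0.667) -> torch.Tensor:
    """Sample the latent z around the predicted prior; `noise_scale` shrinks or
    stretches the sampling noise before the inverse flow decodes it."""
    eps = torch.randn_like(mean)
    return mean + torch.exp(log_std) * eps * noise_scale


# z = sample_latent(prior_mean, prior_log_std, noise_scale=0.33)
# z is then passed through the inverse flow (decoder) to produce the spectrogram.
```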
🚶♀️ Operational Updates
- Refactoring 🐸 TTS installation and allowing selection of different installation scopes (`all`, `tf`, `notebooks`) depending on specific needs.
🏅 Model implementations
🚀 New Pre-Trained Model Releases
- SC-GlowTTS multi-speaker English model from our work https://arxiv.org/abs/2104.05557 (:crown: @Edresson )
- HiFiGAN vocoder finetuned for the above model.
- Tacotron DDC Non-Binary English model using Accenture's Sam dataset.
- HiFiGAN vocoder trained for the models above.
Released Models
💡 All the models below are available through the `tts` or `tts-server` CLI endpoints, as explained here.
Models with ✨️ below are new with this release.
- SC-GlowTTS model is from our latest paper in a collaboration with @Edresson and @mueller91.
- The new non-binary TTS model is trained using the SAM dataset from Accenture Labs. Check out their blog post
Language | Dataset | Model Name | Model Type | TTS version | Download |
---|---|---|---|---|---|
✨ English (non-binary) | sam (accenture) | Tacotron2-DDC | tts | 😄 v0.0.13 | 💾 |
✨ English (multi-speaker) | VCTK | SC-GlowTTS | tts | 😄 v0.0.13 | 💾 |
English | LJSpeech | Tacotron-DDC | tts | v0.0.12 | 💾 |
German | Thorsten-DE | Tacotron-DCA | tts | v0.0.11 | 💾 |
German | Thorsten-DE | Wavegrad | vocoder | v0.0.11 | 💾 |
English | LJSpeech | SpeedySpeech | tts | v0.0.10 | 💾 |
English | EK1 | Tacotron2 | tts | v0.0.10 | 💾 |
Dutch | MAI | TacotronDDC | tts | v0.0.10 | 💾 |
Chinese | Baker | TacotronDDC-GST | tts | v0.0.10 | 💾 |
English | LJSpeech | TacotronDCA | tts | v0.0.9 | 💾 |
English | LJSpeech | Glow-TTS | tts | v0.0.9 | 💾 |
Spanish | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
French | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
✨ English | sam (accenture) | HiFiGAN | vocoder | 😄 v0.0.13 | 💾 |
✨ English | VCTK | HiFiGAN | vocoder | 😄 v0.0.13 | 💾 |
English | LJSpeech | HiFiGAN | vocoder | v0.0.12 | 💾 |
English | EK1 | WaveGrad | vocoder | v0.0.10 | 💾 |
Dutch | MAI | ParallelWaveGAN | vocoder | v0.0.10 | 💾 |
English | LJSpeech | MB-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | FullBand-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | WaveGrad | vocoder | v0.0.9 | 💾 |
Update Jun 7, 2021: The Ruslan (Russian) model has been removed due to a license conflict.
v0.0.12
🐸 v0.0.12
🐞Bug Fixes
💾 Code updates
- Enable logging the model config.json on Tensorboard. #418
- Update code style standards and use a `Makefile` to ease regular tasks. #423
- Enable using `Tacotron.prenet.dropout` at inference time. This leads to better quality with some models (see the sketch after this list).
- Update the default `tts` model to LJSpeech TacotronDDC.
- Show the real waveform on Tensorboard in GAN vocoder training.
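As a rough illustration of what "dropout at inference" means here (a hypothetical module, not the actual 🐸TTS Tacotron code), the prenet keeps its dropout active even after `model.eval()`:

```python
# Illustrative sketch: keeping prenet dropout active at inference time.
# Hypothetical module; layer sizes and the flag name are arbitrary.
import torch
from torch import nn
from torch.nn import functional as F


class Prenet(nn.Module):
    def __init__(self, in_dim: int = 80, hidden: int = 256, dropout_at_inference: bool = False):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.dropout_at_inference = dropout_at_inference

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # `training=True` keeps dropout on even in eval mode, which injects noise
        # into the autoregressive decoder input and can improve output quality.
        keep = self.training or self.dropout_at_inference
        x = F.dropout(F.relu(self.fc1(x)), p=0.5, training=keep)
        x = F.dropout(F.relu(self.fc2(x)), p=0.5, training=keep)
        return x
```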
🚶♀️ Operational Updates
🏅 Model implementations
- initial HiFiGAN implementation (:crown: @rishikksh20 @erogol) #422
🚀 New Pre-Trained Model Releases
- Universal HiFiGAN model (postponed to the next version for 👑 @Edresson's updated model).
- LJSpeech Tacotron2 Double Decoder Consistency v2 model.
Check our blog post to learn more about Double Decoder Consistency.
- LJSpeech HiFiGAN model.
Released Models
💡 All the models below are available through the `tts` endpoint, as explained here.
Language | Dataset | Model Name | Model Type | TTS version | Download |
---|---|---|---|---|---|
✨ English | LJSpeech | Tacotron-DDC | tts | 😃 v0.0.12 | 💾 |
German | Thorsten-DE | Tacotron-DCA | tts | v0.0.11 | 💾 |
German | Thorsten-DE | Wavegrad | vocoder | v0.0.11 | 💾 |
English | LJSpeech | SpeedySpeech | tts | v0.0.10 | 💾 |
English | EK1 | Tacotron2 | tts | v0.0.10 | 💾 |
Dutch | MAI | TacotronDDC | tts | v0.0.10 | 💾 |
Chinese | Baker | TacotronDDC-GST | tts | v0.0.10 | 💾 |
English | LJSpeech | TacotronDCA | tts | v0.0.9 | 💾 |
English | LJSpeech | Glow-TTS | tts | v0.0.9 | 💾 |
Spanish | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
French | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
✨ English | LJSpeech | HiFiGAN | vocoder | 😃 v0.0.12 | 💾 |
English | EK1 | WaveGrad | vocoder | v0.0.10 | 💾 |
Dutch | MAI | ParallelWaveGAN | vocoder | v0.0.10 | 💾 |
English | LJSpeech | MB-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | FullBand-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | WaveGrad | vocoder | v0.0.9 | 💾 |
v0.0.11
🐸 v0.0.11
🐞Bug Fixes
- Fixed #374. (Thx for reporting @a-froghyar )
💾 Code updates
- `/bin/resample.py` to resample wav files (see the sketch after this list). (:crown: @WeberJulian)
- Some updates for Windows compatibility. (:crown: @GuyPaddock)
- Fixing the `CheckSpectrogram` notebook. (:crown: @GuyPaddock)
- Fix #392
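If you want to do the same thing by hand, here is a minimal resampling sketch using `librosa` and `soundfile`. It is only an illustration, not the actual `TTS/bin/resample.py` script, and the paths are placeholders.

```python
# Minimal sketch of resampling a wav file to a target sample rate.
# Illustration only; not the actual TTS/bin/resample.py script.
import librosa
import soundfile as sf


def resample_wav(in_path: str, out_path: str, target_sr: int = 22050) -> None:
    # librosa resamples on load when an explicit sr is requested
    wav, _ = librosa.load(in_path, sr=target_sr)
    sf.write(out_path, wav, target_sr)


# resample_wav("dataset/wavs/utt1.wav", "dataset_16k/wavs/utt1.wav", target_sr=16000)
```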
🚶♀️ Operational Updates
🏅 Model implementations
- initial AlignTTS implementation. (#398)
- initial HiFiGAN implementation (:crown: @rishikksh20) (postponed to the next release)
🚀 New Pre-Trained Model Releases
- German - Tacotron2-DCA trained with thorsten_dataset. (:crown: @thorstenMueller )
- German - Wavegrad vocoder with thorsten_dataset. (:crown: @thorstenMueller)
Released Models
💡 All the models below are available through the `tts` endpoint, as explained here.
Language | Dataset | Model Name | Model Type | TTS version | Download |
---|---|---|---|---|---|
✨ German | Thorsten-DE | Tacotron-DCA | tts | 😃 v0.0.11 | 💾 |
✨ German | Thorsten-DE | Wavegrad | vocoder | 😃 v0.0.11 | 💾 |
English | LJSpeech | SpeedySpeech | tts | v0.0.10 | 💾 |
English | EK1 | Tacotron2 | tts | v0.0.10 | 💾 |
Dutch | MAI | TacotronDDC | tts | v0.0.10 | 💾 |
Chinese | Baker | TacotronDDC-GST | tts | v0.0.10 | 💾 |
English | LJSpeech | TacotronDCA | tts | v0.0.9 | 💾 |
English | LJSpeech | Glow-TTS | tts | v0.0.9 | 💾 |
Spanish | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
French | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
English | EK1 | WaveGrad | vocoder | v0.0.10 | 💾 |
Dutch | MAI | ParallelWaveGAN | vocoder | v0.0.10 | 💾 |
English | LJSpeech | MB-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | FullBand-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | WaveGrad | vocoder | v0.0.9 | 💾 |
v0.0.10
🐸 v0.0.10
🐞Bug Fixes
- Make `synthesizer.py` save the output audio with the vocoder sampling rate. This is necessary when the sampling rates of the tts and vocoder models differ and interpolation is applied to the tts model output before running the vocoder (see the sketch after this list). Practically, it fixes the Spanish and French voices generated by `tts` or `tts-server` on the terminal.
- Handling UTF-8 on Windows. (by @adonispujols)
- Fix loading the last model when `--continue_training` is used. It was loading the best_model regardless.
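The gist of the fix, as a hypothetical sketch (not the actual `synthesizer.py` code): scale the tts model's spectrogram to the vocoder's frame rate, then write the waveform with the vocoder's sampling rate rather than the tts model's.

```python
# Hypothetical sketch of synthesizing with mismatched sampling rates.
# Not the actual synthesizer.py code; `vocoder` is any callable mel -> waveform.
import soundfile as sf
import torch
import torch.nn.functional as F


def vocode_and_save(mel: torch.Tensor, vocoder, tts_sr: int, vocoder_sr: int, out_path: str) -> None:
    # mel: [1, n_mels, T] spectrogram produced by the tts model
    if tts_sr != vocoder_sr:
        # stretch the time axis so the vocoder sees frames at its expected rate
        mel = F.interpolate(mel, scale_factor=vocoder_sr / tts_sr, mode="linear", align_corners=False)
    wav = vocoder(mel).squeeze().detach().cpu().numpy()
    # write with the vocoder's rate, not the tts model's, so pitch and speed are correct
    sf.write(out_path, wav, vocoder_sr)
```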
💾 Code updates
- Breaking Change: Update the default set of characters in `symbols.py`. This might require you to set your character set in `config.json` if you want to use this version with models trained on the previous version.
- Chinese backend for text processing. (#654 by @kirianguiller)
- Enable torch.hub integration for the released models.
- First GitHub release.
- Dependency version fixes. Using numpy > 1.17.5 breaks some tests.
- WaveRNN fix. (by @gerazov)
- Big refactoring of the training scripts to share the init part of the code. (by @gerazov)
- Enable ModelManager to download models from GitHub releases.
- Add a test for `compute_statistics.py`.
- Light-touch updates in the `tts` and `tts-server` entry points. (thanks @thorstenMueller)
- Define default vocoder models for each tts model in `.models.json`. The `tts` and `tts-server` entry points use the default vocoder if the user does not specify one.
- `find_unique_chars.py` to find all the unique characters in a dataset (see the sketch after this list).
- A better way of handling best models through training. (thx @gerazov)
- Pass the used characters to the model config.json at the beginning of training. This prevents later code updates from affecting the trained models.
- Migration to GitHub Actions for CI.
- Deprecate wheel-based use of tts-server for the sake of the new design.
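For reference, here is a minimal sketch of what finding the unique characters in a dataset looks like. It is a hypothetical stand-in for `find_unique_chars.py`, assuming an LJSpeech-style `metadata.csv` with pipe-separated `id|text|normalized_text` rows.

```python
# Minimal sketch of collecting the unique characters from a dataset's transcripts.
# Hypothetical stand-in for TTS/bin/find_unique_chars.py.
def find_unique_chars(metadata_path: str) -> str:
    chars = set()
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            text = line.strip().split("|")[-1]  # take the last (normalized text) column
            chars.update(text)
    return "".join(sorted(chars))


# print(find_unique_chars("LJSpeech-1.1/metadata.csv"))
```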
🚶♀️ Operational Updates
- Move released models to GitHub Releases and deprecate GDrive as the primary option.
🏅 Model implementations
- No updates 😓
🚀 New Pre-Trained Model Releases
- English ek1 - Tacotron2 model and WaveGrad vocoder under `.models.json`. (huge THX!! to @nmstoker)
- Russian Ruslan - Tacotron2-DDC model.
- Dutch model. (huge THX!! to @r-dh)
- Chinese Tacotron2 model. (huge THX!! to @kirianguiller)
- English LJSpeech - SpeedySpeech with WaveNet decoder.
Released Models
💡 All the models below are available through the `tts` endpoint, as explained here.
Language | Dataset | Model Name | Model Type | TTS version | Download |
---|---|---|---|---|---|
English | LJSpeech | SpeedySpeech | tts | 😃 v0.0.10 | 💾 |
English | EK1 | Tacotron2 | tts | 😃 v0.0.10 | 💾 |
Dutch | MAI | TacotronDDC | tts | 😃 v0.0.10 | 💾 |
Chinese | Baker | TacotronDDC-GST | tts | 😃 v0.0.10 | 💾 |
English | LJSpeech | TacotronDCA | tts | v0.0.9 | 💾 |
English | LJSpeech | Glow-TTS | tts | v0.0.9 | 💾 |
Spanish | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
French | M-AILabs | TacotronDDC | tts | v0.0.9 | 💾 |
English | EK1 | WaveGrad | vocoder | 😃 v0.0.10 | 💾 |
Dutch | MAI | ParallelWaveGAN | vocoder | 😃 v0.0.10 | 💾 |
English | LJSpeech | MB-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | FullBand-MelGAN | vocoder | v0.0.9 | 💾 |
🌍 Multi-Lang | LibriTTS | WaveGrad | vocoder | v0.0.9 | 💾 |
v0.0.9
🐸 TTS v0.0.9 - the first release 🎉
This is the first release of 🐸TTS, versioned v0.0.9.
🐸TTS is still an evolving project and any upcoming release might be significantly different and not backward compatible.
In this release, we provide the following models.
Language | Dataset | Model Name | Model Type | Download |
---|---|---|---|---|
English | LJSpeech | TacotronDCA | tts | 💾 |
English | LJSpeech | Glow-TTS | tts | 💾 |
Spanish | M-AILabs | TacotronDDC | tts | 💾 |
French | M-AILabs | TacotronDDC | tts | 💾 |
English | LJSpeech | MB-MelGAN | vocoder | 💾 |
🌍 Multi-Lang | LibriTTS | FullBand-MelGAN | vocoder | 💾 |
🌍 Multi-Lang | LibriTTS | WaveGrad | vocoder | 💾 |
Notes
- Multi-Lang vocoder models are intended for non-English models.
- Vocoder models are trained independently of the tts models, possibly with different sampling rates. Therefore, the performance is not optimal.
- All models are trained with phonemes generated by the espeak backend (not espeak-ng).
- This release has been tested under Python 3.6, 3.7, and 3.8. It is strongly suggested to use `conda` to install the dependencies and set up the environment.
Edit (22.03.2021): The Fullband Universal Vocoder has been corrected with the right model files. Previously, we released the wrong model under that name.