Thai TTS Tacotron is a Thai text-to-speech model trained with Tacotron 2.
The model is based on https://github.com/NVIDIA/tacotron2, NVIDIA's PyTorch implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", and was trained on the TSynC2_Nun dataset.
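Tacotron 2 predicts a mel spectrogram frame by frame, and the vocoder expands each frame into `hop_length` audio samples. As a small worked example, the sketch below converts between mel frames and seconds. It assumes the default audio settings from NVIDIA's `hparams.py` (22,050 Hz sampling rate, hop length of 256 samples); this fork may use different values, so check its own `hparams.py`. The helper names are hypothetical.

```python
# Assumed defaults from NVIDIA's tacotron2 hparams.py; check this fork's values.
SAMPLING_RATE = 22050  # audio samples per second
HOP_LENGTH = 256       # audio samples between consecutive mel frames


def frames_to_seconds(n_frames: int, hop_length: int = HOP_LENGTH,
                      sampling_rate: int = SAMPLING_RATE) -> float:
    """Approximate audio duration covered by n_frames mel frames."""
    return n_frames * hop_length / sampling_rate


def seconds_to_frames(seconds: float, hop_length: int = HOP_LENGTH,
                      sampling_rate: int = SAMPLING_RATE) -> int:
    """Approximate number of mel frames needed for `seconds` of audio."""
    return round(seconds * sampling_rate / hop_length)


if __name__ == "__main__":
    # One frame is about 11.6 ms at these settings.
    print(f"{frames_to_seconds(1) * 1000:.1f} ms per frame")
    print(seconds_to_frames(2.0), "frames for 2 s of audio")
```

For instance, with NVIDIA's default `max_decoder_steps=1000`, a single utterance is capped at roughly 1000 × 256 / 22050 ≈ 11.6 s of audio, so very long input sentences may need to be split.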
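Use `git clone` to fetch this fork of the Tacotron 2 project. The setup cell is written for Google Colab, since it relies on the `%tensorflow_version` magic and `!` shell commands: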
```python
# Colab-only magic: select TensorFlow 1.x
%tensorflow_version 1.x
import os
from os.path import exists, join, basename, splitext

git_repo_url = 'https://github.com/Prim9000/tacotron2.git'
project_name = splitext(basename(git_repo_url))[0]
if not exists(project_name):
    # clone and install
    !git clone -q --recursive {git_repo_url}
    !cd {project_name}/waveglow && git checkout 9168aea
    !pip install -q librosa unidecode

# Make the tacotron2 and waveglow modules importable
import sys
sys.path.append(join(project_name, 'waveglow/'))
sys.path.append(project_name)
import time
import matplotlib
import matplotlib.pylab as plt
plt.rcParams["axes.grid"] = False
```
Tune the hyperparameters and recheck your training file lists, then start training:

```bash
python train.py --output_directory=outdir --log_directory=logdir
```

(Optional) Monitor training with TensorBoard:

```bash
tensorboard --logdir=outdir/logdir
```
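When rechecking the training files, note that NVIDIA's data loader reads file lists with one `audio_path|transcript` entry per line. The sketch below is a hypothetical checker for that format, not part of the repository; it assumes exactly two pipe-separated fields per line, and the sample file names are made up for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def check_filelist_lines(lines: Iterable[str],
                         check_files: bool = False) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for an `audio_path|transcript` list."""
    problems = []
    for i, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            problems.append((i, "empty line"))
            continue
        parts = line.split("|")
        if len(parts) != 2:
            problems.append((i, f"expected 2 fields separated by '|', got {len(parts)}"))
            continue
        wav_path, text = parts
        if not wav_path.strip():
            problems.append((i, "empty audio path"))
        if not text.strip():
            problems.append((i, "empty transcript"))
        # Optionally confirm that the audio file exists on disk
        if check_files and wav_path.strip() and not Path(wav_path).is_file():
            problems.append((i, f"audio file not found: {wav_path}"))
    return problems


if __name__ == "__main__":
    sample = [  # hypothetical file names
        "wavs/nun_0001.wav|ยินดีที่ได้รู้จัก\n",
        "wavs/nun_0002.wav\n",
        "wavs/nun_0003.wav|   \n",
    ]
    for line_no, problem in check_filelist_lines(sample):
        print(line_no, problem)
```

To check a real list, pass the lines of the file (for example `open(path, encoding="utf-8")`) and set `check_files=True` when running from the directory the paths are relative to.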
Training from a pre-trained model can lead to faster convergence. By default, the dataset-dependent text embedding layers are ignored when warm-starting. Here `tacotron2_statedict.pt` is NVIDIA's published Tacotron 2 checkpoint:

```bash
python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start
```
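Conceptually, `--warm_start` loads the pre-trained weights but skips the layers listed in the `ignore_layers` hyperparameter (by default `embedding.weight`, the text embedding whose size depends on the symbol set), so those layers keep the new model's freshly initialized values. The sketch below mimics that merge with plain dictionaries standing in for PyTorch state dicts; `warm_start_merge` is a hypothetical name, not the repository's code.

```python
from typing import Dict, Iterable, TypeVar

V = TypeVar("V")


def warm_start_merge(checkpoint_state: Dict[str, V],
                     model_state: Dict[str, V],
                     ignore_layers: Iterable[str] = ("embedding.weight",)) -> Dict[str, V]:
    """Take checkpoint weights for every layer except the ignored ones.

    Ignored layers (e.g. the dataset-dependent text embedding) keep the
    model's freshly initialized values.
    """
    ignore = set(ignore_layers)
    merged = dict(model_state)  # start from the new model's own parameters
    for name, value in checkpoint_state.items():
        if name not in ignore:
            merged[name] = value  # copy every non-ignored pre-trained tensor
    return merged


if __name__ == "__main__":
    pretrained = {"embedding.weight": "english-embedding", "decoder.w": "pretrained-decoder"}
    fresh = {"embedding.weight": "thai-embedding-init", "decoder.w": "random-init"}
    print(warm_start_merge(pretrained, fresh))
```

With real PyTorch modules, the merged dictionary would then be passed to `model.load_state_dict`.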
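To synthesize speech, load your trained checkpoint and a WaveGlow vocoder in the same notebook, then run inference:

```python
import numpy as np
import torch
import IPython.display as ipd
from hparams import create_hparams  # modules from the cloned tacotron2 repository
from train import load_model
from text import text_to_sequence

def plot_data(data, figsize=(16, 4)):  # mel, post-net mel and alignment side by side
    fig, axes = plt.subplots(1, len(data), figsize=figsize)
    for ax, d in zip(axes, data):
        ax.imshow(d, aspect='auto', origin='lower', interpolation='none')

hparams = create_hparams()
checkpoint_path = "checkpoint_path/checkpoint"  # your model's checkpoint path
model = load_model(hparams)
model.load_state_dict(torch.load(checkpoint_path)['state_dict'])
_ = model.cuda().eval().half()
# Assumed file name: point this at your WaveGlow checkpoint
waveglow = torch.load("waveglow_256channels.pt")['model'].cuda().eval().half()
for k in waveglow.convinv:  # keep the invertible 1x1 convolutions in fp32
    k.float()

text = "ยินดีที่ได้รู้จัก นี่คือเสียงจากปัญญาประดิษฐ์"  # change input text here
sequence = np.array(text_to_sequence(text, ['english_cleaners']))[None, :]  # add batch dim
sequence = torch.from_numpy(sequence).cuda().long()
mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
plot_data((mel_outputs.float().data.cpu().numpy()[0],
           mel_outputs_postnet.float().data.cpu().numpy()[0],
           alignments.float().data.cpu().numpy()[0].T))

with torch.no_grad():
    audio = waveglow.infer(mel_outputs_postnet, sigma=0.666)
ipd.Audio(audio[0].data.cpu().numpy(), rate=hparams.sampling_rate)
```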
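The `sigma` argument to `waveglow.infer` controls WaveGlow's sampling noise. As we understand the WaveGlow design, the vocoder draws a latent vector from a zero-mean spherical Gaussian and maps it to audio $x$ through the inverse of its flow $f$, conditioned on the mel spectrogram $M$. Sketched in our own notation:

$$
z \sim \mathcal{N}\left(0, \sigma^{2} I\right), \qquad x = f^{-1}\left(z \mid M\right)
$$

Sampling with a standard deviation below the one assumed during training (here $\sigma = 0.666$) keeps samples closer to the mode of the distribution, which is generally reported to give slightly cleaner audio; larger values add more variation and noise.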
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
Related projects and resources:
- WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis.
- nv-wavenet: faster-than-real-time WaveNet.
- Tacotron 2: the original NVIDIA repository this model builds on, implemented in PyTorch.
- Lifelike Speech Synthesis | Thai Text To Speech with Tacotron2