[Interspeech'24] GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Zehua Kcriss Li1 · Meiying Melissa Chen1 · Yi Zhong1 · Pinxin Liu2 · Zhiyao Duan1
1Department of Electrical and Computer Engineering, University of Rochester;
2Department of Computer Science, University of Rochester
- Release the Dataset.
- Build the Github Page.
- Release pretrained weights.
- Release training code.
We recommend Python >= 3.9 and CUDA 11.8; other compatible versions may also work.
conda create -n gtr python=3.10
conda activate gtr
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
Now, get started with the following code:
cd inference
CUDA_VISIBLE_DEVICES=0 python inference.py --wav_file ./assets/001.wav --init_frame ./assets/001.png
Change into the styletts folder:
cd ../styletts
Run the following code to train StyleTTS stage 1
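The stage-1 command itself is missing from this snippet. Assuming the repo keeps the upstream StyleTTS layout, stage 1 is typically launched as below; the script name `train_first.py`, the `--config_path` flag, and the config path come from the original StyleTTS repository and may differ here:

```shell
# Hedged sketch: script and flag follow the upstream StyleTTS repo;
# adjust the config path to this repo's layout if it differs.
python train_first.py --config_path ./Configs/config.yml
```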
Run the following code to train StyleTTS stage 2
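The stage-2 command is likewise missing. Again assuming the upstream StyleTTS layout, stage 2 is usually started with `train_second.py` (script name and flag taken from the original StyleTTS repository, so verify against this repo):

```shell
# Hedged sketch: assumes the upstream StyleTTS stage-2 entry point and config.
python train_second.py --config_path ./Configs/config.yml
```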
Change into the fastpitch folder:
cd fastpitch
Then run the following code to train FastPitch:
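The FastPitch command is not included in this snippet. A minimal sketch, assuming the training script follows NVIDIA's FastPitch reference implementation (the `-o`, `--dataset-path`, `--training-files`, and `--validation-files` flags are from that implementation and the file paths are illustrative):

```shell
# Hedged sketch: flags from NVIDIA's FastPitch reference trainer; paths are placeholders.
python train.py --cuda -o ./output \
    --dataset-path ./data \
    --training-files filelists/train.txt \
    --validation-files filelists/val.txt
```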
Our code builds on several excellent repositories; we thank their authors for making the code publicly available.
If you find our work useful, please consider citing:
@inproceedings{li2024gtr,
title={GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis},
author={Li, Zehua Kcriss and Chen, Meiying Melissa and Zhong, Yi and Liu, Pinxin and Duan, Zhiyao},
booktitle={Proc. Interspeech 2024},
pages={1775--1779},
year={2024}
}