[Interspeech'24] GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Zehua Kcriss Li1 · Meiying Melissa Chen1 · Yi Zhong1 · Pinxin Liu2 · Zhiyao Duan1
1Department of Electrical and Computer Engineering, University of Rochester;
2Department of Computer Science, University of Rochester
- Release the Dataset.
- Build the Github Page.
- Release pretrained weights.
- Release training code.
We recommend Python >= 3.9 and CUDA 11.8; other compatible versions may also work.
conda create -n gtr python=3.10
conda activate gtr
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
Now, get started with the following code:
cd inference
CUDA_VISIBLE_DEVICES=0 python inference.py --wav_file ./assets/001.wav --init_frame ./assets/001.png
Change into the styletts folder:
cd ../styletts
Run the following code to train StyleTTS stage 1
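The stage-1 command itself is missing from this snippet. Assuming the repo keeps the upstream StyleTTS layout, stage 1 is typically launched as below; the script name `train_first.py`, the `--config_path` flag, and the config path come from the original StyleTTS repository and may differ here:

```shell
# Hedged sketch: script and flag follow the upstream StyleTTS repo;
# adjust the config path to this repo's layout if it differs.
python train_first.py --config_path ./Configs/config.yml
```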
Run the following code to train StyleTTS stage 2
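The stage-2 command is likewise missing. Again assuming the upstream StyleTTS layout, stage 2 is usually started with `train_second.py` (script name and flag taken from the original StyleTTS repository, so verify against this repo):

```shell
# Hedged sketch: assumes the upstream StyleTTS stage-2 entry point and config.
python train_second.py --config_path ./Configs/config.yml
```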
Change into the fastpitch folder:
cd fastpitch
Then run the following code to train FastPitch:
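The FastPitch command is not included in this snippet. A minimal sketch, assuming the training script follows NVIDIA's FastPitch reference implementation (the `-o`, `--dataset-path`, `--training-files`, and `--validation-files` flags are from that implementation and the file paths are illustrative):

```shell
# Hedged sketch: flags from NVIDIA's FastPitch reference trainer; paths are placeholders.
python train.py --cuda -o ./output \
    --dataset-path ./data \
    --training-files filelists/train.txt \
    --validation-files filelists/val.txt
```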
Our code builds on several excellent repositories; we thank their authors for making the code publicly available.
If you find our work useful, please consider citing:
@inproceedings{li2024gtr,
title={GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis},
author={Li, Zehua Kcriss and Chen, Meiying Melissa and Zhong, Yi and Liu, Pinxin and Duan, Zhiyao},
booktitle={Proc. Interspeech 2024},
pages={1775--1779},
year={2024}
}