[Interspeech'24] GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

Zehua Kcriss Li1 · Meiying Melissa Chen1 · Yi Zhong1 · Pinxin Liu2 · Zhiyao Duan1
1Department of Electrical and Computer Engineering, University of Rochester;
2Department of Computer Science, University of Rochester

Paper (arXiv) · Demo


🗒 TODOs

  • Release the Dataset.
  • Build the GitHub page.
  • Release pretrained weights.
  • Release training code.

⚒️ Environment

We recommend Python >= 3.9 and CUDA 11.8; other compatible versions may also work.

conda create -n gtr python=3.10
conda activate gtr
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
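
After installation, you can optionally sanity-check that PyTorch sees your GPU (a quick check, not part of the original instructions):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"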

Now, get started with the following commands:

cd inference
CUDA_VISIBLE_DEVICES=0 python inference.py --wav_file ./assets/001.wav --init_frame ./assets/001.png

🔥 Train Your Own Model

Train StyleTTS

Change into the styletts folder:

cd ../styletts

Run the following command to train StyleTTS stage 1:
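
(A sketch based on the upstream StyleTTS recipe; the script and config names are assumptions and may differ in this codebase.)

python train_first.py --config_path ./Configs/config.yml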

Run the following command to train StyleTTS stage 2:
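
(Again assuming the upstream StyleTTS layout.)

python train_second.py --config_path ./Configs/config.yml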

Train FastPitch

Change into the fastpitch folder:

cd ../fastpitch

Then run the following command to train FastPitch:
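
(A sketch following NVIDIA's FastPitch training interface; the dataset path and filelists below are placeholders and may differ in this codebase.)

python train.py --cuda -o ./output --dataset-path ./data --training-files ./filelists/train.txt --validation-files ./filelists/val.txt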

🙏 Acknowledgments

Our code builds on several excellent repositories; we thank their authors for making the code publicly available.

✏️ Citation

If you find our work useful, please consider citing:

@inproceedings{li2024gtr,
  title={GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis},
  author={Li, Zehua Kcriss and Chen, Meiying Melissa and Zhong, Yi and Liu, Pinxin and Duan, Zhiyao},
  booktitle={Proc. Interspeech 2024},
  pages={1775--1779},
  year={2024}
}