FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis

Yongqi Wang, Zhou Zhao | Zhejiang University

This is the PyTorch implementation of FastLTS (ACM MM'22), a non-autoregressive end-to-end model for unconstrained lip-to-speech synthesis.

Note

The correctness of this open-source version is still under validation. Feel free to create an issue if you find any problems.

Checkpoints

Speaker	Checkpoint
Chemistry Lectures	Google Drive
Chess Analysis	Google Drive
Hardware Security	Google Drive

Dependencies

python >= 3.6
pytorch >= 1.7.0
numpy
scipy
pillow
inflect
librosa
Unidecode
matplotlib
tensorboardX
ffmpeg (install with: sudo apt-get install ffmpeg)
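
Before training, it can be handy to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper names (REQUIRED_MODULES, find_missing) are hypothetical, and the mapping from package names to import names (for example, pillow imports as PIL) is our assumption about a standard installation. It does not check the pytorch >= 1.7.0 version constraint.

```python
import importlib.util
import shutil
import sys

# Hypothetical mapping from the package names listed above to import names.
# pytorch imports as torch, pillow as PIL, and Unidecode as unidecode.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "numpy": "numpy",
    "scipy": "scipy",
    "pillow": "PIL",
    "inflect": "inflect",
    "librosa": "librosa",
    "Unidecode": "unidecode",
    "matplotlib": "matplotlib",
    "tensorboardX": "tensorboardX",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be located."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python >= 3.6 is required")
    missing = find_missing()
    print("Missing packages:", ", ".join(missing) if missing else "none")
    # ffmpeg is a command-line tool, so look for it on PATH instead.
    print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```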

Data pre-processing

We adopt the same data format as Lip2Wav. Please download the datasets and follow the preprocessing procedure described in the Lip2Wav repository.

Training

First stage

Suppose we use the chess split of the Lip2Wav dataset. Run the following command for first-stage training.

python train_stage1.py -d <DATA_DIR>/chess -l <LOG_DIR> -cd <CKPT_DIR>

An additional -cp argument can be used to resume training from a checkpoint.
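
If you launch training from a script rather than a shell, the first-stage command can be assembled as an argument list. This is a sketch under stated assumptions: the function name stage1_command and the example paths are hypothetical, and only the flags documented above (-d, -l, -cd, -cp) are used.

```python
import sys


def stage1_command(data_dir, log_dir, ckpt_dir, resume_from=None):
    """Build the argument list for train_stage1.py from the flags shown above."""
    cmd = [sys.executable, "train_stage1.py",
           "-d", data_dir, "-l", log_dir, "-cd", ckpt_dir]
    if resume_from is not None:
        # -cp resumes training from an existing checkpoint.
        cmd += ["-cp", resume_from]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own directories.
    cmd = stage1_command("data/chess", "logs/stage1", "ckpt/stage1")
    print(" ".join(cmd))
    # To actually launch training from the repository root:
    #   import subprocess; subprocess.run(cmd, check=True)
```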

Second stage

Once the first-stage model has converged, load its checkpoint for second-stage training with the command:

python train_stage2.py -d <DATA_DIR>/chess -l <LOG_DIR> -cd <CKPT_DIR> -fr <PATH_TO_STAGE1_CKPT>

An additional -cp argument can be used to resume training from a checkpoint; it should not be used together with -fr. Second-stage training also supports distributed data parallel (DDP), which can be enabled with --ddp and --ngpu <GPU_NUM>.
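
The constraint that -cp and -fr are not combined can be enforced by a small launcher helper. The sketch below is illustrative only: stage2_command and its parameter names are hypothetical, and it uses only the flags documented above. Whether train_stage2.py itself rejects the combination is not something this sketch assumes.

```python
import sys


def stage2_command(data_dir, log_dir, ckpt_dir,
                   stage1_ckpt=None, resume_from=None, ngpu=None):
    """Build the argument list for train_stage2.py.

    stage1_ckpt maps to -fr (start from a first-stage checkpoint) and
    resume_from maps to -cp (resume second-stage training); the two
    must not be given together.
    """
    if stage1_ckpt is not None and resume_from is not None:
        raise ValueError("-fr and -cp should not be used together")
    cmd = [sys.executable, "train_stage2.py",
           "-d", data_dir, "-l", log_dir, "-cd", ckpt_dir]
    if stage1_ckpt is not None:
        cmd += ["-fr", stage1_ckpt]
    if resume_from is not None:
        cmd += ["-cp", resume_from]
    if ngpu is not None:
        # Distributed data parallel is enabled with --ddp and --ngpu.
        cmd += ["--ddp", "--ngpu", str(ngpu)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own.
    print(" ".join(stage2_command("data/chess", "logs/stage2", "ckpt/stage2",
                                  stage1_ckpt="ckpt/stage1/last", ngpu=2)))
```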

Inference

python test.py -d <DATA_DIR>/chess -cp <PATH_TO_STAGE2_CKPT> -o <OUTPUT_DIR> -bs <BATCH_SIZE>

Acknowledgements

Part of the code is borrowed from the following repositories. We would like to thank their authors for their contributions.

Lip2Wav-pytorch: https://github.com/joannahong/Lip2Wav-pytorch
NATSpeech: https://github.com/NATSpeech/NATSpeech
Hifi-GAN: https://github.com/jik876/hifi-gan

Citations

If you find this code useful in your research, please cite our work:

@inproceedings{wang2022fastlts,
  title={Fastlts: Non-autoregressive end-to-end unconstrained lip-to-speech synthesis},
  author={Wang, Yongqi and Zhao, Zhou},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={5678--5687},
  year={2022}
}
