This is the PyTorch implementation of FastLTS (ACM MM'22), a non-autoregressive end-to-end model for unconstrained lip-to-speech synthesis.
The correctness of this open-source version is still under validation. Feel free to create an issue if you find any problems.
| Speaker | Checkpoint |
| --- | --- |
| Chemistry Lectures | Google Drive |
| Chess Analysis | Google Drive |
| Hardware Security | Google Drive |
- python >= 3.6
- pytorch >= 1.7.0
- numpy
- scipy
- pillow
- inflect
- librosa
- Unidecode
- matplotlib
- tensorboardX
- ffmpeg
sudo apt-get install ffmpeg
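The remaining Python dependencies can be installed with pip. This is a minimal sketch that assumes PyTorch (>= 1.7.0) with CUDA support is installed separately by following the official instructions on pytorch.org:

pip install numpy scipy pillow inflect librosa Unidecode matplotlib tensorboardX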
We adopt the same data format as Lip2Wav. Please download the datasets and follow the preprocessing procedure described in the Lip2Wav repository.
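For reference, the Lip2Wav repository provides a preprocessing script that is invoked roughly as follows; the script name and flags are taken from that repository's README and may differ across versions, so treat this as an illustrative sketch rather than an exact command:

python preprocess.py --speaker_root <DATA_DIR> --speaker chess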
Suppose we use the `chess` split of the Lip2Wav dataset. Use the following command for the first-stage training:
python train_stage1.py -d <DATA_DIR>/chess -l <LOG_DIR> -cd <CKPT_DIR>
An additional `-cp` argument can be used to restore training from a checkpoint.
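For example, to resume the first-stage training from a previously saved checkpoint (the checkpoint path is a placeholder):

python train_stage1.py -d <DATA_DIR>/chess -l <LOG_DIR> -cd <CKPT_DIR> -cp <PATH_TO_STAGE1_CKPT>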
Once the first-stage model converges, load its checkpoint for the second-stage training with the following command:
python train_stage2.py -d <DATA_DIR>/chess -l <LOG_DIR> -cd <CKPT_DIR> -fr <PATH_TO_STAGE1_CKPT>
An additional `-cp` argument can be used to restore training from a checkpoint; it should not be used together with `-fr`. Besides, we support distributed data parallel (DDP) training in stage 2, which can be turned on with `--ddp` and `--ngpu <GPU_NUM>`.
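For example, a second-stage run with DDP enabled on 4 GPUs (the GPU count is illustrative) could look like:

python train_stage2.py -d <DATA_DIR>/chess -l <LOG_DIR> -cd <CKPT_DIR> -fr <PATH_TO_STAGE1_CKPT> --ddp --ngpu 4

After both stages have been trained, synthesize speech for the test split with: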
python test.py -d <DATA_DIR>/chess -cp <PATH_TO_STAGE2_CKPT> -o <OUTPUT_DIR> -bs <BATCH_SIZE>
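For example, with an illustrative output directory and batch size:

python test.py -d <DATA_DIR>/chess -cp <PATH_TO_STAGE2_CKPT> -o generated/chess -bs 16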
Part of the code is borrowed from the following repos. We would like to thank the authors of these repos for their contributions.
- Lip2Wav-pytorch: https://github.com/joannahong/Lip2Wav-pytorch
- NATSpeech: https://github.com/NATSpeech/NATSpeech
- Hifi-GAN: https://github.com/jik876/hifi-gan
If you find this code useful in your research, please cite our work:
@inproceedings{wang2022fastlts,
  title={{FastLTS}: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis},
  author={Wang, Yongqi and Zhao, Zhou},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={5678--5687},
  year={2022}
}