# Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation

## Setting Up the Environment

```bash
conda create -n unidubbing python=3.8
conda activate unidubbing
git clone https://github.com/facebookresearch/fairseq.git
cd fairseq
git checkout afc77bdf4bb51453ce76f1572ef2ee6ddcda8eeb
pip install --editable ./
pip install -r requirements.txt
```

## Zero-Shot

### 1. Prepare a pretrained HuBERT and HiFi-GAN (acoustic units)

| Model | Pretraining Data | Checkpoint | Quantizer |
| --- | --- | --- | --- |
| mHuBERT Base | En, Es, Fr speech | download | L11 km1000 |
| HiFi-GAN 16k Universal | | download | |
| dict.unit.txt | | download | |
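
Extracting acoustic units amounts to running 16 kHz speech through mHuBERT, taking the layer-11 features, and assigning each frame to its nearest centroid in the km1000 quantizer. Below is a minimal sketch using fairseq's HuBERT API; the local filenames are assumptions for the checkpoints linked in the table above:

```python
import joblib
import soundfile as sf
import torch
from fairseq import checkpoint_utils

# Assumed local filenames for the checkpoint and quantizer from the table above.
HUBERT_CKPT = "mhubert_base_vp_en_es_fr_it3.pt"
KMEANS_PATH = "mhubert_base_vp_en_es_fr_it3_L11_km1000.bin"

models, _, _ = checkpoint_utils.load_model_ensemble_and_task([HUBERT_CKPT])
hubert = models[0].eval()
kmeans = joblib.load(KMEANS_PATH)  # sklearn k-means model with 1000 centroids

wav, sr = sf.read("sample.wav")  # expects 16 kHz mono speech
assert sr == 16000
source = torch.from_numpy(wav).float().unsqueeze(0)  # (1, T)

with torch.no_grad():
    # Layer-11 features: one feature vector per 20 ms frame.
    feats, _ = hubert.extract_features(
        source, padding_mask=None, mask=False, output_layer=11
    )

units = kmeans.predict(feats.squeeze(0).numpy())  # one unit id (0-999) per frame
print(" ".join(map(str, units)))
```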

### Unit-to-Speech HiFi-GAN vocoder

| Unit config | Unit size | Vocoder language | Dataset | Model |
| --- | --- | --- | --- | --- |
| mHuBERT, layer 11 | 1000 | En | LJSpeech | ckpt, config |
| mHuBERT, layer 11 | 1000 | Es | CSS10 | ckpt, config |
| mHuBERT, layer 11 | 1000 | Fr | CSS10 | ckpt, config |
```bash
python examples/speech_to_speech/generate_waveform_from_code.py \
  --in-code-file ${RESULTS_PATH}/generate-${GEN_SUBSET}.unit \
  --vocoder $VOCODER_CKPT --vocoder-cfg $VOCODER_CFG \
  --results-path ${RESULTS_PATH} --dur-prediction
```
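
The command reads `${RESULTS_PATH}/generate-${GEN_SUBSET}.unit`, one space-separated unit sequence per line, with `$VOCODER_CKPT` and `$VOCODER_CFG` pointing at a `ckpt`/`config` pair from the vocoder table. A minimal sketch that writes such a file; the unit ids here are placeholders (in practice, use the frame-level units from the extraction sketch above):

```python
import itertools
from pathlib import Path

# Placeholder unit ids standing in for real frame-level units.
units = [14, 14, 59, 59, 102, 102, 102, 7]

# --dur-prediction expects reduced sequences (consecutive duplicates collapsed);
# the vocoder then predicts each unit's duration.
reduced = [k for k, _ in itertools.groupby(units)]

results_path = Path("results")  # matches $RESULTS_PATH
gen_subset = "test"             # matches $GEN_SUBSET
results_path.mkdir(parents=True, exist_ok=True)

with open(results_path / f"generate-{gen_subset}.unit", "w") as f:
    f.write(" ".join(map(str, reduced)) + "\n")  # one utterance per line
```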

## Full-Shot

### 2. Prepare semantic units

Encode the audio using the pre-trained HiFi-Codec-16k-320d weights from AcademiCodec, then decode it with the same model. A sketch of this round trip follows.
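
The import path, constructor arguments, and method names below are assumptions about AcademiCodec's HiFi-Codec wrapper rather than a documented API; check the AcademiCodec repository for the actual entry points.

```python
import soundfile as sf
import torch
# NOTE: import path and class name are assumptions; see the AcademiCodec repo.
from academicodec.models.hificodec.vqvae import VQVAE

# HiFi-Codec-16k-320d: 16 kHz audio, 320x downsampling (one code group per 20 ms).
codec = VQVAE(
    config_path="config_16k_320d.json",   # assumed local filename
    ckpt_path="HiFi-Codec-16k-320d",      # assumed local filename
    with_encoder=True,
).eval()

wav, sr = sf.read("sample.wav")  # expects 16 kHz mono speech
x = torch.from_numpy(wav).float().unsqueeze(0)  # (1, T)

with torch.no_grad():
    codes = codec.encode(x)  # discrete codes (assumed method name)
    recon = codec(codes)     # decode codes back to a waveform (assumed call)

sf.write("sample_recon.wav", recon.squeeze().cpu().numpy(), 16000)
```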

## Reference Repositories

- AV-HuBERT
- fairseq
- SyncNet
- Wav2Lip
- speech-resynthesis
- AcademiCodec
- TranSpeech
