# Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation

## Setting Up the Environment

```bash
conda create -n unidubbing python=3.8
conda activate unidubbing
git clone https://github.com/facebookresearch/fairseq.git
cd fairseq
git checkout afc77bdf4bb51453ce76f1572ef2ee6ddcda8eeb
pip install --editable ./
pip install -r requirements.txt
```

## Zero-Shot

### 1. Prepare a pretrained HuBERT and HiFi-GAN (acoustic units)

| Model | Pretraining Data | Checkpoint | Quantizer |
| --- | --- | --- | --- |
| mHuBERT Base | En, Es, Fr speech | download | L11 km1000 |
| HiFi-GAN 16k Universal | | download | |
| dict.unit.txt | | download | |
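
Extracting acoustic units amounts to running 16 kHz speech through mHuBERT, taking the layer-11 features, and assigning each frame to its nearest centroid in the km1000 quantizer. Below is a minimal sketch using fairseq's HuBERT API; the local filenames are assumptions for the checkpoints linked in the table above:

```python
import joblib
import soundfile as sf
import torch
from fairseq import checkpoint_utils

# Assumed local filenames for the checkpoint and quantizer from the table above.
HUBERT_CKPT = "mhubert_base_vp_en_es_fr_it3.pt"
KMEANS_PATH = "mhubert_base_vp_en_es_fr_it3_L11_km1000.bin"

models, _, _ = checkpoint_utils.load_model_ensemble_and_task([HUBERT_CKPT])
hubert = models[0].eval()
kmeans = joblib.load(KMEANS_PATH)  # sklearn k-means model with 1000 centroids

wav, sr = sf.read("sample.wav")  # expects 16 kHz mono speech
assert sr == 16000
source = torch.from_numpy(wav).float().unsqueeze(0)  # (1, T)

with torch.no_grad():
    # Layer-11 features: one feature vector per 20 ms frame.
    feats, _ = hubert.extract_features(
        source, padding_mask=None, mask=False, output_layer=11
    )

units = kmeans.predict(feats.squeeze(0).numpy())  # one unit id (0-999) per frame
print(" ".join(map(str, units)))
```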

### Unit-to-Speech HiFi-GAN vocoder

| Unit config | Unit size | Vocoder language | Dataset | Model |
| --- | --- | --- | --- | --- |
| mHuBERT, layer 11 | 1000 | En | LJSpeech | ckpt, config |
| mHuBERT, layer 11 | 1000 | Es | CSS10 | ckpt, config |
| mHuBERT, layer 11 | 1000 | Fr | CSS10 | ckpt, config |
```bash
python examples/speech_to_speech/generate_waveform_from_code.py \
  --in-code-file ${RESULTS_PATH}/generate-${GEN_SUBSET}.unit \
  --vocoder $VOCODER_CKPT --vocoder-cfg $VOCODER_CFG \
  --results-path ${RESULTS_PATH} --dur-prediction
```
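
The command reads `${RESULTS_PATH}/generate-${GEN_SUBSET}.unit`, one space-separated unit sequence per line, with `$VOCODER_CKPT` and `$VOCODER_CFG` pointing at a `ckpt`/`config` pair from the vocoder table. A minimal sketch that writes such a file; the unit ids here are placeholders (in practice, use the frame-level units from the extraction sketch above):

```python
import itertools
from pathlib import Path

# Placeholder unit ids standing in for real frame-level units.
units = [14, 14, 59, 59, 102, 102, 102, 7]

# --dur-prediction expects reduced sequences (consecutive duplicates collapsed);
# the vocoder then predicts each unit's duration.
reduced = [k for k, _ in itertools.groupby(units)]

results_path = Path("results")  # matches $RESULTS_PATH
gen_subset = "test"             # matches $GEN_SUBSET
results_path.mkdir(parents=True, exist_ok=True)

with open(results_path / f"generate-{gen_subset}.unit", "w") as f:
    f.write(" ".join(map(str, reduced)) + "\n")  # one utterance per line
```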

## Full-Shot

### 2. Prepare semantic units

Encode the audio using the pre-trained HiFi-Codec-16k-320d weights from AcademiCodec, then decode it with the same model. A sketch of this round trip follows.
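
The import path, constructor arguments, and method names below are assumptions about AcademiCodec's HiFi-Codec wrapper rather than a documented API; check the AcademiCodec repository for the actual entry points.

```python
import soundfile as sf
import torch
# NOTE: import path and class name are assumptions; see the AcademiCodec repo.
from academicodec.models.hificodec.vqvae import VQVAE

# HiFi-Codec-16k-320d: 16 kHz audio, 320x downsampling (one code group per 20 ms).
codec = VQVAE(
    config_path="config_16k_320d.json",   # assumed local filename
    ckpt_path="HiFi-Codec-16k-320d",      # assumed local filename
    with_encoder=True,
).eval()

wav, sr = sf.read("sample.wav")  # expects 16 kHz mono speech
x = torch.from_numpy(wav).float().unsqueeze(0)  # (1, T)

with torch.no_grad():
    codes = codec.encode(x)  # discrete codes (assumed method name)
    recon = codec(codes)     # decode codes back to a waveform (assumed call)

sf.write("sample_recon.wav", recon.squeeze().cpu().numpy(), 16000)
```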

## Reference Repositories

- AV-HuBERT
- fairseq
- SyncNet
- Wav2Lip
- speech-resynthesis
- AcademiCodec
- TranSpeech
