```shell
conda create -n unidubbing python=3.8
conda activate unidubbing
# A /tree/<commit> URL cannot be cloned directly; clone the repo and check out the commit.
git clone https://github.com/facebookresearch/fairseq.git
cd fairseq
git checkout afc77bdf4bb51453ce76f1572ef2ee6ddcda8eeb
pip install --editable ./
pip install -r requirements.txt
```
| Model | Pretraining Data | Download | Quantizer |
|---|---|---|---|
| mHuBERT Base | En, Es, Fr speech | download | L11 km1000 |
| HiFi-GAN | 16k Universal | download | |
| dict.unit.txt | | download | |
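The L11 km1000 quantizer above assigns each layer-11 mHuBERT feature vector to one of 1000 k-means centroids, turning continuous speech features into discrete units. A minimal conceptual sketch of that nearest-centroid assignment (toy 2-D centroids here, not the real codebook or the fairseq pipeline):

```python
# Conceptual sketch: discrete speech "units" are the indices of the nearest
# k-means centroid for each frame-level feature vector. The real quantizer
# uses 1000 centroids fit on layer-11 mHuBERT features.

def quantize(frames, centroids):
    """Map each feature vector to the index of its nearest centroid."""
    units = []
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, cent))
                 for cent in centroids]
        units.append(dists.index(min(dists)))
    return units

centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # toy codebook
frames = [[0.1, -0.1], [0.9, 1.2], [0.1, 0.9], [1.0, 1.0]]
print(quantize(frames, centroids))  # → [0, 1, 2, 1]
```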
| Unit config | Unit size | Vocoder language | Dataset | Model |
|---|---|---|---|---|
| mHuBERT, layer 11 | 1000 | En | LJSpeech | ckpt, config |
| mHuBERT, layer 11 | 1000 | Es | CSS10 | ckpt, config |
| mHuBERT, layer 11 | 1000 | Fr | CSS10 | ckpt, config |
```shell
python examples/speech_to_speech/generate_waveform_from_code.py \
  --in-code-file ${RESULTS_PATH}/generate-${GEN_SUBSET}.unit \
  --vocoder $VOCODER_CKPT --vocoder-cfg $VOCODER_CFG \
  --results-path ${RESULTS_PATH} --dur-prediction
```
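The `--dur-prediction` flag asks the vocoder to predict a duration for each unit. This matters because unit sequences are typically deduplicated (consecutive repeats collapsed) before synthesis, so the vocoder must recover how long each unit should sound. A minimal sketch of the deduplication step and the run lengths the vocoder would need to predict:

```python
from itertools import groupby

def dedup_with_durations(units):
    """Collapse consecutive repeated units; record each run's length."""
    runs = [(u, len(list(g))) for u, g in groupby(units)]
    deduped = [u for u, _ in runs]
    durations = [d for _, d in runs]
    return deduped, durations

print(dedup_with_durations([5, 5, 5, 12, 12, 7]))  # → ([5, 12, 7], [3, 2, 1])
```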
Encode the audio with the pre-trained HiFi-Codec-16k-320d weights from AcademiCodec, then decode it with the same model.
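Neural codecs such as HiFi-Codec rely on residual vector quantization: each codebook stage quantizes the error left over by the previous stage, and decoding sums the selected codewords. A toy scalar sketch of that idea (this is a conceptual illustration, not the AcademiCodec API):

```python
# Conceptual residual quantization: stage i quantizes the residual left by
# stage i-1. Codebooks here are toy scalar values.

def rvq_encode(x, codebooks):
    """Return one code index per stage for scalar input x."""
    codes, residual = [], x
    for cb in codebooks:
        idx = min(range(len(cb)), key=lambda i: abs(residual - cb[i]))
        codes.append(idx)
        residual -= cb[idx]     # next stage sees what this stage missed
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the chosen codeword from each stage."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

codebooks = [[-1.0, 0.0, 1.0], [-0.25, 0.0, 0.25]]  # coarse, then fine
codes = rvq_encode(0.8, codebooks)
print(codes, rvq_decode(codes, codebooks))  # → [2, 0] 0.75
```

Each added stage shrinks the reconstruction error, which is why codecs trade bitrate for quality by varying the number of codebooks.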
AV-HuBERT, fairseq, SyncNet, Wav2Lip, speech-resynthesis, AcademiCodec, TransPeech