Source code for the paper XiaoiceSing2 (Interspeech 2023).
I am currently busy job-hunting. I will release the remaining modules, including HiFi-WaveGAN, after I make my final decision.
- FastSpeech2-based generator
- discriminator group, including segment discriminators and detail discriminators
- ConvFFT block
Kaldi-style data preparation, producing the following files:
- wav.scp
- utt2spk
- spk2utt
- text
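Below is a minimal sketch of such a data directory. It uses two made-up singers, and every utterance ID, speaker ID, wav path and text line is a hypothetical placeholder. The script assumes this repo uses the standard Kaldi formats: one `<utt-id> <value>` entry per line, sorted by utterance ID in the C locale. The exact content expected in `text` for singing data (lyrics, phonemes, notes) is not documented here and is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical example: build a Kaldi-style data directory for two made-up singers.
# All IDs, paths and lyrics below are placeholders, not files from this repository.
set -euo pipefail

data_dir=${1:-data/train}   # hypothetical output directory
mkdir -p "$data_dir"

# wav.scp: <utt-id> <path-to-wav>
cat > "$data_dir/wav.scp" <<'EOF'
spkA_0001 /path/to/spkA/0001.wav
spkA_0002 /path/to/spkA/0002.wav
spkB_0001 /path/to/spkB/0001.wav
EOF

# utt2spk: <utt-id> <spk-id>
cat > "$data_dir/utt2spk" <<'EOF'
spkA_0001 spkA
spkA_0002 spkA
spkB_0001 spkB
EOF

# text: <utt-id> <transcription>; the content this repo expects is an assumption
cat > "$data_dir/text" <<'EOF'
spkA_0001 placeholder lyrics one
spkA_0002 placeholder lyrics two
spkB_0001 placeholder lyrics three
EOF

# Kaldi tools expect each file sorted by utterance ID in the C locale
for f in wav.scp utt2spk text; do
  LC_ALL=C sort -k1,1 -o "$data_dir/$f" "$data_dir/$f"
done

# spk2utt: <spk-id> <utt-id1> <utt-id2> ..., derived from utt2spk
# (Kaldi's utils/utt2spk_to_spk2utt.pl does the same job)
awk '{ utts[$2] = utts[$2] " " $1 } END { for (s in utts) print s utts[s] }' \
  "$data_dir/utt2spk" | LC_ALL=C sort > "$data_dir/spk2utt"
```

The utterance IDs are prefixed with the speaker ID. This follows the usual Kaldi convention: sorting by utterance ID then also groups utterances by speaker, which keeps `utt2spk` and `spk2utt` consistent.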
./run.sh --start-stage 1 --stop-stage 1 # extract mel-spectrograms, F0, and energy, and compute their statistics
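The statistics computed in stage 1 are typically per-feature mean and standard deviation used for normalization. The sketch below assumes that and shows the idea for F0. Both the input format (one `<utt-id> <frame-index> <f0-in-Hz>` line per frame) and the choice to skip unvoiced frames (F0 = 0) are assumptions made for illustration. The function `f0_stats` is hypothetical and is not part of `run.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: mean / standard deviation of frame-level F0 over voiced frames.
# Assumed input: one "<utt-id> <frame-index> <f0-in-Hz>" line per frame; F0 == 0 means unvoiced.
set -euo pipefail

f0_stats() {
  awk '$3 > 0 { n++; s += $3; ss += $3 * $3 }   # accumulate count, sum, sum of squares
       END {
         if (n == 0) { print "no voiced frames" > "/dev/stderr"; exit 1 }
         mean = s / n
         var = ss / n - mean * mean             # population variance
         if (var < 0) var = 0                   # guard against tiny negative rounding error
         printf "mean=%.2f std=%.2f n=%d\n", mean, sqrt(var), n
       }' "$1"
}

# Usage example with made-up values (the 0 Hz frame is treated as unvoiced and skipped)
tmp=$(mktemp)
printf 'utt1 0 200\nutt1 1 0\nutt1 2 220\nutt1 3 240\n' > "$tmp"
f0_stats "$tmp"   # prints: mean=220.00 std=16.33 n=3
rm -f "$tmp"
```

Energy statistics can be gathered the same way. Whether unvoiced frames should be excluded depends on how the model normalizes its inputs.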
./run.sh --start-stage 2 --stop-stage 2
Figure: real recording (left), XiaoiceSing (middle), XiaoiceSing2 (right).
Figure: L2 loss before post-processing (left) and after post-processing (right).
./run.sh --start-stage 3 --stop-stage 3
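The `--start-stage` / `--stop-stage` options follow the usual Kaldi-recipe pattern. In that pattern, stage N runs only when `start_stage <= N <= stop_stage`. The skeleton below is a hypothetical illustration of that gating and is NOT this repository's `run.sh`. It only names stage 1 (feature extraction), because that is the only stage the commands above describe.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton of Kaldi-style stage gating; not the repository's run.sh.
set -euo pipefail

start_stage=1
stop_stage=3

# Minimal parsing of the two options used in the commands above
while [ $# -gt 0 ]; do
  case "$1" in
    --start-stage) start_stage=$2; shift 2 ;;
    --stop-stage)  stop_stage=$2;  shift 2 ;;
    *) echo "unknown option: $1" >&2; exit 1 ;;
  esac
done

# Succeeds only if start_stage <= N <= stop_stage
run_stage() {
  [ "$start_stage" -le "$1" ] && [ "$stop_stage" -ge "$1" ]
}

if run_stage 1; then echo "stage 1: feature extraction"; fi
if run_stage 2; then echo "stage 2"; fi
if run_stage 3; then echo "stage 3"; fi
```

If `run.sh` gates its stages this way, a range such as `--start-stage 1 --stop-stage 3` would run all three stages in one call. Running them one at a time, as shown above, makes it easier to inspect each stage's output first.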