This repository is the official PyTorch implementation of QPNet [1, 2].
The generated samples can be found on our Demo page.
The repository includes two parts:
- Acoustic feature extraction: to extract spectral and prosodic features by WORLD
- QPNet vocoder (SI: speaker-independent; SD: speaker-dependent): to generate speech based on the input acoustic features
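For reference, a minimal sketch of WORLD analysis with the pyworld package is shown below; it only illustrates the kind of spectral and prosodic features involved and is not the repository's exact extraction pipeline (which is driven by run_FE.sh).

```python
# A minimal WORLD analysis/synthesis sketch with pyworld (illustrative only;
# the repository's actual feature extraction is performed by run_FE.sh).
import soundfile as sf
import pyworld as pw

x, fs = sf.read("sample.wav")        # float64 waveform and sampling rate
f0, t = pw.harvest(x, fs)            # prosodic feature: F0 contour
sp = pw.cheaptrick(x, f0, t, fs)     # spectral envelope
ap = pw.d4c(x, f0, t, fs)            # aperiodicity
y = pw.synthesize(f0, sp, ap, fs)    # analysis-synthesis waveform
sf.write("sample_resyn.wav", y, fs)
```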
This repository is tested on
- Python 3.6
- CUDA 10.0
- PyTorch 1.3
- torchvision 0.4.1
The code works with both Anaconda and virtualenv.
The following example uses Anaconda.
$ conda create -n venvQPNet python=3.6
$ source activate venvQPNet
$ pip install sprocket-vc
$ pip install torch torchvision
$ git clone https://github.com/bigpon/QPNet.git
- corpus: the folder to put corpora; each corpus subfolder includes a scp subfolder for file lists and a wav subfolder for speech files
- qpnet_models: the folder for trained models
- qpnet_output: the folder for decoding output files
- src: the folder for source code
- Download the Voice Conversion Challenge 2018 (VCC2018) corpus to run the QPNet example
$ cd QPNet/corpus/VCC2018/wav/
$ wget -o train.log -O train.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip
$ wget -o eval.log -O eval.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip
$ wget -o ref.log -O ref.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_reference.zip
$ unzip train.zip
$ unzip eval.zip
$ unzip ref.zip
- SI-QPNet training set:
corpus/VCC2018/scp/vcc18tr.scp
- SD-QPNet updating set:
corpus/VCC2018/scp/vcc18up_VCC2SPK.scp
- SD-QPNet validation set:
corpus/VCC2018/scp/vcc18va_VCC2SPK.scp
- Testing set:
corpus/VCC2018/scp/vcc18eval.scp
- Modify the corresponding CUDA and project root paths in
src/utils/param_path.py
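As a rough illustration only, the settings in src/utils/param_path.py are of the following kind; the actual variable names in the repository may differ.

```python
# Hypothetical illustration of the path settings in src/utils/param_path.py;
# the real variable names in the repository may differ.
PROJECT_ROOT = "/path/to/QPNet"       # root of the cloned repository
CUDA_PATH = "/usr/local/cuda-10.0"    # CUDA installation path
```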
# move to the source code folder to run the following scripts
$ cd QPNet/src/
- Output the histogram figures of the F0 and power distributions to
corpus/VCC2018/hist/
$ bash run_FE.sh --stage 0
- Modify the f0_min (lower bound of F0 range), f0_max (upper bound of F0 range), and pow_th (power threshold for VAD) values of the speakers in
corpus/VCC2018/conf/pow_f0_dict.yml
*Details of the F0 range settings can be found here.
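The per-speaker entries can be checked with a short script like the one below; note that the exact key layout of pow_f0_dict.yml is an assumption here and may differ from the actual file.

```python
# Sketch for inspecting corpus/VCC2018/conf/pow_f0_dict.yml
# (the per-speaker key layout is an assumption).
import yaml

with open("corpus/VCC2018/conf/pow_f0_dict.yml") as f:
    conf = yaml.safe_load(f)

for spk, cfg in conf.items():
    # assumed keys: f0_min / f0_max in Hz, pow_th in dB
    print(spk, cfg)
```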
- Extract and save acoustic features of the training, evaluation, and reference sets in
corpus/VCC2018/h5/
*The analysis-synthesis speech files of the training set are also saved in corpus/VCC2018/h5_restored/.
$ bash run_FE.sh --stage 123
- Process waveform files by noise shaping for QPNet training and save the shaped files in
corpus/VCC2018/wav_h5_ns/
$ bash run_FE.sh --stage 4
- Train and test SI-QPNet
# the gpu ID can be set by --gpu GPU_ID (default: 0)
$ bash run_QP.sh --gpu 0 --stage 03
- Update SD-QPNet for each speaker with the corresponding partial training data
$ bash run_QP.sh --gpu 0 --stage 1
- Validate SD-QPNet for each speaker with the corresponding partial training data
# the validation results are in `qpnet_models/modelname/validation_result.yml`
$ bash run_QP.sh --gpu 0 --stage 2
- Test SD-QPNet with the updating iteration number according to the validation results
# the iteration number can be set by --miter NUM (default: 1000)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 4
- Test SI-QPNet with scaled F0 (0.5 * F0 and 1.5 * F0)
# default F0 scaled factors=("0.50" "1.50")
# the scaled factors can be changed in run_QP.sh
$ bash run_QP.sh --gpu 0 --stage 5
- Test SD-QPNet with scaled F0 (0.5 * F0 and 1.5 * F0)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 6
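The scaled-F0 tests above feed the vocoder F0 contours multiplied by a constant factor; a minimal sketch of this kind of scaling (an assumption about the exact implementation, where only voiced frames are scaled and unvoiced frames stay at zero) is:

```python
import numpy as np

def scale_f0(f0, factor):
    """Scale only voiced frames (f0 > 0); unvoiced frames (f0 == 0) stay 0."""
    f0 = np.asarray(f0, dtype=np.float64)
    return np.where(f0 > 0, f0 * factor, 0.0)

f0 = np.array([0.0, 110.0, 115.0, 0.0, 220.0])  # toy F0 contour [Hz]
print(scale_f0(f0, 0.5))  # -> [0., 55., 57.5, 0., 110.]
print(scale_f0(f0, 1.5))  # -> [0., 165., 172.5, 0., 330.]
```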
- The program only supports WORLD acoustic features now, but you can modify the feature extraction script and change the 'feature_type' in
src/runFE.py
and src/runQP.py
for new features.
- You can extract acoustic features with different settings (ex: frame length ...) and set a different 'feature_format' (default: h5) in
src/runFE.py
and src/runQP.py
for each setting, and the program will create the corresponding folders.
- You can easily change the generation model by setting a different 'network' (default: qpnet) in
src/runQP.py
when you create new generation models.
- When working with a new corpus, you only need to create the file lists of the wav files because the program will create the feature lists based on the wav file lists.
- When you create the wav file lists, please follow the format of the example
(ex: rootpath/wav/xxx/xxx.wav).
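A short sketch of building such a list for a hypothetical new corpus (the corpus name and list file name below are placeholders) could be:

```python
# Build a wav file list (.scp) for a new corpus following the
# rootpath/wav/xxx/xxx.wav convention; names below are placeholders.
from pathlib import Path

root = Path("corpus/MYCORPUS")                 # hypothetical corpus root
wavs = sorted((root / "wav").rglob("*.wav"))   # all wav files under wav/

scp = root / "scp" / "mycorpus_train.scp"      # hypothetical list name
scp.parent.mkdir(parents=True, exist_ok=True)
scp.write_text("".join(f"{p}\n" for p in wavs))
```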
- The pre-trained models and generated utterances are released.
- You can download all pre-trained models via the link.
- Please put the downloaded models in the
qpnet_models
folder.
- The SD (speaker-dependent) models are adapted from the SI (speaker-independent) model.
- You can download all generated utterances via the link.
- The released models are only trained with the vcc18 corpus (~ 1 hr).
- To achieve higher speech quality, more training data is required (in our papers, the training data was ~ 3 hrs).
Corpus | Language | Fs [Hz] | Feature | Model | Result |
---|---|---|---|---|---|
vcc18 | EN | 22050 | world (uv + f0 + mcep + ap) (shiftms: 5) | SI | link |
 | | | | SD_VCC2SF3 | link |
 | | | | SD_VCC2SF4 | link |
 | | | | SD_VCC2SM3 | link |
 | | | | SD_VCC2SM4 | link |
The QPNet repository is developed based on
- Pytorch WaveNet implementation by @kan-bayashi
- Voice conversion implementation by @k2kobayashi
If you find the code helpful, please cite the following papers.
@InProceedings{qpnet_2019,
author="Y.-C. Wu and T. Hayashi and P. L. Tobing and K. Kobayashi and T. Toda",
title="{Q}uasi-{P}eriodic {W}ave{N}et vocoder: a pitch dependent dilated convolution model for parametric speech generation",
booktitle="Proc. Interspeech",
year="2019",
month="Sept.",
pages="196-200"
}
@ARTICLE{qpnet_2021,
author="Y.-C. Wu and T. Hayashi and P. L. Tobing and K. Kobayashi and T. Toda",
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network},
year={2021},
volume={29},
pages={1134-1148},
doi={10.1109/TASLP.2021.3061245}}
Development:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: [email protected]
Advisor:
Tomoki Toda @ Nagoya University
E-mail: [email protected]