Quasi-Periodic WaveNet (QPNet)

Introduction

This repository is the official PyTorch implementation of QPNet [1, 2].

The generated samples can be found on our Demo page.

The repository includes two parts:

  1. Acoustic feature extraction
    to extract spectral and prosodic features with WORLD
  2. QPNet vocoder (SI: speaker-independent; SD: speaker-dependent)
    to generate speech based on the input acoustic features (a sketch of the pitch-dependent dilation idea follows below)
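
The core of the QPNet vocoder is its pitch-dependent dilated convolution, whose dilation size follows the instantaneous pitch period given by the auxiliary F0. The snippet below is only a minimal sketch of that idea; the dense factor and function name are illustrative, not the repository's exact implementation:

```python
import numpy as np

def pitch_dependent_dilations(f0, fs=22050, dense_factor=4):
    """Illustrative sketch: derive a per-frame dilation size from F0.

    The dilation roughly follows the pitch period (fs / f0) divided by a
    "dense factor" controlling how many taps cover one period.
    Unvoiced frames (f0 == 0) fall back to a dilation of 1.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    dilations = np.ones_like(f0)
    voiced = f0 > 0
    dilations[voiced] = np.round(fs / (f0[voiced] * dense_factor))
    return np.maximum(dilations, 1).astype(np.int64)

# Example: a 100 Hz frame at 22.05 kHz gives a dilation of about 55 samples.
print(pitch_dependent_dilations([0.0, 100.0, 220.0]))  # -> [ 1 55 25]
```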

Requirements

This repository is tested on

  • Python 3.6
  • CUDA 10.0
  • PyTorch 1.3
  • torchvision 0.4.1

Setup

The code works with both anaconda and virtualenv.
The following example uses anaconda.

$ conda create -n venvQPNet python=3.6
$ source activate venvQPNet
$ pip install sprocket-vc
$ pip install torch torchvision
$ git clone https://github.com/bigpon/QPNet.git

Folder architecture

  • corpus
    the folder for corpora
    -- each corpus subfolder includes a scp subfolder for file lists and a wav subfolder for speech files
  • qpnet_models
    the folder for trained models
  • qpnet_output
    the folder for decoding output files
  • src
    the folder for source code

Example

Corpus download:

$ cd QPNet/corpus/VCC2018/wav/

$ wget -o train.log -O train.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip

$ wget -o eval.log -O eval.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip

$ wget -o ref.log -O ref.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_reference.zip

$ unzip train.zip
$ unzip eval.zip
$ unzip ref.zip
  • SI-QPNet training set: corpus/VCC2018/scp/vcc18tr.scp
  • SD-QPNet updating set: corpus/VCC2018/scp/vcc18up_VCC2SPK.scp
  • SD-QPNet validation set: corpus/VCC2018/scp/vcc18va_VCC2SPK.scp
  • Testing set: corpus/VCC2018/scp/vcc18eval.scp

Path setup:

  • Modify the corresponding CUDA and project root paths in src/utils/param_path.py (an illustrative example is shown below)
# move to the source code folder to run the following scripts
$ cd QPNet/src/
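
The exact variable names in src/utils/param_path.py may differ; the point is simply that the CUDA root and project root must match your environment. A hypothetical example:

```python
# Hypothetical excerpt of src/utils/param_path.py (names are illustrative only)
CUDA_PATH = "/usr/local/cuda-10.0"   # CUDA installation root
PROJECT_ROOT = "/home/user/QPNet"    # cloned repository root
```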

Feature extraction:

  1. Output the F0 and power distribution histogram figures to corpus/VCC2018/hist/
$ bash run_FE.sh --stage 0
  2. Modify the f0_min (lower bound of the F0 range), f0_max (upper bound of the F0 range), and pow_th (power threshold for VAD) values of the speakers in corpus/VCC2018/conf/pow_f0_dict.yml
    *Details of the F0 range settings can be found here.

  3. Extract and save the acoustic features of the training, evaluation, and reference sets in corpus/VCC2018/h5/ (a minimal sketch of the WORLD analysis is shown after this list)
    *The analysis-synthesis speech files of the training set are also saved in corpus/VCC2018/h5_restored/.

$ bash run_FE.sh --stage 123
  4. Process the waveform files with noise shaping for QPNet training and save the shaped files in corpus/VCC2018/wav_h5_ns/
$ bash run_FE.sh --stage 4
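
For reference, the WORLD analysis performed in these stages roughly corresponds to the following sketch using the pyworld and soundfile packages (not necessarily the repository's exact pipeline; the wav path is illustrative, and run_FE.sh additionally applies the per-speaker F0 range, VAD threshold, mel-cepstrum conversion, and noise shaping):

```python
import numpy as np
import pyworld
import soundfile as sf

x, fs = sf.read("corpus/VCC2018/wav/VCC2SF1/10001.wav")  # illustrative path
x = x.astype(np.float64)

shiftms = 5.0                                          # frame shift in ms
f0, t = pyworld.harvest(x, fs, frame_period=shiftms)   # F0 contour (0 = unvoiced)
sp = pyworld.cheaptrick(x, f0, t, fs)                  # spectral envelope
ap = pyworld.d4c(x, f0, t, fs)                         # aperiodicity
uv = (f0 > 0).astype(np.float64)                       # voiced/unvoiced flag

# Analysis-synthesis check, corresponding to the files in h5_restored/
y = pyworld.synthesize(f0, sp, ap, fs, frame_period=shiftms)
```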

QPNet vocoder:

  1. Train and test SI-QPNet
# the gpu ID can be set by --gpu GPU_ID (default: 0)
$ bash run_QP.sh --gpu 0 --stage 03
  2. Update SD-QPNet for each speaker with the corresponding partial training data
$ bash run_QP.sh --gpu 0 --stage 1
  3. Validate SD-QPNet for each speaker with the corresponding partial training data
# the validation results are in `qpnet_models/modelname/validation_result.yml`
$ bash run_QP.sh --gpu 0 --stage 2
  4. Test SD-QPNet with the updating iteration number chosen according to the validation results
# the iter number can be set by --miter NUM (default: 1000)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 4
  5. Test SI-QPNet with scaled F0 (0.5 * F0 and 1.5 * F0); a sketch of the F0 scaling is shown after this list
# default F0 scaled factors=("0.50" "1.50")
# the scaled factors can be changed in run_QP.sh
$ bash run_QP.sh --gpu 0 --stage 5
  6. Test SD-QPNet with scaled F0 (0.5 * F0 and 1.5 * F0)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 6
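
Stages 5 and 6 scale the F0 contour of the WORLD features before generation. The operation is essentially the following minimal numpy illustration (not the script's actual code); unvoiced frames (F0 = 0) stay unvoiced:

```python
import numpy as np

def scale_f0(f0, factor):
    """Scale a WORLD F0 contour; zeros (unvoiced frames) are preserved."""
    f0 = np.asarray(f0, dtype=np.float64)
    return np.where(f0 > 0, f0 * factor, 0.0)

f0 = np.array([0.0, 110.0, 220.0])
print(scale_f0(f0, 0.5))  # [  0.  55. 110.]
print(scale_f0(f0, 1.5))  # [  0. 165. 330.]
```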

Hints

  • The program currently supports only WORLD acoustic features, but you can modify the feature extraction script and change 'feature_type' in src/runFE.py and src/runQP.py to use new features.

  • You can extract acoustic features with different settings (e.g., frame length) and set a different 'feature_format' (default: h5) in src/runFE.py and src/runQP.py for each setting; the program will create the corresponding folders.

  • You can easily change the generation model by setting a different 'network' (default: qpnet) in src/runQP.py when you create new generation models.

  • When working with a new corpus, you only need to create the file lists of the wav files; the program will create the feature lists based on the wav file lists.

  • When you create the wav file lists, please follow the format of the example
    (ex: rootpath/wav/xxx/xxx.wav). A sketch for generating such a list is shown below.
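
For example, a wav file list for a new corpus can be generated with a few lines of Python (the corpus name and output scp name below are placeholders):

```python
from pathlib import Path

# Placeholder paths: replace "MYCORPUS" and the scp name with your corpus.
rootpath = Path("corpus/MYCORPUS")
wav_files = sorted((rootpath / "wav").rglob("*.wav"))

scp = rootpath / "scp" / "mycorpus_train.scp"
scp.parent.mkdir(parents=True, exist_ok=True)
scp.write_text("\n".join(str(p) for p in wav_files) + "\n")
```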

Models and results

  • The pre-trained models and generated utterances are released.
  • You can download all pre-trained models via the link.
  • Please put the downloaded models in the qpnet_models folder.
  • The SD (speaker-dependent) models are adapted from the SI (speaker-independent) model.
  • You can download all generated utterances via the link.
  • The released models are only trained with the vcc18 corpus (~ 1 hr).
  • To achieve higher speech quality, more training data is required (in our papers, the training data was ~ 3 hrs).

Corpus | Language | Fs [Hz] | Feature | Model | Result
------ | -------- | ------- | ------- | ----- | ------
vcc18 | EN | 22050 | world (uv + f0 + mcep + ap), shiftms: 5 | SI | link
vcc18 | EN | 22050 | world (uv + f0 + mcep + ap), shiftms: 5 | SD_VCC2SF3 | link
vcc18 | EN | 22050 | world (uv + f0 + mcep + ap), shiftms: 5 | SD_VCC2SF4 | link
vcc18 | EN | 22050 | world (uv + f0 + mcep + ap), shiftms: 5 | SD_VCC2SM3 | link
vcc18 | EN | 22050 | world (uv + f0 + mcep + ap), shiftms: 5 | SD_VCC2SM4 | link

References

The QPNet repository is developed based on

Citation

If you find the code helpful, please cite the following papers.

@InProceedings{qpnet_2019,
author="Y.-C. Wu and T. Hayashi and P. L. Tobing and K. Kobayashi and T. Toda",
title="{Q}uasi-{P}eriodic {W}ave{N}et vocoder: a pitch dependent dilated convolution model for parametric speech generation",
booktitle="Proc. Interspeech",
year="2019",
month="Sept.",
pages="196-200"
}

@ARTICLE{qpnet_2021,
author="Y.-C. Wu and T. Hayashi and P. L. Tobing and K. Kobayashi and T. Toda",
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
title={Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network}, 
year={2021},
volume={29},
pages={1134-1148},
doi={10.1109/TASLP.2021.3061245}}

Authors

Development:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: [email protected]

Advisor:
Tomoki Toda @ Nagoya University
E-mail: [email protected]