This repository is the official PyTorch implementation of QPNet [1, 2].
The generated samples can be found on our Demo page.
The repository includes two parts:
- Acoustic feature extraction: to extract spectral and prosodic features by WORLD
- QPNet vocoder (SI: speaker-independent; SD: speaker-dependent): to generate speech based on the input acoustic features
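For reference, a minimal sketch of WORLD analysis with the pyworld package is shown below; it only illustrates the kind of spectral and prosodic features involved and is not the repository's exact extraction pipeline (which is driven by run_FE.sh).

```python
# A minimal WORLD analysis/synthesis sketch with pyworld (illustrative only;
# the repository's actual feature extraction is performed by run_FE.sh).
import soundfile as sf
import pyworld as pw

x, fs = sf.read("sample.wav")        # float64 waveform and sampling rate
f0, t = pw.harvest(x, fs)            # prosodic feature: F0 contour
sp = pw.cheaptrick(x, f0, t, fs)     # spectral envelope
ap = pw.d4c(x, f0, t, fs)            # aperiodicity
y = pw.synthesize(f0, sp, ap, fs)    # analysis-synthesis waveform
sf.write("sample_resyn.wav", y, fs)
```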
This repository is tested on
- Python 3.6
- CUDA 10.0
- PyTorch 1.3
- torchvision 0.4.1
The code works with both Anaconda and virtualenv.
The following example uses Anaconda.
$ conda create -n venvQPNet python=3.6
$ source activate venvQPNet
$ pip install sprocket-vc
$ pip install torch torchvision
$ git clone https://github.com/bigpon/QPNet.git
- corpus: the folder to put corpora; each corpus subfolder includes a scp subfolder for file lists and a wav subfolder for speech files
- qpnet_models: the folder for trained models
- qpnet_output: the folder for decoding output files
- src: the folder for source code
- Download the Voice Conversion Challenge 2018 (VCC2018) corpus to run the QPNet example
$ cd QPNet/corpus/VCC2018/wav/
$ wget -o train.log -O train.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip
$ wget -o eval.log -O eval.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip
$ wget -o ref.log -O ref.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_reference.zip
$ unzip train.zip
$ unzip eval.zip
$ unzip ref.zip
- SI-QPNet training set:
corpus/VCC2018/scp/vcc18tr.scp
- SD-QPNet updating set:
corpus/VCC2018/scp/vcc18up_VCC2SPK.scp
- SD-QPNet validation set:
corpus/VCC2018/scp/vcc18va_VCC2SPK.scp
- Testing set:
corpus/VCC2018/scp/vcc18eval.scp
- Modify the corresponding CUDA and project root paths in
src/utils/param_path.py
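As a rough illustration only, the settings in src/utils/param_path.py are of the following kind; the actual variable names in the repository may differ.

```python
# Hypothetical illustration of the path settings in src/utils/param_path.py;
# the real variable names in the repository may differ.
PROJECT_ROOT = "/path/to/QPNet"       # root of the cloned repository
CUDA_PATH = "/usr/local/cuda-10.0"    # CUDA installation path
```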
# move to the source code folder to run the following scripts
$ cd QPNet/src/
- Output the histogram figures of the F0 and power distributions to
corpus/VCC2018/hist/
$ bash run_FE.sh --stage 0
- Modify the f0_min (lower bound of F0 range), f0_max (upper bound of F0 range), and pow_th (power threshold for VAD) values of the speakers in
corpus/VCC2018/conf/pow_f0_dict.yml
*Details of the F0 range settings can be found here.
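The per-speaker entries can be checked with a short script like the one below; note that the exact key layout of pow_f0_dict.yml is an assumption here and may differ from the actual file.

```python
# Sketch for inspecting corpus/VCC2018/conf/pow_f0_dict.yml
# (the per-speaker key layout is an assumption).
import yaml

with open("corpus/VCC2018/conf/pow_f0_dict.yml") as f:
    conf = yaml.safe_load(f)

for spk, cfg in conf.items():
    # assumed keys: f0_min / f0_max in Hz, pow_th in dB
    print(spk, cfg)
```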
- Extract and save acoustic features of the training, evaluation, and reference sets in
corpus/VCC2018/h5/
*The analysis-synthesis speech files of the training set are also saved in corpus/VCC2018/h5_restored/.
$ bash run_FE.sh --stage 123
- Process waveform files by noise shaping for QPNet training and save the shaped files in
corpus/VCC2018/wav_h5_ns/
$ bash run_FE.sh --stage 4
- Train and test SI-QPNet
# the gpu ID can be set by --gpu GPU_ID (default: 0)
$ bash run_QP.sh --gpu 0 --stage 03
- Update SD-QPNet for each speaker with the corresponding partial training data
$ bash run_QP.sh --gpu 0 --stage 1
- Validate SD-QPNet for each speaker with the corresponding partial training data
# the validation results are in `qpnet_models/modelname/validation_result.yml`
$ bash run_QP.sh --gpu 0 --stage 2
- Test SD-QPNet with the updating iteration number according to the validation results
# the iteration number can be set by --miter NUM (default: 1000)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 4
- Test SI-QPNet with scaled F0 (0.5 * F0 and 1.5 * F0)
# default F0 scaled factors=("0.50" "1.50")
# the scaled factors can be changed in run_QP.sh
$ bash run_QP.sh --gpu 0 --stage 5
- Test SD-QPNet with scaled F0 (0.5 * F0 and 1.5 * F0)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 6
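The scaled-F0 tests above feed the vocoder F0 contours multiplied by a constant factor; a minimal sketch of this kind of scaling (an assumption about the exact implementation, where only voiced frames are scaled and unvoiced frames stay at zero) is:

```python
import numpy as np

def scale_f0(f0, factor):
    """Scale only voiced frames (f0 > 0); unvoiced frames (f0 == 0) stay 0."""
    f0 = np.asarray(f0, dtype=np.float64)
    return np.where(f0 > 0, f0 * factor, 0.0)

f0 = np.array([0.0, 110.0, 115.0, 0.0, 220.0])  # toy F0 contour [Hz]
print(scale_f0(f0, 0.5))  # -> [0., 55., 57.5, 0., 110.]
print(scale_f0(f0, 1.5))  # -> [0., 165., 172.5, 0., 330.]
```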
- The program only supports WORLD acoustic features now, but you can modify the feature extraction script and change the 'feature_type' in
src/runFE.py
and src/runQP.py
for new features.
- You can extract acoustic features with different settings (ex: frame length ...) and set a different 'feature_format' (default: h5) in
src/runFE.py
and src/runQP.py
for each setting, and the program will create the corresponding folders.
- You can easily change the generation model by setting a different 'network' (default: qpnet) in
src/runQP.py
when you create new generation models.
- When working with a new corpus, you only need to create the file lists of the wav files because the program will create the feature lists based on the wav file lists.
- When you create the wav file lists, please follow the format of the example
(ex: rootpath/wav/xxx/xxx.wav).
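A short sketch of building such a list for a hypothetical new corpus (the corpus name and list file name below are placeholders) could be:

```python
# Build a wav file list (.scp) for a new corpus following the
# rootpath/wav/xxx/xxx.wav convention; names below are placeholders.
from pathlib import Path

root = Path("corpus/MYCORPUS")                 # hypothetical corpus root
wavs = sorted((root / "wav").rglob("*.wav"))   # all wav files under wav/

scp = root / "scp" / "mycorpus_train.scp"      # hypothetical list name
scp.parent.mkdir(parents=True, exist_ok=True)
scp.write_text("".join(f"{p}\n" for p in wavs))
```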
- The pre-trained models and generated utterances are released.
- You can download all pre-trained models via the link.
- Please put the downloaded models in the
qpnet_models
folder.
- The SD (speaker-dependent) models are adapted from the SI (speaker-independent) model.
- You can download all generated utterances via the link.
- The released models are only trained with the vcc18 corpus (~ 1 hr).
- To achieve higher speech quality, more training data is required (in our papers, the training data was ~ 3 hrs).
Corpus | Language | Fs [Hz] | Feature | Model | Result |
---|---|---|---|---|---|
vcc18 | EN | 22050 | world (uv + f0 + mcep + ap) (shiftms: 5) | SI | link |
 | | | | SD_VCC2SF3 | link |
 | | | | SD_VCC2SF4 | link |
 | | | | SD_VCC2SM3 | link |
 | | | | SD_VCC2SM4 | link |
The QPNet repository is developed based on
- Pytorch WaveNet implementation by @kan-bayashi
- Voice conversion implementation by @k2kobayashi
If you find the code helpful, please cite the following papers.
@InProceedings{qpnet_2019,
author="Y.-C. Wu and T. Hayashi and P. L. Tobing and K. Kobayashi and T. Toda",
title="{Q}uasi-{P}eriodic {W}ave{N}et vocoder: a pitch dependent dilated convolution model for parametric speech generation",
booktitle="Proc. Interspeech",
year="2019",
month="Sept.",
pages="196-200"
}
@ARTICLE{qpnet_2021,
author="Y.-C. Wu and T. Hayashi and P. L. Tobing and K. Kobayashi and T. Toda",
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network},
year={2021},
volume={29},
pages={1134-1148},
doi={10.1109/TASLP.2021.3061245}}
Development:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: [email protected]
Advisor:
Tomoki Toda @ Nagoya University
E-mail: [email protected]