Waveglow_Inference_in_CUDA

C++ code for optimized CUDA inference of WaveGlow. This implementation gives a 25% speedup over NVIDIA's PyTorch implementation in full precision and a 2.5-3x speedup when using TensorCore.

By default, this code uses the GPU's TensorCore when running on an NVIDIA Volta GPU.
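Tensor Cores first appeared with the Volta architecture, which reports compute capability 7.0. As a minimal sketch of how a program could choose a code path, the helper below checks the architecture generation. The name `is_volta_or_newer` is hypothetical and is not taken from this repository.

```cpp
#include <cassert>

// Hypothetical helper (not from this repository).
// Returns true when the compute capability is 7.0 (Volta) or higher.
// In a CUDA program the two numbers would come from
// cudaGetDeviceProperties(&prop, device) as prop.major and prop.minor.
// This checks the architecture generation only; it does not prove that a
// specific card actually has Tensor Cores.
bool is_volta_or_newer(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    return major >= 7;
}
```

For example, `is_volta_or_newer(7, 0)` is true for a V100, while `is_volta_or_newer(6, 1)` is false for a Pascal card.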

Waveglow

CUDA C++ implementation of NVIDIA's WaveGlow.

The flow-based model architecture is described in the paper WaveGlow: a Flow-based Generative Network for Speech Synthesis.

WaveGlow, a flow-based network, is capable of generating high-quality speech from mel-spectrograms. It combines insights from Glow and WaveNet to provide fast, efficient, and high-quality audio synthesis without the need for auto-regression.
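The reason no auto-regression is needed can be illustrated with an affine coupling layer, the building block that flow models such as Glow use. The sketch below is not this repository's code: `Conditioner` is a hypothetical stand-in for the WN network, and the mel-spectrogram conditioning is omitted for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Output of the conditioning network: per-element log-scale and shift.
struct ScaleShift {
    std::vector<float> log_s;
    std::vector<float> t;
};

// Hypothetical stand-in for WN: any function of the untouched half x_a.
using Conditioner = std::function<ScaleShift(const std::vector<float>&)>;

// Forward coupling (training direction): y_b = exp(log_s) * x_b + t.
std::vector<float> coupling_forward(const std::vector<float>& x_a,
                                    const std::vector<float>& x_b,
                                    const Conditioner& wn) {
    const ScaleShift p = wn(x_a);
    assert(p.log_s.size() == x_b.size() && p.t.size() == x_b.size());
    std::vector<float> y_b(x_b.size());
    for (std::size_t i = 0; i < x_b.size(); ++i)
        y_b[i] = std::exp(p.log_s[i]) * x_b[i] + p.t[i];
    return y_b;
}

// Inverse coupling (synthesis direction): x_b = (y_b - t) / exp(log_s).
// x_a passes through the layer unchanged, so the conditioner output can be
// recomputed exactly and every element is recovered independently.
std::vector<float> coupling_inverse(const std::vector<float>& x_a,
                                    const std::vector<float>& y_b,
                                    const Conditioner& wn) {
    const ScaleShift p = wn(x_a);
    assert(p.log_s.size() == y_b.size() && p.t.size() == y_b.size());
    std::vector<float> x_b(y_b.size());
    for (std::size_t i = 0; i < y_b.size(); ++i)
        x_b[i] = (y_b[i] - p.t[i]) / std::exp(p.log_s[i]);
    return x_b;
}
```

Because the loop body has no dependency between elements, the inverse maps naturally onto one GPU thread per sample, unlike an auto-regressive model that must generate samples one after another.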

WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable.
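For reference, the single cost function can be written out. The form below is a sketch in the notation of the WaveGlow paper: z(x) is the latent obtained from the audio x, σ is the standard deviation of the Gaussian prior, s_j are the scales of the affine coupling layers, and W_k are the weights of the invertible 1x1 convolutions. Training maximizes this quantity (equivalently, minimizes its negative).

```latex
\log p_\theta(x) =
  -\frac{z(x)^\top z(x)}{2\sigma^2}
  + \sum_{j=0}^{\#\text{coupling}} \log s_j(x, \text{mel-spectrogram})
  + \sum_{k=0}^{\#\text{conv}} \log \left| \det W_k \right|
```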

The paper claims that in full precision (32-bit float) WaveGlow produces speech at 500 kHz on a V100, i.e. 500,000 audio samples per second. In practice it is typically about 300-325 kHz with the PyTorch implementation, 400-420 kHz with our implementation in full precision, and around 1000 kHz using TensorCore.
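These rates are easier to interpret as a multiple of real time. The sketch below does the conversion; the 22,050 Hz audio sampling rate used in the example is an assumption (it is the rate of the LJ Speech data used in the WaveGlow paper) and is not stated in this README.

```cpp
#include <cassert>

// Converts a synthesis rate (audio samples generated per second of
// wall-clock time) into a multiple of real time for a given audio
// sampling rate.
double realtime_multiple(double samples_per_second, double audio_sample_rate_hz) {
    assert(samples_per_second > 0.0 && audio_sample_rate_hz > 0.0);
    return samples_per_second / audio_sample_rate_hz;
}
```

At the assumed 22,050 Hz, `realtime_multiple(400000.0, 22050.0)` is roughly 18x real time and `realtime_multiple(1000000.0, 22050.0)` is roughly 45x.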

Repository Structure

cpp
├── common          (all common files: logger, utils, numpy reader)
│   ├── header
│   └── src
├── sys             (ML units, i.e. conv, dense, activation)
│   ├── header
│   └── src
├── waveglow        (WN, upsample, main)
│   ├── header
│   └── src
└── tools
    ├── get_waveglow_weights.py
    └── npy_2_aud.py
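As an illustration of the kind of unit that sits behind WN, the sketch below shows the WaveNet-style gated activation, tanh(a) * sigmoid(b), for a single time step on the CPU. It is a plain C++ sketch under the assumption that the 2*C pre-activations are stored contiguously; the function name and memory layout are hypothetical, not this repository's.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gated activation for one time step. `in` holds 2*C pre-activations:
// the first C values feed tanh, the last C values feed the sigmoid gate.
// Returns C outputs, out[i] = tanh(in[i]) * sigmoid(in[C + i]).
std::vector<float> gated_activation(const std::vector<float>& in) {
    assert(in.size() % 2 == 0);
    const std::size_t c = in.size() / 2;
    std::vector<float> out(c);
    for (std::size_t i = 0; i < c; ++i) {
        const float gate = 1.0f / (1.0f + std::exp(-in[c + i]));
        out[i] = std::tanh(in[i]) * gate;
    }
    return out;
}
```

Each output depends on only two inputs, so a CUDA version can fuse the add, tanh, sigmoid, and multiply into one elementwise kernel instead of launching them separately.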

Getting Started

Git clone the repository
Download waveglow_weights
Download mel_spectrograms
Update the waveglow_weights path in the waveglow/header/hparams.hpp file
Build the project and run inference:

    make
    # List the mel-spectrogram files, one path per line
    ls -d path_2_mel_folder/* > filename.txt
    ./waveglow_tts filename.txt OutputDir
    python tools/npy_2_aud.py OutputDir

Audio will be stored in OutputDir in .wav format
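The last step turns raw samples into .wav files. How tools/npy_2_aud.py does this is not shown here; as a rough sketch of the conversion itself, the C++ function below encodes mono float samples as a 16-bit PCM WAV byte string. The function names and the 22,050 Hz rate in the usage note are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Appends the low n_bytes of value in little-endian byte order.
static void put_le(std::string& out, std::uint32_t value, int n_bytes) {
    for (int i = 0; i < n_bytes; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Encodes mono float samples in [-1, 1] as a 16-bit PCM WAV byte string.
std::string encode_wav_pcm16(const std::vector<float>& samples,
                             std::uint32_t sample_rate) {
    assert(sample_rate > 0);
    const std::uint32_t data_bytes =
        static_cast<std::uint32_t>(samples.size() * 2);
    std::string out;
    out += "RIFF";
    put_le(out, 36 + data_bytes, 4);  // size of everything after this field
    out += "WAVE";
    out += "fmt ";
    put_le(out, 16, 4);               // size of the fmt chunk body
    put_le(out, 1, 2);                // audio format 1 = PCM
    put_le(out, 1, 2);                // one channel (mono)
    put_le(out, sample_rate, 4);
    put_le(out, sample_rate * 2, 4);  // byte rate = rate * channels * 2 bytes
    put_le(out, 2, 2);                // block align = channels * 2 bytes
    put_le(out, 16, 2);               // bits per sample
    out += "data";
    put_le(out, data_bytes, 4);
    for (float s : samples) {
        // Clip to [-1, 1] before scaling so out-of-range values do not wrap.
        const float clipped = std::max(-1.0f, std::min(1.0f, s));
        const std::int16_t pcm = static_cast<std::int16_t>(clipped * 32767.0f);
        put_le(out, static_cast<std::uint16_t>(pcm), 2);
    }
    return out;
}
```

The returned string can be written to disk with a `std::ofstream` opened in `std::ios::binary` mode, for example after calling `encode_wav_pcm16(samples, 22050)`.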

Training

You can also train your own model using this, then copy the tools/get_waveglow_weights.py file into the waveglow folder and run:

    python get_waveglow_weights.py <checkpoint path>
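The repository's common folder lists a numpy reader, which suggests (this is an assumption) that the exported weights are stored as .npy files. As a sketch of what reading such a file involves, the function below parses the header of a version 1.x .npy file held in memory. The names `NpyHeader` and `parse_npy_header` are hypothetical and not from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

struct NpyHeader {
    std::string dict;         // Python-dict text with 'descr', 'fortran_order', 'shape'
    std::size_t data_offset;  // byte offset where the raw array data starts
};

// Parses the header of a version 1.x .npy file.
// Layout: 6-byte magic "\x93NUMPY", 1-byte major version, 1-byte minor
// version, 2-byte little-endian header length, then the header text.
NpyHeader parse_npy_header(const std::string& bytes) {
    static const std::string magic = "\x93NUMPY";
    if (bytes.size() < 10 || bytes.compare(0, 6, magic) != 0)
        throw std::runtime_error("not an .npy file");
    if (static_cast<unsigned char>(bytes[6]) != 1)
        throw std::runtime_error("only format version 1.x is handled here");
    const std::size_t header_len =
        static_cast<std::size_t>(static_cast<unsigned char>(bytes[8])) |
        (static_cast<std::size_t>(static_cast<unsigned char>(bytes[9])) << 8);
    if (bytes.size() < 10 + header_len)
        throw std::runtime_error("truncated header");
    NpyHeader h{bytes.substr(10, header_len), 10 + header_len};
    assert(h.data_offset <= bytes.size());
    return h;
}
```

A full reader would go on to parse `shape` and `descr` from the dict text and then copy the bytes starting at `data_offset` into a float buffer before uploading them to the GPU.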

Inference and Results

Currently, the code takes around 250 ms to generate 10 seconds of speech.
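A figure like this can be reproduced with a small timing harness. The sketch below uses `std::chrono` and is not this repository's benchmarking code; both function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// CUDA kernel launches are asynchronous, so when timing GPU code `work`
// should end with cudaDeviceSynchronize() or the result will be too small.
double time_ms(const std::function<void()>& work) {
    assert(static_cast<bool>(work));
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Seconds of audio produced per second of compute.
double speedup_over_realtime(double audio_seconds, double elapsed_ms) {
    assert(audio_seconds >= 0.0 && elapsed_ms > 0.0);
    return audio_seconds * 1000.0 / elapsed_ms;
}
```

With the numbers above, `speedup_over_realtime(10.0, 250.0)` gives 40, i.e. synthesis runs about 40 times faster than real time.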

Saurabh-29/Waveglow_Inference_in_CUDA
