Things that are different compared to the article #3

Open

ivankunyankin opened this issue Jul 5, 2021 · 1 comment
ivankunyankin (Owner) commented Jul 5, 2021

There are a number of differences compared to the source article:

  1. The default training script uses the LibriTTS dataset instead of LibriSpeech.
  2. The model implementation uses time-channel separable 1D convolutional modules without groups and shuffling. For details, refer to Fig. 2 in the article (a sketch of such a module follows this list).
  3. Instead of the NovoGrad optimizer, I use PyTorch's Adam optimizer with weight decay (see the optimizer sketch after this list).
  4. Instead of Cutout for augmentation, I use a custom function based on PyTorch's masking functions (see the masking sketch after this list). You can find more details about augmentation here.
  5. The authors of the article use a cosine annealing learning rate policy with learning rate warmup; I use PyTorch's implementation of the one-cycle learning rate policy, which is well described in this article and is included in the optimizer sketch after this list.
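
Regarding item 2, here is a minimal sketch of what a time-channel separable 1D convolution looks like in PyTorch: a depthwise convolution over time followed by a pointwise (1x1) convolution, with no grouped pointwise conv and no channel shuffle. The class name and defaults are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class TimeChannelSeparableConv1d(nn.Module):
    """Time-channel separable 1D conv (cf. Fig. 2 of the article):
    a depthwise temporal conv followed by a pointwise (1x1) conv."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1):
        super().__init__()
        padding = (kernel_size - 1) // 2 * dilation  # "same" padding for stride 1
        # Depthwise: each channel is convolved over time independently.
        self.depthwise = nn.Conv1d(in_channels, in_channels, kernel_size,
                                   stride=stride, padding=padding,
                                   dilation=dilation, groups=in_channels)
        # Pointwise: mixes channels with kernel size 1 (no groups, no shuffle).
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):  # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))

# Quick shape check
x = torch.randn(4, 256, 100)
print(TimeChannelSeparableConv1d(256, 512, 33)(x).shape)  # torch.Size([4, 512, 100])
```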
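
Regarding item 4, the idea behind a masking-based replacement for Cutout can be sketched as follows: zero out random time and frequency bands of a spectrogram, SpecAugment-style. The function name and parameters here are hypothetical, not the repo's actual API.

```python
import torch

def mask_spectrogram(spec, n_time_masks=2, max_time_width=20,
                     n_freq_masks=2, max_freq_width=10):
    """Zero out random time and frequency bands of a (..., freq, time)
    spectrogram. Hypothetical sketch, not the repo's actual function."""
    spec = spec.clone()
    freq_bins, time_steps = spec.shape[-2], spec.shape[-1]
    for _ in range(n_time_masks):
        w = int(torch.randint(0, max_time_width + 1, (1,)))
        t0 = int(torch.randint(0, max(1, time_steps - w), (1,)))
        spec[..., :, t0:t0 + w] = 0.0  # mask a span of time steps
    for _ in range(n_freq_masks):
        w = int(torch.randint(0, max_freq_width + 1, (1,)))
        f0 = int(torch.randint(0, max(1, freq_bins - w), (1,)))
        spec[..., f0:f0 + w, :] = 0.0  # mask a band of frequency bins
    return spec

masked = mask_spectrogram(torch.randn(64, 200))  # e.g. 64 mel bins, 200 frames
```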
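
Regarding items 3 and 5, the optimizer and schedule swap amounts to pairing torch.optim.Adam (via its weight_decay argument) with torch.optim.lr_scheduler.OneCycleLR, stepped once per batch. The learning rate, weight decay, and step count below are placeholder values, not the repo's settings.

```python
import torch
import torch.nn as nn

model = nn.Conv1d(64, 64, kernel_size=3, padding=1)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)
total_steps = 1000  # placeholder: epochs * batches_per_epoch
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, total_steps=total_steps)

for step in range(total_steps):
    x = torch.randn(8, 64, 100)     # dummy batch
    loss = model(x).pow(2).mean()   # dummy loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                # one-cycle policy advances every batch
```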
ivankunyankin added the documentation label Jul 6, 2021
@raotnameh commented:

What WER are you able to achieve? Also, can you share epoch vs. WER, CER, and loss? @ivankunyankin
