Replies: 1 comment
-
>>> erogol |
-
>>> geneing
[May 21, 2019, 4:49pm]
I need help understanding something.
I removed the linear spectrogram part of the loss function, along with
the postnet that generates it. I didn't need the linear spectrogram for the
vocoder, and removing it saves a LOT of GPU
memory during training. However, the reduced model doesn't produce
reasonable attention even after 50K steps. For the full model, attention
was reasonable after only a few thousand steps.
Why isn't the mel spectrogram part of the loss enough to train the
attention?
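To make the change concrete, here is a rough sketch of the two loss setups being compared. It assumes a plain L1 loss on each output with equal weighting; the real training code may use masking or different weights, and the names `l1_loss` and `spectrogram_loss` are made up for illustration only.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally shaped 2-D lists (frames x bins)."""
    total = 0.0
    count = 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += abs(p - t)
            count += 1
    return total / count


def spectrogram_loss(mel_pred, mel_target, linear_pred=None, linear_target=None):
    """Hypothetical combined loss: the mel term is always used, the linear term is optional."""
    loss = l1_loss(mel_pred, mel_target)
    # Full model: add the postnet (linear spectrogram) term.
    # Reduced model: skip it by passing no linear outputs.
    if linear_pred is not None and linear_target is not None:
        loss += l1_loss(linear_pred, linear_target)
    return loss


# Toy data: 2 frames, 2 mel bins and 3 linear bins per frame.
mel_pred = [[0.0, 1.0], [2.0, 3.0]]
mel_target = [[0.5, 1.0], [2.0, 2.5]]
linear_pred = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
linear_target = [[1.0, 2.0, 1.0], [0.0, 0.0, 3.0]]

full_loss = spectrogram_loss(mel_pred, mel_target, linear_pred, linear_target)
reduced_loss = spectrogram_loss(mel_pred, mel_target)
```

In the reduced setup only the mel term sends gradients back through the decoder and attention, which is the configuration that fails to align for me.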
[This is an archived TTS discussion thread from discourse.mozilla.org/t/no-alignment-without-linear-spectrograms]