Inconsistent mask shape in forward_tts _forward_aligner and AlignmentNetwork docstrings #312
The docstring of _forward_aligner in forward_tts gives the shape of the x_mask parameter as - x_mask: :math:[B, 1, T_en], and x_mask is passed to the aligner in that shape, without squeezing (coqui-ai-TTS/TTS/tts/models/forward_tts.py, Line 507 in b20533e). But the docstring of AlignmentNetwork's forward describes the mask as - mask: :math:[B, T_de]. So what shape does AlignmentNetwork's forward actually expect?
Replies: 1 comment 1 reply
I wouldn't rely on docstrings in Coqui to be consistent, but PRs with fixes are always welcome! Inserting
print("x_mask", x_mask.shape)
after coqui-ai-TTS/TTS/tts/models/forward_tts.py Line 647 in b20533e prints
x_mask torch.Size([1, 1, 10])
when running tts --model_name "tts_models/en/ljspeech/fast_pitch" --text "hello world" (that's the inference function, but it should be the same in forward).
Also, in the future please share code as text or link directly to the correct location (I've edited your question accordingly). Images make it difficult to locate the code and aren't accessible for people with visual impairments. Old repo crosslink: coqui-ai#4158
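To see why the two docstring shapes are not interchangeable, here is a small dependency-free sketch of the standard NumPy/PyTorch broadcasting rule applied to the shapes discussed above. It does not call Coqui code. The attention layout [B, 1, T_de, T_en], the unsqueeze to [B, 1, 1, T_en], and the value T_de = 37 are assumptions made for illustration; only the [1, 1, 10] mask shape comes from the printed output above.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the shape two tensors of shapes a and b broadcast to.

    Follows the usual right-aligned rule: two dimensions are compatible
    if they are equal or one of them is 1. Raises ValueError otherwise.
    """
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} do not broadcast")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


B, T_en = 1, 10   # taken from the printed x_mask shape [1, 1, 10]
T_de = 37         # hypothetical decoder length, chosen arbitrarily

# ASSUMPTION: attention logits laid out as [B, 1, T_de, T_en].
attn = (B, 1, T_de, T_en)

# A [B, 1, T_en] mask with one extra axis inserted (assumed) lines up.
mask_en = (B, 1, 1, T_en)
print(broadcast_shape(attn, mask_en))

# A [B, T_de] mask, as the other docstring states, does not line up
# with this assumed layout, because T_de meets T_en on the last axis.
mask_de = (B, T_de)
try:
    broadcast_shape(attn, mask_de)
except ValueError as err:
    print("mismatch:", err)
```

If the assumed layout matches the real one, this would suggest the [B, 1, T_en] docstring is the accurate one, but the printed shape from the actual run remains the more reliable evidence.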