Inconsistent mask shape in forward_tts _forward_aligner and AlignmentNetwork docstrings #312

Prajwalbhandari869 · 2025-02-22T15:56:34Z

In the docstring of `_forward_aligner` in forward_tts, the shape of the parameter x_mask is given as x_mask: :math:`[B, 1, T_en]`, and an x_mask of that shape is passed to the aligner without being squeezed.

coqui-ai-TTS/TTS/tts/models/forward_tts.py

Line 507 in b20533e

- x_mask: :math:`[B, 1, T_en]`

However, the docstring of AlignmentNetwork's forward method says it expects the mask with shape mask: :math:`[B, T_de]`.

coqui-ai-TTS/TTS/tts/layers/generic/aligner.py

Line 74 in b20533e

- mask: :math:`[B, T_de]`

So what shape does AlignmentNetwork's forward actually expect?

Should it be the shape that is actually passed, x_mask: :math:`[B, 1, T_en]`?
Or should it be mask: :math:`[B, T_de]`?
Or is the docstring simply inconsistent?
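The question comes down to broadcasting. Assuming (this is not stated in the thread) that AlignmentNetwork's forward applies the mask as `mask.unsqueeze(2)` to a score tensor of shape `[B, 1, T_de, T_en]`, the pure-Python sketch below checks which of the two candidate mask shapes fits. `broadcast_shape`, `unsqueeze` and `mask_fits` are hypothetical helpers that work on shape tuples only and follow the NumPy/PyTorch broadcasting rules; they are not part of Coqui TTS.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Broadcast two shape tuples using NumPy/PyTorch rules.

    Sizes are compared right to left; each pair must be equal or contain a 1.
    Missing leading dimensions are treated as size 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # If x is 1 the result takes y; otherwise y is x or 1, so take x.
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


def unsqueeze(shape, dim):
    """Shape-only analogue of tensor.unsqueeze(dim), for 0 <= dim <= len(shape)."""
    shape = tuple(shape)
    return shape[:dim] + (1,) + shape[dim:]


def mask_fits(scores, mask):
    """True if unsqueeze(mask, 2) expands to exactly the scores shape.

    An in-place masked fill on the scores needs this: the mask may be
    broadcast up to the scores' shape but must not enlarge it.
    """
    try:
        return broadcast_shape(scores, unsqueeze(mask, 2)) == tuple(scores)
    except ValueError:
        return False


# Hypothetical sizes; B > 1 and T_de != T_en on purpose.
B, T_de, T_en = 2, 37, 10
scores = (B, 1, T_de, T_en)  # ASSUMED shape of the aligner's log-probabilities

print(mask_fits(scores, (B, 1, T_en)))  # shape passed by _forward_aligner: True
print(mask_fits(scores, (B, T_de)))     # shape in the AlignmentNetwork docstring: False
```

Under that assumption, a `[B, 1, T_en]` mask masks encoder (key) positions and fits exactly, while `[B, T_de]` does not. Note that with B = 1, `mask_fits((1, 1, T_de, T_en), (1, T_de))` also returns True, so a single-utterance run alone cannot tell the two shapes apart.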

Answered by eginhard

eginhard · 2025-02-26T08:52:47Z

Maintainer

I wouldn't rely on docstrings in Coqui to be consistent, but PRs with fixes are always welcome! Inserting `print("x_mask", x_mask.shape)` after

coqui-ai-TTS/TTS/tts/models/forward_tts.py

Line 647 in b20533e

o_en, x_mask, g, _ = self._forward_encoder(x, x_mask, g)

prints `x_mask torch.Size([1, 1, 10])` when running `tts --model_name "tts_models/en/ljspeech/fast_pitch" --text "hello world"` (that's the inference function, but the shape should be the same in forward).
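A one-off print is enough to answer the question; if you want the check to stay in place while experimenting, the same observation can be turned into a guard. The sketch below is hypothetical (`check_x_mask` is not part of Coqui TTS), and where the batch size and encoder length come from is left to the caller.

```python
def check_x_mask(shape, batch_size, t_en):
    """Hypothetical guard: raise AssertionError unless shape == (B, 1, T_en).

    Accepts a plain tuple or a torch.Size (a tuple subclass), e.g. x_mask.shape.
    """
    expected = (batch_size, 1, t_en)
    actual = tuple(shape)
    if actual != expected:
        raise AssertionError(f"x_mask has shape {actual}, expected {expected}")
    return actual


# The shape printed in the fast_pitch run: batch of 1, 10 encoder steps.
print(check_x_mask((1, 1, 10), batch_size=1, t_en=10))  # (1, 1, 10)
```

An explicit `raise` is used instead of a bare `assert` so the check is not silently skipped when Python runs with `-O`.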

Also, in the future please share code as text or link directly to the correct location (I've edited your question accordingly). Images make it difficult to locate the code and aren't accessible for people with visual impairments.

Old repo crosslink: coqui-ai#4158

Prajwalbhandari869 Feb 26, 2025
Author

I later checked the code with an actual tensor, and it worked fine. Thank you for your reply.
