Currently, readalongs align fails when a word is converted to an empty string by the g2p module.
Error message:
ERROR - Alignment produced a different number of segments and tokens, please examine dictionary and input audio and text.
To reproduce this error, check out commit 76faf18 in g2p (or any commit before the French g2p bug that makes "s" disappear is fixed), go to OpenSamples, and run:
readalongs align -i -s -f -l fra UDHR-Librivox/human_rights_un_frn-preamble.txt UDHR-Librivox/human_rights_un_frn_ezwa_64kb-preamble.mp3 output/UDHR-fra-preamble
The error in this specific example is caused by the word <w>s</w> (the 330th token in UDHR-fra-preamble.tokenized.xml, on line 37) being converted to an empty string, because my g2p rule erases word-final "s", even when the whole word is "s". As a consequence, the file UDHR-fra-preamble.dict skips from token t0b0d0p10s0w42 to t0b0d0p10s0w44, omitting the empty token t0b0d0p10s0w43 and causing a mismatch between the number of tokens and the number of dictionary entries.
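A toy Python sketch of how a single empty pronunciation produces the count mismatch. Everything here is hypothetical: `toy_french_g2p` stands in for the buggy rule (it returns a spelling rather than real phonemes), `build_dict` only imitates the way the .dict file omits empty entries, and the neighbouring words are made up. Only the token IDs come from the report above.

```python
import re


def toy_french_g2p(word: str) -> str:
    """Hypothetical stand-in for the buggy rule: delete a word-final "s".

    Real g2p output would be phonemes; a plain string is enough here.
    """
    return re.sub(r"s$", "", word)


def build_dict(tokens):
    """Map token id -> pronunciation, skipping empty pronunciations,
    which imitates how the .dict file ends up without t0b0d0p10s0w43."""
    entries = {}
    for tok_id, word in tokens:
        pron = toy_french_g2p(word)
        if pron:  # an empty string never becomes a dictionary entry
            entries[tok_id] = pron
    return entries


# Hypothetical surrounding words; only the IDs come from the issue.
tokens = [
    ("t0b0d0p10s0w42", "droits"),
    ("t0b0d0p10s0w43", "s"),
    ("t0b0d0p10s0w44", "égaux"),
]

entries = build_dict(tokens)
print(len(tokens), len(entries))  # 3 2
missing = [tok_id for tok_id, _ in tokens if tok_id not in entries]
print(missing)  # ['t0b0d0p10s0w43']
```

Three tokens go in and only two dictionary entries come out. That mismatch is what the "different number of segments and tokens" check reports.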
Eventually, I'll fix the French g2p to not swallow "s", but Studio needs to handle this case gracefully. Options:
Consider it an error and output a meaningful message telling the user to edit the g2p. This is not a great option for general users, though, who might not know how to edit the g2p.
Fix the aligner code to align the whole text anyway, cleanly skipping over (or otherwise handling) the word with an empty phonetic representation.
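Both options can be sketched in Python as follows. This is not the readalongs API. `EmptyPronunciationError`, `check_pronunciations`, `align_skipping_empty` and `align_fn` are all hypothetical names, and `align_fn` stands in for the real acoustic aligner, assumed to return one (start, end) segment per token it receives. Option 1 names the offending tokens in the error. Option 2 aligns only the tokens that have a pronunciation, then puts the skipped tokens back with no timing.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


class EmptyPronunciationError(ValueError):
    """Hypothetical error type for option 1."""


def find_empty_tokens(prons: Dict[str, str]) -> List[str]:
    """Return the ids of tokens whose pronunciation is empty or blank."""
    return [tok_id for tok_id, pron in prons.items() if not pron.strip()]


def check_pronunciations(prons: Dict[str, str]) -> None:
    """Option 1: fail early with a message naming the offending tokens."""
    empty = find_empty_tokens(prons)
    if empty:
        raise EmptyPronunciationError(
            "g2p produced an empty pronunciation for token(s) "
            + ", ".join(empty)
            + "; check the g2p mapping for this language."
        )


def align_skipping_empty(
    prons: Dict[str, str],
    align_fn: Callable[[List[Tuple[str, str]]], List[Segment]],
) -> Dict[str, Optional[Segment]]:
    """Option 2: align only tokens with a pronunciation, then re-insert
    the skipped tokens with None as their timing."""
    to_align = [(t, p) for t, p in prons.items() if p.strip()]
    segments = align_fn(to_align)
    if len(segments) != len(to_align):
        raise RuntimeError(
            f"aligner returned {len(segments)} segments for {len(to_align)} tokens"
        )
    timing = dict(zip((t for t, _ in to_align), segments))
    # dicts keep insertion order, so the output follows the original token order
    return {t: timing.get(t) for t in prons}


def fake_aligner(items: List[Tuple[str, str]]) -> List[Segment]:
    """Stand-in aligner: gives each token a consecutive 0.5 s slot."""
    return [(i * 0.5, (i + 1) * 0.5) for i in range(len(items))]


# Hypothetical pronunciations; only the token ids come from the issue.
prons = {
    "t0b0d0p10s0w42": "d r w a",
    "t0b0d0p10s0w43": "",
    "t0b0d0p10s0w44": "e g o",
}
result = align_skipping_empty(prons, fake_aligner)
print(result)
# {'t0b0d0p10s0w42': (0.0, 0.5), 't0b0d0p10s0w43': None, 't0b0d0p10s0w44': (0.5, 1.0)}
```

Returning None keeps the token in the output without claiming audio it doesn't have. Another choice would be to give it a zero-length segment at its neighbour's boundary so that downstream code never sees a missing timestamp.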
joanise changed the title on Mar 31, 2020:
from: Studio fails to align when a word is "eaten" up by g2p.
to: Studio fails to align when a word is "eaten up" by g2p.