XTTS: Hallucinations #317
-
Describe the bug
I am using the code below to generate voice-cloned TTS, which works fine, but only for long text. If the text consists of 3-4 words or fewer, there is a high chance that the model will hallucinate and produce wrong or unexpected output. Is there any way to overcome this?
To Reproduce
Expected behavior
Short text should also be generated successfully.
Logs
Environment
- TTS Version : v0.24.3 or v0.25.3
- PyTorch Version : 2.5.1
- Python Version : 3.9.20
- OS : AWS Linux x86_64 GNU/Linux
- CUDA/cuDNN Version : 11.8.0
- GPU Model : T4
- How you installed PyTorch : pip
Additional context
No response
Replies: 2 comments
-
Yes, hallucinations are common in the XTTS model, especially with short text. You could try fine-tuning it with more short utterances.
-
Use a " ." (space + dot) at the end, at the very least; results will be much better. Punctuation in general (always with a space between the word and the punctuation mark) can help.
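For anyone who wants to apply this automatically, here is a minimal sketch of the preprocessing step. `pad_punctuation` is a hypothetical helper name, not part of the TTS library, and the set of punctuation marks it handles is an assumption; adjust it for your language.

```python
import re


def pad_punctuation(text: str) -> str:
    """Hypothetical helper (not a TTS library function).

    Puts a space before punctuation that closes a word and makes sure
    the text ends with sentence-final punctuation, adding " ." if not.
    """
    text = text.strip()
    if not text:
        return text
    # Space out punctuation that directly follows a word and is itself
    # followed by whitespace or the end of the text (so "3.5" is untouched).
    text = re.sub(r"(?<=\w)([.,!?;:])(?=\s|$)", r" \1", text)
    # Append " ." when no sentence-final mark is present.
    if text[-1] not in ".!?":
        text = f"{text} ."
    return text


print(pad_punctuation("Thank you"))
print(pad_punctuation("Hello, world!"))
```

The returned string would then be passed as the text argument of whatever synthesis call you already use. This only changes the input text, so it may reduce hallucinations on short prompts but is not guaranteed to remove them.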