XTTS: Hallucinations #317

UtkuBulkan · 2025-02-22T08:37:31Z

Describe the bug

I am using the code below to generate voice-cloned TTS. It works fine, but only for long text. If the text consists of 3-4 words or fewer, there is a high chance that the model will hallucinate and produce wrong or unexpected output.

Is there any way to overcome this?

To Reproduce

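import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# local_audio_vocals_path, DIRECTORY, i, repeated_prompt and
# TARGET_LANG_INITIALS are defined elsewhere in the calling script.
model_directory = "/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/"

# Load the XTTS v2 config and checkpoint from the local model directory.
config = XttsConfig()
config.load_json(f"{model_directory}config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=model_directory, use_deepspeed=True)

# Compute the speaker conditioning from the reference audio.
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=[local_audio_vocals_path])
save_path = f"{DIRECTORY}/generatedoutput_{i}.wav"
out = model.inference(
    repeated_prompt,
    TARGET_LANG_INITIALS,
    gpt_cond_latent,
    speaker_embedding,
    speed=1.0,
    enable_text_splitting=True,
)
# XTTS v2 outputs audio at 24 kHz.
torchaudio.save(save_path, torch.tensor(out["wav"]).unsqueeze(0), 24000)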

Expected behavior

Short text should also be generated successfully.

Logs

Environment

- TTS Version: v0.24.3 or v0.25.3
- PyTorch Version: 2.5.1
- Python Version: 3.9.20
- OS: AWS Linux x86_64 GNU/Linux
- CUDA/cuDNN Version: 11.8.0
- GPU Model: T4
- How you installed PyTorch: pip

Additional context

No response

Answered by eginhard

eginhard · 2025-02-25T14:17:11Z

Maintainer

Yes, hallucinations are common in the XTTS model, especially with short text. You could try fine-tuning it with more short utterances.

ROBERT-MCDOWELL · 2025-02-25T14:30:57Z

Use a " ." (space + dot) at the end, at least; the results will be much better. Punctuation (always with a space between the word and the punctuation) can help.
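
The trailing-punctuation suggestion above can be sketched as a small preprocessing step applied to the text before it is passed to `model.inference`. The helper name `pad_short_text` is hypothetical and not part of the TTS library, and it only handles the final punctuation mark, not punctuation inside the sentence. Whether it actually reduces hallucinations is the commenter's observation and may vary by voice and language.

```python
import re


def pad_short_text(text: str, terminal: str = ".") -> str:
    """Return text ending in ' <punctuation>' (a space, then the mark).

    Hypothetical helper: existing trailing punctuation is kept and only
    separated from the last word by a single space; if there is none,
    `terminal` is appended.
    """
    stripped = text.strip()
    if not stripped:
        return stripped

    # Split off a trailing run of sentence punctuation, if present.
    match = re.match(r"^(.*?)\s*([.!?…]+)$", stripped, flags=re.DOTALL)
    if match is None:
        return f"{stripped} {terminal}"

    body, punct = match.group(1), match.group(2)
    if not body:
        # The text is punctuation only; leave it unchanged.
        return stripped
    return f"{body} {punct}"
```

In the reproduction script, this would be applied as `pad_short_text(repeated_prompt)` in place of `repeated_prompt` in the `model.inference(...)` call.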
