TTS audio caches too broadly. #4084

jcc10 · 2024-11-06T05:55:01Z

LocalAI version:
localai/localai:v2.22.1-cublas-cuda12-ffmpeg@sha256:4ac028a056c946e047548e437952c80ba79b66af9428cbec628ab4ffedc47120
LocalAI Version v2.22.1 (015835dba2854572d50e167b7cade05af41ed214)

Environment, CPU architecture, OS, and Version:
While I am running in Docker, here are the bare-metal specs:
Linux pop-os 6.9.3-76060903-generic #202405300957~1726766035~22.04~4092a0e SMP PREEMPT_DYNAMIC Thu S x86_64 x86_64 x86_64 GNU/Linux

Describe the bug
The TTS backend is not being selected properly, the audio is being cached for too long, or the audio is otherwise not being regenerated when it should be.

To Reproduce
Run the following commands:

curl --request POST \
  --url LocalAI_Instance/tts \
  --header 'content-type: application/json' \
  --data '{
  "backend": "coqui",
  "model": "tts_models/en/ljspeech/glow-tts",
  "input":"Welcome back my friends to the show that never ends!"
  }'|sha1sum

and

curl --request POST \
  --url LocalAI_Instance/tts \
  --header 'content-type: application/json' \
  --data '{
  "backend": "vall-e-x",
  "input":"Welcome back my friends to the show that never ends!"
  }'|sha1sum

The two sums will be the same. Since the audio came from two different backends, they should not be.
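For convenience, the comparison can be scripted. This is only a rough sketch of the same two requests, not anything LocalAI ships: `LOCALAI_URL` (with a default of `http://localhost:8080`), `RUN_TTS_CHECK`, `tts_sha1`, and `compare_digests` are names made up for illustration. The response is written to a temporary file so that a failed request is reported as an error rather than hashed as empty output.

```shell
#!/bin/sh
# Assumption: base URL of the LocalAI instance; adjust to your setup.
LOCALAI_URL="${LOCALAI_URL:-http://localhost:8080}"

# POST the JSON body in $1 to /tts and print the SHA-1 of the returned audio.
# Returns non-zero if the request fails, so an empty body is never hashed.
tts_sha1() {
  out=$(mktemp) || return 1
  if curl --silent --fail --request POST \
      --url "$LOCALAI_URL/tts" \
      --header 'content-type: application/json' \
      --data "$1" \
      --output "$out"; then
    sha1sum "$out" | cut -d ' ' -f 1
    rm -f "$out"
  else
    rm -f "$out"
    return 1
  fi
}

# Print "identical" if the two digests match, "different" otherwise.
compare_digests() {
  if [ "$1" = "$2" ]; then
    echo identical
  else
    echo different
  fi
}

# Only contact the server when explicitly asked (hypothetical switch),
# so the functions above can be sourced without side effects.
if [ "${RUN_TTS_CHECK:-}" = "1" ]; then
  text='Welcome back my friends to the show that never ends!'
  a=$(tts_sha1 "{\"backend\": \"coqui\", \"model\": \"tts_models/en/ljspeech/glow-tts\", \"input\": \"$text\"}") || exit 1
  b=$(tts_sha1 "{\"backend\": \"vall-e-x\", \"input\": \"$text\"}") || exit 1
  echo "coqui:    $a"
  echo "vall-e-x: $b"
  compare_digests "$a" "$b"
fi
```

Running it with `RUN_TTS_CHECK=1` set prints both digests followed by `identical` or `different`; with the behaviour described here, the result is `identical`.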

Expected behavior
I would expect that two different backends should not produce identical results when given multiple arbitrary inputs.

Logs

5:51AM INF Success ip=127.0.0.1 latency="39.134µs" method=GET status=200 url=/readyz
5:51AM DBG guessDefaultsFromFile: modelPath is empty
5:51AM DBG Request for model: tts_models/en/ljspeech/glow-tts
5:51AM INF Loading model '' with backend coqui
5:51AM DBG Model already loaded in memory: 
5:51AM DBG Checking model availability ()
5:51AM INF Success ip=172.29.0.1 latency=169.709648ms method=POST status=200 url=/tts
5:52AM DBG guessDefaultsFromFile: modelPath is empty
5:52AM DBG Request for model: 
5:52AM INF Loading model '' with backend vall-e-x
5:52AM DBG Model already loaded in memory: 
5:52AM DBG Checking model availability ()
5:52AM INF Success ip=172.29.0.1 latency=70.942259ms method=POST status=200 url=/tts

Additional context
I intended to use voice cloning. The models do trigger the correct backends, but the resulting files are also identical to the others.

jcc10 added bug Something isn't working unconfirmed labels Nov 6, 2024
