TTS audio caches too broadly. #4084

Open
jcc10 opened this issue Nov 6, 2024 · 0 comments
Labels: bug (Something isn't working), unconfirmed

jcc10 commented Nov 6, 2024

LocalAI version:
localai/localai:v2.22.1-cublas-cuda12-ffmpeg@sha256:4ac028a056c946e047548e437952c80ba79b66af9428cbec628ab4ffedc47120
LocalAI Version v2.22.1 (015835dba2854572d50e167b7cade05af41ed214)

Environment, CPU architecture, OS, and Version:
I am running LocalAI in Docker; here are the bare-metal specs:
Linux pop-os 6.9.3-76060903-generic #202405300957~1726766035~22.04~4092a0e SMP PREEMPT_DYNAMIC Thu S x86_64 x86_64 x86_64 GNU/Linux

Describe the bug
The TTS backend is not being selected properly, the audio is being cached too long, or the output is otherwise not regenerated when it should be.

To Reproduce
Run the following commands:

curl --request POST \
  --url LocalAI_Instance/tts \
  --header 'content-type: application/json' \
  --data '{
  "backend": "coqui",
  "model": "tts_models/en/ljspeech/glow-tts",
  "input":"Welcome back my friends to the show that never ends!"
  }'|sha1sum

and

curl --request POST \
  --url LocalAI_Instance/tts \
  --header 'content-type: application/json' \
  --data '{
  "backend": "vall-e-x",
  "input":"Welcome back my friends to the show that never ends!"
  }'|sha1sum

The two checksums are identical. Since the audio comes from two different backends, they should not be.
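The two commands can be wrapped into a single check that reports whether the backends returned byte-identical audio. This is a minimal sketch, assuming bash, curl and GNU sha1sum are available. `LOCALAI_URL` is a variable introduced here (not a LocalAI setting) that should hold the instance's base URL, for example `http://localhost:8080`; the helper names `digest`, `same_audio` and `tts` and the output file names are likewise hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: send the same text to two TTS backends and compare the returned audio.
# Assumption: LOCALAI_URL holds the instance base URL (hypothetical variable).
set -euo pipefail

# Print only the SHA-1 digest of a file (sha1sum prints "digest  filename").
digest() {
  sha1sum "$1" | cut -d' ' -f1
}

# Succeed (exit status 0) if both files have the same SHA-1 digest.
same_audio() {
  [ "$(digest "$1")" = "$(digest "$2")" ]
}

# POST a JSON body to /tts and save the response body to a file.
tts() {
  curl --silent --fail --request POST \
    --url "$LOCALAI_URL/tts" \
    --header 'content-type: application/json' \
    --data "$1" --output "$2"
}

main() {
  tts '{"backend": "coqui", "model": "tts_models/en/ljspeech/glow-tts", "input": "Welcome back my friends to the show that never ends!"}' coqui.out
  tts '{"backend": "vall-e-x", "input": "Welcome back my friends to the show that never ends!"}' vall-e-x.out
  if same_audio coqui.out vall-e-x.out; then
    echo "BUG: both backends returned identical audio ($(digest coqui.out))"
  else
    echo "OK: the backends returned different audio"
  fi
}

# Only contact the server when LOCALAI_URL is set, so the helpers can be reused on their own.
if [ -n "${LOCALAI_URL:-}" ]; then
  main
fi
```

Saved as, say, `compare-tts.sh`, it would be run as `LOCALAI_URL=http://localhost:8080 bash compare-tts.sh`. Writing each response to its own file keeps both outputs around for listening, which a bare `| sha1sum` pipe does not.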

Expected behavior
I would expect two different backends not to produce identical output when given arbitrary inputs.

Logs

5:51AM INF Success ip=127.0.0.1 latency="39.134µs" method=GET status=200 url=/readyz
5:51AM DBG guessDefaultsFromFile: modelPath is empty
5:51AM DBG Request for model: tts_models/en/ljspeech/glow-tts
5:51AM INF Loading model '' with backend coqui
5:51AM DBG Model already loaded in memory:
5:51AM DBG Checking model availability ()
5:51AM INF Success ip=172.29.0.1 latency=169.709648ms method=POST status=200 url=/tts
5:52AM DBG guessDefaultsFromFile: modelPath is empty
5:52AM DBG Request for model:
5:52AM INF Loading model '' with backend vall-e-x
5:52AM DBG Model already loaded in memory:
5:52AM DBG Checking model availability ()
5:52AM INF Success ip=172.29.0.1 latency=70.942259ms method=POST status=200 url=/tts
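Both requests log `Loading model ''` followed by `Model already loaded in memory`, which is consistent with the title's suspicion that something is cached too broadly. The bash sketch below is a hypothetical illustration only, not LocalAI's loader code: every name in it (`load_broad`, `load_narrow`, `by_model`, `by_pair`, `served`) is invented. It contrasts a cache keyed by model name alone, where two backends requesting model `''` share one entry, with a cache keyed by backend and model together.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: this is NOT LocalAI's loader code.
# All names here are invented for the example. Requires bash 4+ (associative arrays).

declare -A by_model=()  # cache keyed by model name only
declare -A by_pair=()   # cache keyed by backend and model together
served=""               # backend whose cached entry answered the last request

# Broad key: model name only. The "m:" prefix avoids an empty subscript,
# which bash rejects, when the model name is ''.
load_broad() {
  local backend=$1 key="m:$2"
  if [[ -z "${by_model[$key]+set}" ]]; then
    by_model[$key]=$backend   # first request for this key stores its backend
  fi
  served=${by_model[$key]}
}

# Narrow key: backend and model, so different backends never share an entry.
load_narrow() {
  local backend=$1 key="$1|$2"
  if [[ -z "${by_pair[$key]+set}" ]]; then
    by_pair[$key]=$backend
  fi
  served=${by_pair[$key]}
}

# Replay the two requests from the logs, both with model ''.
load_broad coqui ''
load_broad vall-e-x ''
echo "broad key:  vall-e-x request served by $served"   # coqui

load_narrow coqui ''
load_narrow vall-e-x ''
echo "narrow key: vall-e-x request served by $served"   # vall-e-x
```

The functions set the global `served` instead of printing, because calling them inside `$(...)` would run them in a subshell and discard the cache update.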

Additional context
I intended to use voice cloning. The voice-cloning models do cause the correct backends to be invoked, but the audio files they return are also identical to the others.
