I basically want to feed a text generator (an LLM streaming response) into a speech model and get a streamed audio output. You seem to support streaming input, but how do you do output streaming? Can you do it for Coqui TTS?
Answered by KoljaB, Nov 9, 2023
Output streaming works like this:
I receive chunks from the LLM streaming response until a full sentence (or a sentence fragment ending on a comma) is detected. I then use Coqui streaming inference to synthesize that sentence fragment with the lowest possible latency, convert the resulting tensor chunks to wav, and stream-play them with pyAudio. So the RealtimeTTS library does support output streaming, but currently only for the XTTS model (Coqui does not support streaming inference for all of its models).
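For illustration, here is a minimal sketch of that pipeline, not the library's actual internals. `synthesize_stream` is a hypothetical stand-in for Coqui's streaming inference and is assumed to yield 16-bit PCM byte chunks; the pyAudio playback pattern is standard.

```python
import pyaudio

def sentence_fragments(llm_chunks):
    """Collect LLM text chunks until a sentence end (or a comma fragment) is seen."""
    buffer = ""
    for chunk in llm_chunks:
        buffer += chunk
        # naive boundary detection for the sketch; stream2sentence does this robustly
        if buffer.rstrip().endswith((".", "!", "?", ",")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()

def play_llm_stream(llm_chunks, synthesize_stream, sample_rate=24000):
    """Synthesize each fragment and play the resulting PCM chunks as they arrive."""
    pa = pyaudio.PyAudio()
    out = pa.open(format=pyaudio.paInt16, channels=1, rate=sample_rate, output=True)
    try:
        for fragment in sentence_fragments(llm_chunks):
            # synthesize_stream: hypothetical placeholder for Coqui streaming
            # inference, assumed to yield int16 PCM byte chunks for `fragment`
            for pcm_chunk in synthesize_stream(fragment):
                out.write(pcm_chunk)
    finally:
        out.stop_stream()
        out.close()
        pa.terminate()
```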
Hope that helps...
I see what you mean. RealtimeTTS is based on the stream2sentence library, and I don't think there are enough use cases to extend that library beyond what it was meant for.
So my suggestion: fork this library and exchange the stream2sentence generator used in RealtimeTTS with your own.
Here is how you can do it:
Find this line in text_to_stream.py:
and exchange …
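As a rough illustration, a replacement generator could look something like the sketch below; the function name and the splitting rule are assumptions for the example, not the library's actual interface. It only needs to accept the incoming character/token iterator and yield the text fragments that RealtimeTTS should synthesize one by one.

```python
def my_fragment_generator(char_iterator):
    """Hypothetical drop-in replacement for the stream2sentence generator."""
    buffer = ""
    for char in char_iterator:
        buffer += char
        if char in ".!?\n":          # split wherever your use case needs
            fragment = buffer.strip()
            if fragment:
                yield fragment
            buffer = ""
    if buffer.strip():
        yield buffer.strip()
```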