optimize audio generation latency for real-time WebRTC applications #858
Comments
Hello biraj! Thank you for your feedback, which is very important. Audio generation latency involves two concepts:
So, do you mean that every audio output chunk (533.33 ms of audio) takes 0.6-1.0 s to generate? On a 4090 or an A100, generating 0.5 s of output audio should take only 0.3 s. Could you tell me which device you are using, so I can investigate what is happening? Thank you!
Can I ask about your usage?
The latency you asked about can be reduced through acceleration, but our current latency should be 2.5-3 s, which should be similar to other models.
@tc-mb GPT-4o voice has a TTFB of 300 ms and Moshi has a TTFB of 600 ms. A TTFB of 2-3 s is too high for a natural conversation.
We have actually compared similar products, and their latency is basically 2-3 seconds. Are the figures you mention measured test times for those products?
@janak2 Yes, here are some suggestions:
I went through this issue but couldn't find many answers.
Are there ways to bring down the latency when generate_audio=True? I'm building a real-time speech-to-speech app with WebRTC, and the 0.6-1.0 s latency with generate_audio=True is too slow for my needs. Every response contains audio with roughly 12800 samples (533.33 ms at 24 kHz), so if generation latency exceeds that, playback jitters.
Any tips to make it faster? Maybe a different TTS model or some parameter tweaks? Or are there bottlenecks in the implementation I should know about?
I really need to get this working with lower latency for my use case.
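To make the arithmetic in this thread concrete, here is a minimal sketch that converts the chunk size into a duration and checks whether a given generation time keeps up with playback. The function names and the "real-time factor" framing are illustrative assumptions, not part of the project's API; only the numbers (12800 samples, 24 kHz, and the 0.3 s, 0.6 s, and 1.0 s generation times) come from the discussion.

```python
# Illustrative only: these helpers are hypothetical and not part of the model's API.
SAMPLE_RATE_HZ = 24_000   # output sample rate quoted in the issue
CHUNK_SAMPLES = 12_800    # samples per audio chunk quoted in the issue


def chunk_duration_s(num_samples: int, sample_rate_hz: int) -> float:
    """Playback duration of one chunk, in seconds."""
    return num_samples / sample_rate_hz


def real_time_factor(generation_s: float, audio_s: float) -> float:
    """Generation time divided by audio duration; below 1.0 means faster than real time."""
    return generation_s / audio_s


def keeps_up(generation_s: float, audio_s: float) -> bool:
    """True if each chunk is generated before the previous one finishes playing."""
    return real_time_factor(generation_s, audio_s) < 1.0


chunk_s = chunk_duration_s(CHUNK_SAMPLES, SAMPLE_RATE_HZ)
print(f"chunk duration: {chunk_s * 1000:.2f} ms")

# Generation times mentioned in the thread: 0.3 s (maintainer's estimate
# for a 4090 or A100) and 0.6-1.0 s (observed by the reporter).
for gen_s in (0.3, 0.6, 1.0):
    rtf = real_time_factor(gen_s, chunk_s)
    status = "keeps up" if keeps_up(gen_s, chunk_s) else "falls behind"
    print(f"generation {gen_s:.1f} s -> RTF {rtf:.2f} ({status})")
```

Under these numbers, any per-chunk generation time above roughly 0.533 s falls behind playback, which is consistent with the jitter described above. Measuring this ratio on the actual device is a reasonable first step before trying other optimizations.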