Welcome to RealtimeTTS Discussions! #1

KoljaB · 2023-09-06T17:25:31Z

KoljaB
Sep 6, 2023
Maintainer

Welcome!

This GitHub Discussion space is designated for users and contributors of the project.

Purpose:

Discuss applications and use-cases.
Collaborate on feature development and improvements.
Offer and seek technical support.
Discuss industry best practices related to low-latency voice technology.

Participation Guidelines:

Ask questions related to the project's application and troubleshooting.
Share ideas for new features or improvements.
Upvote helpful discussions and answers.
Maintain a respectful and open-minded tone.

hafsalm · 2023-12-01T06:02:20Z

hafsalm
Dec 1, 2023

Thank you for the library. Any possibility of supporting the OpenAI TTS engine?

4 replies

KoljaB Dec 1, 2023
Maintainer Author

Should be possible. I'll look into that.

hafsalm Dec 1, 2023

That will be awesome 👍

KoljaB Dec 1, 2023
Maintainer Author

The new version 0.3.3 now supports the OpenAI TTS engine.

hafsalm Dec 2, 2023

Thank you so much 👍

hafsalm · 2023-12-05T14:00:54Z

hafsalm
Dec 5, 2023

What would be the best approach to stream the generated audio through an API endpoint so that a web application can consume it?

Is there any example code you could share, as you did for RealtimeSTT?

2 replies

KoljaB Apr 12, 2024
Maintainer Author

I just added some demo code implementing a FastAPI server that streams audio to web applications (browsers etc.).

hafsalm Apr 12, 2024

Thanks a bunch! 🙏 Super helpful!

KoljaB · 2023-12-05T15:05:48Z

KoljaB
Dec 5, 2023
Maintainer Author

I tried this, but it is hard to do. Audio chunks arrive in different formats, and handling them in RealtimeTTS is already a pain. For example, OpenAI delivers clean MP3 chunks, which we can play with ffmpeg or easily convert to WAV. ElevenLabs yields MP3 chunks that each depend on the previous one. RealtimeTTS plays them with MPV, but converting this stream to WAV is really hard.
So we need to either convert the ElevenLabs stream to WAV somehow, handle the ElevenLabs chunks in the client somehow, or not offer ElevenLabs at all. This is my current dilemma.

3 replies

hafsalm Dec 5, 2023

Thank you for your answer 👍 If we exclude ElevenLabs and opt for OpenAI or similar engines that provide clean MP3, would it be feasible? If so, what would be the best approach to achieve it?

KoljaB Dec 5, 2023
Maintainer Author

My approach would be:

Use the on_audio_chunk callback of play or play_async.
Convert the chunks to 16000 Hz, mono, 16-bit (the lowest engine format, used by Azure) so clients receive a single defined format.
Send them to the client with WebSockets, Flask, or FastAPI. This is where it gets tricky, because RealtimeTTS is not asyncio-based.
Playing raw wave audio data on the client is quite easy.
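
For the tricky part, handing chunks from a thread-based callback to asyncio code, one common pattern is a queue that is filled with loop.call_soon_threadsafe. The following is only a minimal sketch under assumptions: make_chunk_bridge, collect_chunks and demo are hypothetical names that are not part of RealtimeTTS, and a plain thread stands in for the playback thread that would call on_audio_chunk.

```python
import asyncio
import threading


def make_chunk_bridge(loop: asyncio.AbstractEventLoop):
    """Return (callback, queue); the callback may be called from any thread."""
    queue: asyncio.Queue = asyncio.Queue()

    def on_audio_chunk(chunk: bytes) -> None:
        # Runs on the producer thread: schedule the put on the event loop
        # thread, because asyncio.Queue itself is not thread-safe.
        loop.call_soon_threadsafe(queue.put_nowait, chunk)

    return on_audio_chunk, queue


async def collect_chunks(queue: asyncio.Queue, count: int) -> list:
    """Stand-in for a websocket sender: awaits `count` chunks in order."""
    return [await queue.get() for _ in range(count)]


def demo() -> list:
    async def main() -> list:
        callback, queue = make_chunk_bridge(asyncio.get_running_loop())
        # Simulated producer thread (assumption). With the library, the
        # callback would instead be passed as on_audio_chunk=callback.
        producer = threading.Thread(
            target=lambda: [callback(bytes([i]) * 4) for i in range(3)]
        )
        producer.start()
        chunks = await collect_chunks(queue, 3)
        producer.join()
        return chunks

    return asyncio.run(main())
```

In a real server, the loop inside collect_chunks would send each chunk to the connected client instead of collecting it in a list.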

hafsalm Dec 29, 2023

Thank you 👍

TouficKashmar · 2024-04-15T05:35:24Z

TouficKashmar
Apr 15, 2024

Hello,
I'm encountering an issue when trying to stream the speech chunks into Nvidia audio2face instead of playing them directly. I attempted to implement an audio chunk callback, but it doesn't seem to be functioning as expected. Could anyone provide guidance on how to resolve this problem?

In my solution I commented out the below lines in the _on_audio_chunk function, because audio2face accepts the audio data directly as a numpy array, and not as bytes. Could you please confirm if this is the right location to do that?

if format == pyaudio.paFloat32:
    audio_data = np.frombuffer(chunk, dtype=np.float32)
    # audio_data = np.int16(audio_data * 32767)
    # chunk = audio_data.tobytes()

3 replies

KoljaB Apr 15, 2024
Maintainer Author

That does not seem correct. This code is part of TextToAudioStream's _on_audio_chunk method, which preprocesses each chunk internally before passing it to the external callback. It ensures that float32 chunks coming from the Coqui engine are converted to int16 first, so every chunk that reaches the callback has the same format.

What you want to do:

use on_audio_chunk callback from play or play_async
convert to numpy within that callback

Something like this:

import numpy as np

def on_audio_chunk_callback(chunk):
    # chunk is raw bytes holding 16-bit mono PCM samples
    numpy_chunk = np.frombuffer(chunk, dtype=np.int16)

stream.play_async(on_audio_chunk=on_audio_chunk_callback)

You now have mono 16-bit chunks as a numpy array. You might have to resample the chunks to the target sample rate that Nvidia audio2face expects before handing them over.

from scipy.signal import resample
num_original_samples = len(numpy_chunk)
num_target_samples = int(num_original_samples * target_sample_rate / original_sample_rate)
resampled_chunk = resample(numpy_chunk, num_target_samples)

To get the sample rate of the chunks of the engine you are using:

 audio_format, channel, sample_rate = engine.get_stream_info()

TouficKashmar Apr 16, 2024

Thank you for the tip. I've used it, but I was wondering if you could guide me on how to increase the audio chunk size.

def stream_voice(self, audio_chunk):
    audio_data = np.frombuffer(audio_chunk, dtype=np.int16)
    audio_data_float = audio_data.astype(np.float32) / 32767.0
    self.stream_client.stream(audio_data_float)

This is my callback function. The problem I am facing at the moment is that each audio chunk is 0.01 sec long, and whenever I stream it into the client all I get is noisy data (because the chunk is very short; the sample rate of 24000 works well on my client side). I tried play and play_async, but both behave the same in my case. Could you please guide me toward a possible solution?

KoljaB Apr 16, 2024
Maintainer Author

If your chunks are too short, you could create a buffer on either the server or the client side. The buffer could accumulate chunks until a certain chunk size is reached and then hand out that bigger chunk (like here).

Or just accumulate more than one chunk before returning. A JavaScript ring buffer to do this could look like this:

class RingBuffer {
    constructor(size) {
        this.buffer = new Float32Array(size);
        this.readIndex = 0;
        this.writeIndex = 0;
        this.available = 0;
        this.size = size;
    }

    push(data) {
        data.forEach(sample => {
            if (this.available < this.size) { // Prevent overflow
                this.buffer[this.writeIndex] = sample;
                this.writeIndex = (this.writeIndex + 1) % this.size;
                this.available++;
            }
        });
    }

    pull(amount) {
        let output = new Float32Array(amount);
        for (let i = 0; i < amount; i++) {
            if (this.available > 0) { // Ensure data is available
                output[i] = this.buffer[this.readIndex];
                this.readIndex = (this.readIndex + 1) % this.size;
                this.available--;
            } else {
                output[i] = 0; // Output silence if buffer is empty
            }
        }
        return output;
    }
}

Streaming audio chunks to clients is hard, especially when dealing with different client output device setups.
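
For the server-side variant of the buffer idea, a rough Python sketch could look like the following. ChunkAccumulator and block_size_bytes are hypothetical helpers, not part of RealtimeTTS, and the sketch assumes the callback delivers raw 16-bit mono PCM bytes. The block duration is a free choice that has to be tuned for the client.

```python
class ChunkAccumulator:
    """Collects small byte chunks and releases fixed-size blocks."""

    def __init__(self, block_bytes: int) -> None:
        if block_bytes <= 0:
            raise ValueError("block_bytes must be positive")
        self.block_bytes = block_bytes
        self._pending = bytearray()

    def add(self, chunk: bytes) -> list:
        """Append a chunk; return every complete block now available."""
        self._pending.extend(chunk)
        blocks = []
        while len(self._pending) >= self.block_bytes:
            blocks.append(bytes(self._pending[: self.block_bytes]))
            del self._pending[: self.block_bytes]
        return blocks

    def flush(self) -> bytes:
        """Return the remainder (call once when synthesis has finished)."""
        rest = bytes(self._pending)
        self._pending.clear()
        return rest


def block_size_bytes(sample_rate: int, seconds: float, sample_width: int = 2) -> int:
    """Bytes needed for `seconds` of mono audio, in whole samples."""
    return int(sample_rate * seconds) * sample_width
```

Inside the on_audio_chunk callback, each block returned by add() would then be converted and streamed instead of the original short chunk, and flush() would deliver the tail once synthesis has finished.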

HirparaAmit · 2024-04-16T03:48:53Z

HirparaAmit
Apr 16, 2024

Hello,
I am facing an error while running coqui_test.py. I am running this file on one of my RunPod instances and get the error below. I just want to create a Flask API endpoint that converts given text into audio in real time. Is there anything I am supposed to set up externally?

7 replies

KoljaB Apr 18, 2024
Maintainer Author

Use the on_audio_chunk callback of play or play_async, which yields raw bytes.

jacksecretdesires Apr 18, 2024

massive thank you for your responsiveness in this discussion thread. i'm with amit here and i'm curious, what hardware would you recommend for running this repo in terms of best bang for your buck? we're running 3080 tis right now but i'm wondering if there's a better setup.

KoljaB Apr 18, 2024
Maintainer Author

I run it on Windows 10 with an RTX 2080 Super with 8 GB VRAM and get a realtime factor of ~0.6-0.7. As far as I know, on Linux you can activate DeepSpeed, and with it synthesis should be a good deal faster than on Windows, but I cannot tell you by how much. I have DeepSpeed activated too, but on Windows it does not do much.
So I think something like an RTX 4060 with 8 GB on a Linux system with DeepSpeed could do the job, but I am not sure about this.

HirparaAmit Apr 19, 2024

If you don't mind, would you please guide me here?
Basically, I want to perform a simple TTS task: I give text to CoquiEngine and get back a simple audio file of that text. I don't want to stream audio, because I am running the code in the cloud, where it can't find an output device such as a speaker (as we have already discussed in this conversation).
Following your last suggestion, I went through the play method thoroughly. I even tried the output_wavfile and muted parameters, but I didn't get the results I wanted.
So I am thinking of updating the play method to remove all the audio streaming parts and keep only the logic for threading, saving audio to a file, synthesize_worker, and everything else. That way, when I run stream.play(output_wavfile=stream.engine.engine_name + "_output.wav"), it should only convert the text I fed in into an audio file and not throw an error like Error opening stream: [Errno -9996] Invalid output device (no default output device).
Could you please tell me whether this is possible?

KoljaB Apr 19, 2024
Maintainer Author

Error opening stream: [Errno -9996] Invalid output device (no default output device) .

This is because RealtimeTTS always tries to open an output stream, even when muted=True.
It is probably best to comment that out in stream_player.py.
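
Once the output stream no longer gets opened, one way to end up with a plain audio file might be to collect the chunks from the on_audio_chunk callback and write them with Python's standard wave module. This is only a sketch under assumptions: WavChunkWriter is a hypothetical helper, not part of RealtimeTTS, and it assumes the callback delivers raw 16-bit PCM bytes at a known sample rate.

```python
import wave


class WavChunkWriter:
    """Appends raw 16-bit PCM chunks to a WAV file."""

    def __init__(self, path: str, sample_rate: int, channels: int = 1) -> None:
        self._wav = wave.open(path, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        self._wav.setframerate(sample_rate)

    def on_audio_chunk(self, chunk: bytes) -> None:
        # Assumption: chunk holds whole 16-bit samples, as the callback
        # is described in this thread.
        self._wav.writeframes(chunk)

    def close(self) -> None:
        # Closing finalizes the WAV header with the total frame count.
        self._wav.close()
```

The writer's on_audio_chunk method would be passed as the on_audio_chunk argument of play, with the sample rate taken from engine.get_stream_info(), and close() called after play returns.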
