Add a "live" mode that can be used to translate voice communications hands-free continuously #159

unfa · 2024-09-06T12:08:29Z

Hi! Thank you for DSnote, it's incredible software and I am very grateful for its existence and continued development!

TL;DR: I am looking for ways to help me understand Russian, Ukrainian and other languages spoken over voice chat while I am playing a multiplayer video game.

I was able to set up DSnote to listen to the game's audio output, translate the voice to English and even provide a spoken translation, albeit with substantial delay (I could live with that).

The problem is that the translation in continuous dictation mode is performed after the user manually stops the capture.
In a single-sentence mode the application stops completely after one sentence.

What I'd like is something in-between, or maybe a one-sentence loop mode.

1. Listen to incoming audio and slice it into short sentences.
2. Process transcription and translation.
3. Output translated text via TTS.
4. Back to 1.

This would already be a lot. It would be even better if recording could carry on while transcription/translation is being done, so that parts of the voice communication are not lost.
I am not using GPU acceleration (I have an RX6800 XT GPU and Ryzen 9 3900X), but I have plenty of CPU cores, so I think my machine could handle "dovetailing" transcription/translation/TTS processing, depending on the models used.
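
The "dovetailing" idea above can be sketched as a small producer/consumer pipeline in which capture keeps feeding a queue while earlier chunks are still being transcribed and translated. This is only a minimal illustration, not how Speech Note works internally: `transcribe`, `translate` and `speak` are hypothetical placeholders standing in for real STT, translation and TTS engines, and the audio chunks are plain strings.

```python
import queue
import threading

# Hypothetical placeholders: a real setup would call STT, MT and TTS engines here.
def transcribe(chunk: str) -> str:
    return f"text({chunk})"

def translate(text: str) -> str:
    return f"en({text})"

def speak(text: str, spoken: list) -> None:
    spoken.append(text)  # stands in for audio playback

_STOP = object()  # sentinel that tells each stage to shut down

def stage(func, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply func to every item from inbox and pass the result on, in order."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))

def run_pipeline(chunks) -> list:
    q_audio, q_text, q_translated = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(transcribe, q_audio, q_text)),
        threading.Thread(target=stage, args=(translate, q_text, q_translated)),
    ]
    for worker in workers:
        worker.start()

    # "Capture": enqueue chunks without waiting for earlier ones to finish.
    for chunk in chunks:
        q_audio.put(chunk)
    q_audio.put(_STOP)

    # "TTS": consume translated sentences as they become available.
    spoken = []
    while True:
        item = q_translated.get()
        if item is _STOP:
            break
        speak(item, spoken)

    for worker in workers:
        worker.join()
    return spoken
```

Each stage has a single worker and the queues are FIFO, so sentences come out in the order they were captured. Whether the stages really run in parallel on separate CPU cores depends on the engines used; that part is outside the scope of this sketch.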

(BTW: I tried using AMD GPU acceleration, but when I install it, DSnote freezes my entire system at startup, so I went back to CPU. That's another topic.)

mkiol · 2024-09-14T16:57:11Z

Hi. Sorry for the very late reply. I was vacationing ⛱️.

> translation in continuous dictation mode is performed after the user manually stops the capture

I assume you mean the "Translate to English" feature in Whisper models. If so, the translation is done when silence is detected in the audio stream. If in-game speech mixes with other sounds and there are no strict periods of silence, this may not work well.
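
To illustrate why silence-based segmentation struggles with game audio, here is a minimal energy-threshold splitter. It is a generic sketch, not Speech Note's or Whisper's actual voice activity detection: the function name, the RMS threshold and the frame representation (lists of integer samples) are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List

Frame = List[int]  # one short block of PCM samples (assumed representation)

def split_on_silence(
    frames: Iterable[Frame],
    threshold: float = 500.0,     # assumed RMS level separating sound from silence
    min_silence_frames: int = 3,  # consecutive quiet frames needed to end a segment
) -> Iterator[List[Frame]]:
    """Yield a segment each time a long enough run of quiet frames is seen."""
    segment: List[Frame] = []
    pending: List[Frame] = []  # quiet frames that may be only a short pause
    for frame in frames:
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5 if frame else 0.0
        if rms >= threshold:
            segment.extend(pending)  # the pause was short, keep it in the segment
            pending = []
            segment.append(frame)
        elif segment:
            pending.append(frame)
            if len(pending) >= min_silence_frames:
                yield segment        # enough silence: the segment is complete
                segment, pending = [], []
    if segment:
        yield segment                # flush whatever is left when the stream ends
```

If background sound keeps every frame above the threshold, the splitter never sees a quiet run and only emits one long segment when the stream ends, which matches the behaviour described above.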

Here's what I'm thinking: in the case you describe, it would be best to use the Vosk engine, because it supports live decoding and it's also pretty decent at Russian. The only missing part is the translation from Russian to English. Speech Note already has a full translator implemented, but it is not bundled with STT. It is actually on my "TO-DO" list to extend "Translate to English" to translate to any language and for all engines (not only Whisper, but also Vosk and others).

> What I'd like is something in-between, or maybe a one-sentence loop mode.
>
> 1. Listen to incoming audio and slice it into short sentences.
> 2. Process transcription and translation.
> 3. Output translated text via TTS.
> 4. Back to 1.

So, you would also like to add TTS... A similar thing has already been requested in #119. I like the idea.

Adding to the backlog.

mkiol added the enhancement New feature or request label Sep 14, 2024
