Add a "live" mode that can be used to translate voice communications hands-free continuously #159

Open
unfa opened this issue Sep 6, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

unfa commented Sep 6, 2024

Hi! Thank you for DSnote. It's incredible software, and I am very grateful for its existence and continued development!

TL;DR: I am looking for a way to understand Russian, Ukrainian and other languages spoken over voice chat while I am playing a multiplayer video game.

I was able to set up DSnote to listen to the game's audio output, translate the voice to English and even provide a spoken translation, albeit with a substantial delay (I could live with that).

The problem is that the translation in continuous dictation mode is performed after the user manually stops the capture.
In single-sentence mode, the application stops completely after one sentence.

What I'd like is something in-between, or maybe a one-sentence loop mode.

  1. Listen to incoming audio and slice it into short sentences.
  2. Process transcription and translation
  3. Output translated text via TTS
  4. Back to 1.
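
The four steps above can be sketched as a plain loop. This is only an illustration of the requested behaviour, not how Speech Note works internally: `sentence_loop` and its `transcribe`, `translate` and `speak` callables are hypothetical placeholders for whatever STT, translation and TTS engines are plugged in, and the "engines" below are fakes so the sketch runs without any models.

```python
from typing import Callable, Iterable, List


def sentence_loop(
    segments: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Process sentence-sized audio segments one by one; return how many were spoken."""
    spoken = 0
    for segment in segments:           # step 1: next sentence-sized slice
        text = transcribe(segment)     # step 2: transcription...
        if not text.strip():
            continue                   # nothing recognised, back to step 1
        speak(translate(text))         # ...translation, then step 3: TTS
        spoken += 1                    # step 4: loop
    return spoken


# Stand-in engines (all hypothetical): "audio" is just UTF-8 text here.
fake_dictionary = {"привет": "hello", "спасибо": "thank you"}
said: List[str] = []
count = sentence_loop(
    segments=["привет".encode(), b"", "спасибо".encode()],
    transcribe=lambda audio: audio.decode(),
    translate=lambda text: fake_dictionary.get(text, text),
    speak=said.append,
)
print(count, said)
```

Because each step blocks the next one, any audio that arrives during processing would have to be buffered by whatever produces `segments`.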

This alone would already be a lot. It would be even better if recording could carry on while transcription/translation is being done, so that parts of the voice communication are not lost.
I am not using GPU acceleration (I have an RX6800 XT GPU and Ryzen 9 3900X), but I have plenty of CPU cores, so I think my machine could handle "dovetailing" transcription/translation/TTS processing, depending on the models used.
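
One way to picture this "dovetailing" is a small pipeline with one worker thread per stage, connected by queues, so that capture can keep feeding segments while earlier ones are still being processed. This is a minimal sketch under assumptions: `run_pipeline` and the stage functions are hypothetical, and whether real engines would actually run in parallel depends on how they are implemented.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Sequence

_STOP = object()  # sentinel that tells each worker to shut down


def run_pipeline(segments: Iterable[Any], stages: Sequence[Callable[[Any], Any]]) -> List[Any]:
    """Pass every segment through all stages in order, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)   # forward shutdown to the next stage
                return
            outbox.put(stage(item))  # no error handling: a raising stage would stall the sketch

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]), daemon=True)
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()

    # "Capture": keeps queueing new segments without waiting for processing.
    for segment in segments:
        queues[0].put(segment)
    queues[0].put(_STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Stand-in stages (hypothetical): transcription, translation, TTS text.
out = run_pipeline(
    ["seg1", "seg2", "seg3"],
    [lambda a: f"text({a})", lambda t: f"en({t})", lambda t: f"say({t})"],
)
print(out)
```

Each stage has a single worker reading from a FIFO queue, so results come out in the same order the segments went in.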

(BTW: I tried using AMD GPU acceleration, but when I install it, DSnote freezes my entire system at startup, so I went back to CPU. That's another topic.)

mkiol added the enhancement label Sep 14, 2024
Owner

mkiol commented Sep 14, 2024

Hi. Sorry for the very late reply. I was on vacation ⛱️.

> translation in continuous dictation mode is performed after the user manually stops the capture

I assume you mean the "Translate to English" feature in Whisper models. If so, the translation is done when silence is detected in the audio stream. If in-game speech mixes with other sounds and there are no strict periods of silence, this may not work well.
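
To illustrate why missing silence matters, here is a rough sketch of silence-based segmentation using a simple energy threshold. It is not Speech Note's or Whisper's actual detection logic; `split_on_silence`, the `threshold` value and the `min_silent_frames` count are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1, 1]."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5 if frame else 0.0


def split_on_silence(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,
    min_silent_frames: int = 3,
) -> Iterator[List[Sequence[float]]]:
    """Yield a segment each time enough consecutive quiet frames are seen."""
    current: List[Sequence[float]] = []
    silent_run = 0
    for frame in frames:
        if rms(frame) < threshold:
            silent_run += 1            # quiet frames are dropped, not stored
            if current and silent_run >= min_silent_frames:
                yield current          # pause is long enough: close the segment
                current = []
        else:
            silent_run = 0
            current.append(frame)
    if current:
        yield current                  # flush whatever is left when the stream ends


loud, quiet, noise = [0.5] * 4, [0.0] * 4, [0.1] * 4
clean = [loud, loud, quiet, quiet, quiet, loud, quiet, quiet, quiet]
noisy = [loud, loud, noise, noise, noise, loud, noise, noise, noise]
clean_segments = list(split_on_silence(clean))
noisy_segments = list(split_on_silence(noisy))
print(len(clean_segments), len(noisy_segments))
```

With real pauses the stream splits into two segments. With background noise above the threshold it never splits, so the single segment is only emitted when the stream ends.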

Here's what I'm thinking: in the case you describe, it would be best to use the Vosk engine, because it supports live decoding and is also pretty decent at Russian. The only missing part is the translation from Russian to English. Speech Note already has a full translator implemented, but it is not integrated with STT. It is actually on my "TO-DO" list to extend "Translate to English" so that it translates to any language and works with all engines (not only Whisper, but also Vosk and others).

> What I'd like is something in-between, or maybe a one-sentence loop mode.
>
>   1. Listen to incoming audio and slice it into short sentences.
>   2. Process transcription and translation
>   3. Output translated text via TTS
>   4. Back to 1.

So, you would also like to add TTS... A similar thing has already been requested in #119. I like the idea.

Adding to the backlog.
