Add support for local TTS #98

Open · wants to merge 9 commits into main

FoxCunning

This change implements support for local/system text-to-speech using the Web Speech API.

It will read the generated text up to 500 characters. If the prompt has been modified, it will also read the portion that was changed.
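A minimal sketch of this selection logic follows. It is not the PR's code, and it rests on two assumptions. First, the "changed portion" is everything in the current prompt after the first character that differs from the prompt seen at the previous generation. Second, the 500-character cap applies only to the generated text. `buildNarrationText`, `commonPrefixLength` and `speak` are hypothetical names. `speechSynthesis` and `SpeechSynthesisUtterance` are the standard Web Speech API objects.

```javascript
// Hypothetical helpers illustrating the narration rules described above.
const MAX_NARRATION_CHARS = 500; // cap on generated text (assumed to apply to the output only)

// Length of the prefix shared by two strings.
function commonPrefixLength(a, b) {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a[i] === b[i]) i++;
  return i;
}

// Returns the text to narrate: the part of the prompt that changed since the
// previous generation (assumed: everything after the first differing character),
// followed by the newly generated text truncated to MAX_NARRATION_CHARS.
function buildNarrationText(previousPrompt, currentPrompt, generated) {
  const changed = currentPrompt.slice(commonPrefixLength(previousPrompt, currentPrompt));
  return changed + generated.slice(0, MAX_NARRATION_CHARS);
}

// Queues text with the browser's Web Speech API.
// Returns false (and does nothing) when the API is unavailable or the text is blank.
function speak(text) {
  if (!("speechSynthesis" in globalThis) || text.trim() === "") return false;
  globalThis.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}
```

In a browser, `speak(buildNarrationText(before, after, output))` would queue the result for narration. In environments without the Web Speech API, such as Node, `speak` simply returns false.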

[screenshot]

@FoxCunning FoxCunning marked this pull request as ready for review November 9, 2024 15:03
lmg-anon (Owner)

This is interesting, but I don't think it should be a new collapsible group in the sidebar. Maybe it could open a new modal when clicking on a button here instead:
[screenshot]

Or, even better, the configurations could be added to the Editor Preferences modal.

FoxCunning (Author)

The Editor Preferences modal sounds good and I'll see if I can move all the settings there.
There wouldn't be a button to stop the active playback on the main UI, though.

@FoxCunning FoxCunning marked this pull request as draft December 14, 2024 14:41
Four review threads on mikupad.html (outdated, resolved)
FoxCunning (Author) commented Jan 1, 2025

Happy new year! 🎆🎉

I've now fully rewritten the TTS code.

  • The main change is in the App.predict method, because using "useEffect" hooks proved unreliable. On the bright side, I only had to add a few lines to that method.
  • Almost all of the TTS functionality is in separate methods that I kept together starting at line 7350.
  • The settings UI is now in the Editor Preferences modal. Disabling TTS via checkbox makes all the other TTS-related elements disappear.
  • I've also added an option to not narrate user inputs.
  • window.TTS has been removed. TTS state now lives only in React variables stored with useRef / usePersistentState inside the App component, and TTS settings should be saved in local storage.
  • The SVG I added is a simple "stop" button that, well, stops the TTS while it's speaking. Since the button lives in the Editor Preferences modal, I also added a hotkey (Ctrl+E) so the narration can be stopped without opening the menu.
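The sketch below shows one way to implement the Ctrl+E stop handler. It assumes a plain `keydown` listener on `document`, since the PR's actual wiring is not shown. `handleTtsHotkey` is a hypothetical name. `speechSynthesis.cancel()` is the Web Speech API call that empties the utterance queue and stops any utterance currently being spoken. The synthesizer is passed in as a parameter so the handler can be exercised with a stub outside the browser.

```javascript
// Hypothetical hotkey handler: stops TTS narration on Ctrl+E.
// `synth` defaults to the browser's speechSynthesis (undefined outside a browser).
function handleTtsHotkey(event, synth = globalThis.speechSynthesis) {
  if (event.ctrlKey && event.key.toLowerCase() === "e") {
    // Ask the browser not to run its own Ctrl+E action (not every browser honours this).
    event.preventDefault();
    // Empty the utterance queue and stop the current utterance, if any.
    synth?.cancel();
    return true;
  }
  return false;
}

// Browser wiring (assumed):
// document.addEventListener("keydown", (event) => handleTtsHotkey(event));
```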

Note: I chose to process text chunks as soon as they arrive so that speech synthesis can start narrating as soon as a sentence is complete (e.g. when the AI generates a newline or another "stopping" token). If a lot of text is being generated, and especially if the AI is slow, the user does not have to wait for the whole generation to finish before the narration starts.
Unterminated user inputs will be narrated (if the option is selected) together with the next generation, to form a complete sentence.
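The incremental approach above can be sketched as a small buffer that accumulates streamed chunks and releases only complete sentences. This is an illustration, not the PR's code. `SentenceBuffer` is a hypothetical name, and the boundary rule is an assumption: a `.`, `!` or `?` (optionally followed by closing quotes or brackets) followed by whitespace, or a newline. Because unterminated text stays in the buffer, a user input without final punctuation is released together with the beginning of the next generation.

```javascript
// Hypothetical sentence buffer for streamed TTS narration.
class SentenceBuffer {
  constructor() {
    this.pending = ""; // text received but not yet released as a sentence
  }

  // Appends a streamed chunk and returns every sentence it completes.
  push(chunk) {
    this.pending += chunk;
    // Assumed boundary rule: terminal punctuation followed by whitespace, or a newline.
    // A fresh regex per call avoids stale `lastIndex` state from the g flag.
    const boundary = /[.!?]+["')\]]*(?=\s)|\n/g;
    const sentences = [];
    let start = 0;
    let match;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence) sentences.push(sentence); // skip empty pieces such as a lone "\n"
      start = end;
    }
    this.pending = this.pending.slice(start); // keep the unterminated tail
    return sentences;
  }

  // Releases whatever is left, e.g. when generation ends.
  flush() {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

Requiring whitespace after the punctuation means a sentence that ends exactly at a chunk boundary is held until the next chunk arrives. It also avoids splitting on something like "3.14" mid-stream. Each returned sentence could then be queued with `speechSynthesis.speak(new SpeechSynthesisUtterance(sentence))`, since the Web Speech API plays queued utterances in order.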

I've tested it quite a bit with llama.cpp and koboldcpp.
If there's anything else you think should be changed, let me know.

@FoxCunning FoxCunning reopened this Jan 1, 2025
@FoxCunning FoxCunning marked this pull request as ready for review January 1, 2025 19:32
@FoxCunning FoxCunning requested a review from lmg-anon January 1, 2025 19:32
FoxCunning (Author) left a comment


Reviewed - See previous comment in pull request.
