Releases: KoljaB/RealtimeTTS

v0.4.40

06 Jan 12:07

RealtimeTTS v0.4.4 Release Notes

Configurable Playback Parameters

New Parameters: frames_per_buffer and playout_chunk_size

  • Purpose:

    • These new parameters provide finer control over audio playback buffering, which is especially useful for mitigating stuttering issues on Unix-based systems.
  • Details:

    1. frames_per_buffer:

      • Controls the number of audio frames processed per buffer by PyAudio.
      • Lower values reduce latency but increase CPU usage, while higher values reduce CPU load but increase latency.
      • Recommended Settings for Stuttering:
        • Start by setting frames_per_buffer to 256.
        • If issues persist, reduce it further to 128.

      Example:

      stream = TextToAudioStream(engine, frames_per_buffer=256)
    2. playout_chunk_size:

      • Specifies the size (in bytes) of audio chunks played out to the stream.
      • Works in conjunction with frames_per_buffer to optimize audio smoothness.
      • Defaults to dynamic calculation, but can be explicitly set for precise control.

      Example:

      stream = TextToAudioStream(engine, playout_chunk_size=1024)

How These Parameters Address Stuttering:

  • On Unix systems, the default buffer sizes can cause sporadic stuttering during playback because of timing mismatches between the audio stream and the system audio drivers.
  • Reducing frames_per_buffer to 256 or 128 hands audio to the driver in smaller, more frequent buffers, which makes playback more responsive and better aligned with system timing.
  • Setting playout_chunk_size to a fixed value makes chunk delivery to the audio stream more predictable, which can smooth playback further.
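
To get a feel for what these values mean in time, the sketch below converts a buffer size in frames and a chunk size in bytes into milliseconds of audio. It is only arithmetic, not RealtimeTTS code: the helper names are hypothetical, the 16 kHz sample rate is an assumed example, and the byte conversion assumes uncompressed 16-bit mono PCM, which may not match your engine's output format.

```python
def buffer_duration_ms(frames_per_buffer: int, sample_rate: int) -> float:
    """Milliseconds of audio held by one buffer of `frames_per_buffer` frames."""
    return 1000.0 * frames_per_buffer / sample_rate


def chunk_duration_ms(chunk_bytes: int, sample_rate: int,
                      channels: int = 1, sample_width: int = 2) -> float:
    """Milliseconds of audio in a chunk of `chunk_bytes` bytes.

    Assumes uncompressed PCM, where one frame is channels * sample_width bytes.
    """
    frames = chunk_bytes / (channels * sample_width)
    return 1000.0 * frames / sample_rate


# Assumed example rate of 16 kHz; substitute your engine's actual output rate.
for frames in (256, 128):
    print(frames, "frames ->", buffer_duration_ms(frames, 16000), "ms")
print("1024 bytes ->", chunk_duration_ms(1024, 16000), "ms")
```

Halving frames_per_buffer halves the time each buffer covers, which is why smaller values lower latency at the cost of more frequent callbacks.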

Usage Examples

Basic Configuration:

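from RealtimeTTS import TextToAudioStream, PiperEngine, PiperVoice

# Placeholder paths: point these at your own Piper voice model and executable.
voice = PiperVoice(
    model_file="path/to/voice.onnx",
    config_file="path/to/voice.onnx.json",
)
engine = PiperEngine(piper_path="path/to/piper.exe", voice=voice)
stream = TextToAudioStream(
    engine=engine,
    frames_per_buffer=256,    # Start with 256 to reduce stuttering
    playout_chunk_size=1024,  # Optional for further customization
)
stream.feed("Hello from Piper.")  # Queue some text before starting playback
stream.play()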

Fine-Tuning for Stuttering:

  • If playback issues occur:
    1. Set frames_per_buffer to 256 (recommended starting point).
    2. Reduce to 128 if stuttering persists.
    3. Optionally adjust playout_chunk_size to a fixed value like 1024 or 512.

  • Backward Compatibility:
    • Defaults for frames_per_buffer and playout_chunk_size maintain compatibility with previous versions, requiring no changes for existing setups unless adjustments are needed.
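
The tuning order above can be written down as an ordered list of keyword-argument sets to try one after another. This is a sketch only: `stutter_tuning_steps` is a hypothetical helper, not part of RealtimeTTS, and pairing a fixed playout_chunk_size with frames_per_buffer=128 in the later steps is an assumption about how the steps combine. Only the two parameter names and the values 256, 128, 1024, and 512 come from the notes.

```python
def stutter_tuning_steps():
    """Return keyword-argument sets to try in order, most conservative first."""
    return [
        {"frames_per_buffer": 256},                              # recommended start
        {"frames_per_buffer": 128},                              # if stuttering persists
        {"frames_per_buffer": 128, "playout_chunk_size": 1024},  # optional fixed chunk
        {"frames_per_buffer": 128, "playout_chunk_size": 512},   # smaller fixed chunk
    ]


for step_number, kwargs in enumerate(stutter_tuning_steps(), start=1):
    # In a real setup each set would be passed as TextToAudioStream(engine, **kwargs).
    print(step_number, kwargs)
```

Stop at the first set that plays back cleanly; existing setups that never pass these arguments keep the defaults.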

v0.4.3

02 Jan 09:14

RealtimeTTS v0.4.3 Release Notes

New Feature: PiperEngine

  • Introduction

    • Adds PiperEngine, an engine that supports the Piper text-to-speech model.
  • Installation

    • Separate Installation Required: Piper must be installed separately from RealtimeTTS. Follow the Piper installation tutorial for Windows to set up Piper on your system.

    • Install RealtimeTTS:

      pip install RealtimeTTS

      Note: Unlike other engines, there is no need to install Piper support with pip install RealtimeTTS[piper]. The [piper] option is not supported.

  • Usage

    • Configure PiperEngine:

      • Specify the path to the Piper executable and the desired voice model using the PiperVoice and PiperEngine classes.
      • Refer to the Piper test file for an example of how to set up and use PiperEngine in your projects.
    • Example:

      from RealtimeTTS import TextToAudioStream, PiperEngine, PiperVoice
      
      def dummy_generator():
          yield "This is piper tts speaking."
      
      voice = PiperVoice(
          model_file="D:/Downloads/piper_windows_amd64/piper/en_US-kathleen-low.onnx",
          config_file="D:/Downloads/piper_windows_amd64/piper/en_US-kathleen-low.onnx.json",
      )
      
      engine = PiperEngine(
          piper_path="D:/Downloads/piper_windows_amd64/piper/piper.exe",
          voice=voice,
      )
      
      stream = TextToAudioStream(engine)
      stream.feed(dummy_generator())
      stream.play()

v0.4.21

14 Dec 17:44

RealtimeTTS v0.4.21 Release Notes

🚀 New Features

  • Updated dependencies to their latest versions (stream2sentence, coqui-tts, elevenlabs, openai, edge-tts).

StyleTTS Engine

  • Added seed support.
  • Fixed a StyleTTS2 problem that generated noise for very short texts, especially with embedding_scale values > 1.

🛠 Bug Fixes

  • Fixed a problem in stream2sentence that caused minimum_sentence_length to be ignored.

v0.4.20 🌿

10 Dec 22:05

RealtimeTTS v0.4.20 Release Notes

🚀 New Features

Azure Engine

  • Added support for 48 kHz audio output in the Azure TTS engine, which improves audio quality and adds flexibility in audio formats.

StyleTTS Engine

  • Introduced StyleTTSVoice for dynamic voice switching between multiple voice models.

🛠 Bug Fixes

  • Fixed incorrect voice initialization when switching between models in the StyleTTS engine.
  • Fixed model configuration path issues during runtime when updating voice parameters.

v0.4.19

07 Dec 07:47
905f1fb
  • Added support for the StyleTTS2 engine.
  • Updated Coqui-TTS to version 0.25.0, which includes a fix for issue #227
  • Upgraded all dependent libraries to their latest versions

v0.4.17

30 Nov 21:42
  • Performance improvements, bug fixes, and an improved edge_test.py for Edge TTS.

v0.4.14

29 Nov 21:47

fixes #223

Enhancements to Sentence Processing

  • Improved buffer handling: the buffer now always starts with an alphanumeric character, which prevents TTS confusion caused by leading non-phonetic characters.
  • Bug Fix: Resolved an issue where the word counter wasn’t reset after triggering force_first_fragment_after_words, causing processing errors.
  • Increased the default force_first_fragment_after_words threshold from 15 to 30 for better fragment control.
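
The three changes above can be illustrated with a small sketch. It is not the library's implementation: `strip_leading_non_alnum` and `FirstFragmentForcer` are hypothetical names, and feeding text one word at a time is a simplification. It only shows the ideas the notes describe: trimming leading non-alphanumeric characters, forcing a first fragment after a word threshold, and resetting the word counter once that happens.

```python
def strip_leading_non_alnum(buffer: str) -> str:
    """Drop characters before the first alphanumeric one ('' if there is none)."""
    for index, char in enumerate(buffer):
        if char.isalnum():
            return buffer[index:]
    return ""


class FirstFragmentForcer:
    """Collects words and forces out a fragment after a word threshold."""

    def __init__(self, force_after_words: int = 30):
        self.force_after_words = force_after_words
        self.buffer = ""
        self.word_count = 0

    def feed(self, word: str):
        """Add one word; return the forced fragment, or None if below threshold."""
        self.buffer = strip_leading_non_alnum(self.buffer + word + " ")
        if self.buffer:  # only count once the buffer holds speakable text
            self.word_count += 1
        if self.word_count >= self.force_after_words:
            fragment = self.buffer.strip()
            self.buffer = ""
            self.word_count = 0  # the reset whose absence the bug fix describes
            return fragment
        return None


forcer = FirstFragmentForcer(force_after_words=3)
for word in ["...", "one", "two", "three", "four"]:
    print(repr(word), "->", forcer.feed(word))
```

Without the counter reset, every word after the first forced fragment would trigger another one, which matches the processing errors mentioned above.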

v0.4.13

28 Nov 14:28

RealtimeTTS v0.4.13 Release Notes

🚀 New Features

EdgeEngine

  • Introducing EdgeEngine, a free, extremely lightweight, and beginner-friendly engine.
  • Designed for simplicity with no complex dependencies, making it ideal for lightweight projects or newcomers to TTS.

🛠 Bug Fixes

  • Resolved ValueError: ('Sample format not supported', -9994) (#221).
  • Fixed RecursionError: maximum recursion depth exceeded (#222).
  • Addressed the requirement to manually install resampy after installing RealtimeTTS.

v0.4.11

16 Nov 22:22
  • Optimizations for Linux:
    • Fixed setting the multiprocessing spawn start method on Linux.
    • Chunks are now resampled if the TTS engine's output sample rate is not supported by the sound card.
    • Added a mechanism to prevent potential stream buffer overflows.
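
To illustrate the resampling item above, here is a minimal sketch that converts a mono sample sequence from one rate to another by linear interpolation. It is an assumption-laden illustration, not what RealtimeTTS does internally: `resample_linear` is a hypothetical helper, and a production resampler would also low-pass filter when downsampling to avoid aliasing.

```python
def resample_linear(samples, src_rate: int, dst_rate: int):
    """Resample a mono sequence of samples by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        if left >= last:
            # Past the final source sample: hold the last value.
            out.append(float(samples[last]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# Doubling the rate inserts one interpolated value between neighbours.
print(resample_linear([0, 2, 4, 6], src_rate=2, dst_rate=4))
```

The same function handles downsampling by stepping through the source faster than one sample per output value.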

v0.4.10

07 Nov 14:17
  • New stream2sentence version 0.2.7:
    • Bug fix for #5 (a whitespace between words was sometimes lost).
    • Upgraded to the latest NLTK and Stanza versions, including the new "punkt_tab" model.
    • Allows offline environments for Stanza.
    • Adds support for async streams (in preparation for async in RealtimeTTS).
  • Dependency upgrades to the latest versions (coqui tts 0.24.2 ➡️ 0.24.3, elevenlabs 1.11.0 ➡️ 1.12.1, openai 1.52.2 ➡️ 1.54.3).
  • Added a load_balancing parameter to the Coqui engine:
    • On a fast machine with a realtime factor well below 1, inference runs much faster than it needs to.
    • This parameter lets you infer at a realtime factor closer to 1. Voice inference still streams, but GPU load drops to the minimum needed to produce chunks in real time.
    • LLM inference running in parallel becomes faster because TTS puts less load on the GPU.
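
The idea behind load balancing can be sketched as a pacing calculation. Taking the realtime factor as synthesis time divided by the duration of the audio produced, the sketch computes how long a producer could idle after each chunk so that its effective factor approaches a target just below 1. This is not the Coqui engine's actual implementation: `throttle_delay` and the 0.9 target are assumptions chosen for illustration.

```python
def throttle_delay(synthesis_seconds: float, audio_seconds: float,
                   target_rtf: float = 0.9) -> float:
    """Seconds to idle after synthesizing a chunk to approach `target_rtf`.

    Realtime factor (RTF) = time spent synthesizing / duration of audio produced.
    Returns 0.0 when synthesis is already at or above the target.
    """
    budget = target_rtf * audio_seconds  # time we may spend on this chunk
    return max(0.0, budget - synthesis_seconds)


# A chunk holding 1.0 s of audio, synthesized in 0.2 s (RTF 0.2).
delay = throttle_delay(synthesis_seconds=0.2, audio_seconds=1.0)
print(f"idle for about {delay:.2f} s before the next chunk")
```

Keeping the target below 1 leaves headroom so that playback does not run out of audio while the GPU is freed for other work.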