Releases: KoljaB/RealtimeTTS
v0.4.40
RealtimeTTS v0.4.4 Release Notes
Configurable Playback Parameters
New Parameters: frames_per_buffer and playout_chunk_size
- Purpose:
  - These new parameters provide finer control over audio playback buffering, which is especially useful for mitigating stuttering issues on Unix-based systems.
- Details:
  - frames_per_buffer:
    - Controls the number of audio frames processed per buffer by PyAudio.
    - Lower values reduce latency but increase CPU usage, while higher values reduce CPU load but increase latency.
    - Recommended Settings for Stuttering:
      - Start by setting frames_per_buffer to 256.
      - If issues persist, reduce it further to 128.
    - Example:
      stream = TextToAudioStream(engine, frames_per_buffer=256)
  - playout_chunk_size:
    - Specifies the size (in bytes) of audio chunks played out to the stream.
    - Works in conjunction with frames_per_buffer to optimize audio smoothness.
    - Defaults to dynamic calculation, but can be explicitly set for precise control.
    - Example:
      stream = TextToAudioStream(engine, playout_chunk_size=1024)
- How These Parameters Address Stuttering:
  - On Unix systems, default buffer sizes may cause sporadic stuttering during audio playback due to timing mismatches between the audio stream and system audio drivers.
  - Reducing frames_per_buffer to 256 or 128 makes playback more responsive and better aligned with system timing.
  - Adjusting playout_chunk_size further enhances playback smoothness by ensuring optimal chunk delivery to the audio stream.
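To get a feel for what these values mean in time, the arithmetic can be sketched as below. This is only an illustration: the helper names are hypothetical, and the 16 kHz sample rate and 16-bit mono format are assumptions, so substitute the values your engine actually produces.

```python
def buffer_duration_ms(frames_per_buffer: int, sample_rate: int) -> float:
    """Duration of audio held by one buffer, in milliseconds."""
    return 1000.0 * frames_per_buffer / sample_rate


def chunk_frames(chunk_bytes: int, channels: int = 1, sample_width: int = 2) -> int:
    """Number of audio frames in a chunk of the given size in bytes.

    Assumes 16-bit (2-byte) mono samples by default.
    """
    return chunk_bytes // (channels * sample_width)


if __name__ == "__main__":
    sample_rate = 16000  # assumed output rate; check your engine's actual rate
    for frames in (256, 128):
        print(frames, "frames ->", buffer_duration_ms(frames, sample_rate), "ms")
    print(1024, "bytes ->", chunk_frames(1024), "frames")
```

Under these assumptions, a playout_chunk_size of 1024 bytes holds several buffers' worth of audio when frames_per_buffer is 256 or 128, which is one way to reason about how the two settings interact.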
Usage Examples
Basic Configuration:
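from RealtimeTTS import TextToAudioStream, PiperEngine

# my_voice must be a PiperVoice instance, configured as in the PiperEngine example in the v0.4.3 notes
engine = PiperEngine(piper_path="path/to/piper.exe", voice=my_voice)
stream = TextToAudioStream(
    engine=engine,
    frames_per_buffer=256,   # Start with 256 to reduce stuttering
    playout_chunk_size=1024  # Optional for further customization
)
stream.feed("Text to synthesize.")  # Queue text before calling play()
stream.play()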
Fine-Tuning for Stuttering:
- If playback issues occur:
  - Set frames_per_buffer to 256 (recommended starting point).
  - Reduce to 128 if stuttering persists.
  - Optionally adjust playout_chunk_size to a fixed value like 1024 or 512.
- Backward Compatibility:
  - Defaults for frames_per_buffer and playout_chunk_size maintain compatibility with previous versions, requiring no changes for existing setups unless adjustments are needed.
v0.4.3
RealtimeTTS v0.4.3 Release Notes
New Feature: PiperEngine
- Introduction
  - Introducing the PiperEngine to support the Piper text-to-speech model.
- Installation
  - Separate Installation Required: Piper must be installed separately from RealtimeTTS. Follow the Piper installation tutorial for Windows to set up Piper on your system.
  - Install RealtimeTTS:
    pip install RealtimeTTS
    Note: Unlike other engines, there is no need to install Piper support with pip install RealtimeTTS[piper]. The [piper] option is not supported.
- Usage
  - Configure PiperEngine:
    - Specify the path to the Piper executable and the desired voice model using the PiperVoice and PiperEngine classes.
    - Refer to the Piper test file for an example of how to set up and use PiperEngine in your projects.
  - Example:
from RealtimeTTS import TextToAudioStream, PiperEngine, PiperVoice

def dummy_generator():
    yield "This is piper tts speaking."

voice = PiperVoice(
    model_file="D:/Downloads/piper_windows_amd64/piper/en_US-kathleen-low.onnx",
    config_file="D:/Downloads/piper_windows_amd64/piper/en_US-kathleen-low.onnx.json",
)

engine = PiperEngine(
    piper_path="D:/Downloads/piper_windows_amd64/piper/piper.exe",
    voice=voice,
)

stream = TextToAudioStream(engine)
stream.feed(dummy_generator())
stream.play()
- Additional Information
  - Piper Resources:
    - Installation Tutorial: Watch on YouTube
    - Test File Example: piper_test.py
  - Support:
    - If you have any issues or have questions about the new PiperEngine, please open an issue.
v0.4.21
RealtimeTTS v0.4.21 Release Notes
🚀 New Features
- Updated dependencies to their latest versions (stream2sentence, coqui-tts, elevenlabs, openai, edge-tts)
StyleTTS Engine
- Added seed support. Fixed a StyleTTS2 problem that caused noise to be generated for very short texts, especially when using embedding_scale values > 1
🛠 Bug Fixes
- Fixed a problem in stream2sentence causing minimum_sentence_length to not be respected
v0.4.20 🌿
RealtimeTTS v0.4.20 Release Notes
🚀 New Features
Azure Engine
- Added support for 48 kHz audio output in the Azure TTS engine for improved audio quality and more flexibility in audio formats.
StyleTTS Engine
- Introduced StyleTTSVoice for dynamic voice switching, allowing transitions between multiple voice models.
🛠 Bug Fixes
- Fixed incorrect voice initialization when switching between models in the StyleTTS engine.
- Fixed model configuration path issues during runtime when updating voice parameters.
v0.4.19
v0.4.17
v0.4.14
fixes #223
Enhancements to Sentence Processing
- Improved buffer handling by ensuring it starts with an alphanumeric character to prevent TTS confusion caused by initial non-phonetic characters.
- Bug Fix: Resolved an issue where the word counter wasn’t reset after triggering force_first_fragment_after_words, causing processing errors.
- Increased the default force_first_fragment_after_words threshold from 15 to 30 for better fragment control.
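The two behaviours described above can be illustrated with a small sketch. This is not the library's implementation: the function and class names are hypothetical, and the exact trigger condition (here, reaching the threshold) is an assumption made for the example.

```python
def strip_leading_non_alnum(buffer: str) -> str:
    """Drop characters from the start of the buffer until an alphanumeric one."""
    for index, char in enumerate(buffer):
        if char.isalnum():
            return buffer[index:]
    return ""  # nothing speakable in the buffer


class FirstFragmentTrigger:
    """Counts words and signals when a fragment should be forced out."""

    def __init__(self, threshold: int = 30):
        self.threshold = threshold
        self.words = 0

    def add_word(self) -> bool:
        self.words += 1
        if self.words >= self.threshold:
            self.words = 0  # reset, so the next trigger needs a full count again
            return True
        return False


if __name__ == "__main__":
    print(repr(strip_leading_non_alnum("... Hello there")))
    trigger = FirstFragmentTrigger(threshold=3)
    print([trigger.add_word() for _ in range(6)])
```

Without the reset inside add_word, the counter would stay at or above the threshold and fire on every following word, which is the kind of processing error the fix addresses.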
v0.4.13
RealtimeTTS v0.4.13 Release Notes
🚀 New Features
EdgeEngine
- Introducing EdgeEngine, a free, extremely lightweight, and beginner-friendly engine.
- Designed for simplicity with no complex dependencies, making it ideal for lightweight projects or newcomers to TTS.
🛠 Bug Fixes
v0.4.11
v0.4.10
- new stream2sentence version 0.2.7
- bugfix for #5 (causing a whitespace between words to get lost sometimes)
- upgrade to latest NLTK and Stanza versions including new "punkt-tab" model
- allow offline environment for stanza
- adds support for async streams (preparations for async in RealtimeTTS)
- dependency upgrades to latest version (coqui tts 0.24.2 ➡️ 0.24.3, elevenlabs 1.11.0 ➡️ 1.12.1, openai 1.52.2 ➡️ 1.54.3)
- added load_balancing parameter to coqui engine
  - On a fast machine with a realtime factor well below 1, inference runs much faster than playback requires.
  - This parameter lets you infer at a realtime factor closer to 1: voice inference still streams, but GPU load drops to the minimum needed to produce chunks in realtime.
  - LLM inference running in parallel becomes faster because TTS takes less load.
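The idea behind load balancing can be sketched with a bit of arithmetic. The helpers below are hypothetical and do not reflect how the coqui engine implements the parameter; they only show what a realtime factor is and how much slack a fast machine has per chunk, with the target factor of 0.9 chosen as an assumption.

```python
def realtime_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Time spent synthesizing divided by the duration of audio produced.

    Values below 1 mean audio is generated faster than it is played back.
    """
    return synthesis_seconds / audio_seconds


def throttle_delay(synthesis_seconds: float, audio_seconds: float,
                   target_factor: float = 0.9) -> float:
    """Idle time that could be added per chunk to approach the target factor."""
    return max(0.0, target_factor * audio_seconds - synthesis_seconds)


if __name__ == "__main__":
    # Example: 0.5 s of compute yields 2.0 s of audio
    print("realtime factor:", realtime_factor(0.5, 2.0))
    print("slack per chunk (s):", throttle_delay(0.5, 2.0))
```

The slack is time during which the GPU does not need to work on TTS, which is what leaves room for a parallel LLM.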