Commit

Merge pull request #15 from KoljaB/stopbug-fix
Bugfix for stop() method
KoljaB authored Nov 30, 2023
2 parents c4fc0d5 + 556cff3 commit b185026
Showing 6 changed files with 161 additions and 108 deletions.
221 changes: 126 additions & 95 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,7 @@ RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real

https://github.com/KoljaB/RealtimeTTS/assets/7604638/87dcd9a5-3a4e-4f57-be45-837fc63237e7


### Key Features
## Key Features

- **Low Latency**
- almost instantaneous text-to-speech conversion
Expand All @@ -22,16 +21,16 @@ https://github.com/KoljaB/RealtimeTTS/assets/7604638/87dcd9a5-3a4e-4f57-be45-837
- ensures continuous operation with a fallback mechanism
- switches to alternative engines in case of disruptions, guaranteeing consistent performance and reliability, which is vital for critical and professional use cases

> **Hint**: *Check out [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT), the input counterpart of this library, for speech-to-text capabilities. Together, they form a powerful realtime audio wrapper around large language models.*
> **Hint**: *check out [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT), the input counterpart of this library, for speech-to-text capabilities. Together, they form a powerful realtime audio wrapper around large language models.*
### Updates
## Updates

Latest Version: v0.3.0
Latest Version: v0.3.1

#### New Features:
1. Expanded language support, including Chinese (details in [tests](https://github.com/KoljaB/RealtimeTTS/blob/master/tests/chinese_test.py) and [speed test](https://github.com/KoljaB/RealtimeTTS/blob/master/tests/pyqt6_speed_test_chinese.py)).
2. Fallback engines in TextToAudioStream, enhancing reliability for real-time scenarios by switching to alternate engines if one fails.
3. Audio file saving feature with `output_wavfile` parameter. This allows for the simultaneous saving of real-time synthesized audio, enabling later playback of the live synthesis.
- expanded language support, including Chinese (details in [tests](https://github.com/KoljaB/RealtimeTTS/blob/master/tests/chinese_test.py) and [speed test](https://github.com/KoljaB/RealtimeTTS/blob/master/tests/pyqt6_speed_test_chinese.py)).
- fallback engines in TextToAudioStream, enhancing reliability for real-time scenarios by switching to alternate engines if one fails.
- audio file saving feature with `output_wavfile` parameter. This allows for the simultaneous saving of real-time synthesized audio, enabling later playback of the live synthesis.

For more details, see the [release history](https://github.com/KoljaB/RealtimeTTS/releases).

Expand All @@ -52,13 +51,25 @@ This library uses:

## Installation

Simple installation:

```bash
pip install RealtimeTTS
```

This will install all the necessary dependencies, including a **CPU support only** version of PyTorch (needed for Coqui engine)
This will install all the necessary dependencies, including a **CPU support only** version of PyTorch (needed for the Coqui engine).

To use Coqui engine it is recommended to upgrade torch to GPU usage (see installation steps under [Coqui Engine](#coquiengine) further below).
Installation into a virtual environment with GPU support (Windows commands shown):

```bash
python -m venv env_realtimetts
env_realtimetts\Scripts\activate.bat
python.exe -m pip install --upgrade pip
pip install RealtimeTTS
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
```

More information about [CUDA installation](#cuda-installation).

## Engine Requirements

Expand Down Expand Up @@ -96,72 +107,7 @@ Downloads a neural TTS model first. In most cases it will be fast enough for Realtim
- to clone a voice submit the filename of a wave file containing the source voice as cloning_reference_wav to the CoquiEngine constructor
- in my experience voice cloning works best with a 24000 Hz mono 16bit WAV file containing a short (~10 sec) sample

#### GPU-Support (CUDA) for Coqui

Additional steps are needed for a **GPU-optimized** installation. These steps are recommended for those who require **better performance** and have a compatible NVIDIA GPU.

> **Note**: *To check if your NVIDIA GPU supports CUDA, visit the [official CUDA GPUs list](https://developer.nvidia.com/cuda-gpus).*

To use local Coqui Engine with GPU support via CUDA please follow these steps:

1. **Install NVIDIA CUDA Toolkit 11.8**:
- Visit [NVIDIA CUDA Toolkit Archive](https://developer.nvidia.com/cuda-11-8-0-download-archive).
- Select version 11.
- Download and install the software.

2. **Install NVIDIA cuDNN 8.7.0 for CUDA 11.x**:
- Visit [NVIDIA cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive).
- Click on "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
- Download and install the software.

3. **Install ffmpeg**:

You can download an installer for your OS from the [ffmpeg Website](https://ffmpeg.org/download.html).

Or use a package manager:

- **On Ubuntu or Debian**:
```bash
sudo apt update && sudo apt install ffmpeg
```

- **On Arch Linux**:
```bash
sudo pacman -S ffmpeg
```

- **On MacOS using Homebrew** ([https://brew.sh/](https://brew.sh/)):
```bash
brew install ffmpeg
```

- **On Windows using Chocolatey** ([https://chocolatey.org/](https://chocolatey.org/)):
```bash
choco install ffmpeg
```

- **On Windows using Scoop** ([https://scoop.sh/](https://scoop.sh/)):
```bash
scoop install ffmpeg
```

4. **Install PyTorch with CUDA support**:
```bash
pip install torch==2.1.0+cu118 torchaudio==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118
```

5. **Fix to resolve compatibility issues**:
If you run into library compatibility issues, please set these libraries to fixed versions:

```bash
pip install networkx==2.8.8
pip install typing_extensions==4.8.0
pip install fsspec==2023.6.0
pip install imageio==2.31.6
pip install numpy==1.24.3
pip install requests==2.31.0
```
On most systems, GPU support will be needed to run fast enough for realtime; otherwise you will experience stuttering.

## Quick Start

Expand Down Expand Up @@ -388,29 +334,114 @@ These methods are responsible for executing the text-to-audio synthesis and play

By understanding and setting these parameters and methods appropriately, you can tailor the `TextToAudioStream` to meet the specific needs of your application.


### CUDA installation

These steps are recommended for those who require **better performance** and have a compatible NVIDIA GPU.

> **Note**: *to check if your NVIDIA GPU supports CUDA, visit the [official CUDA GPUs list](https://developer.nvidia.com/cuda-gpus).*

To use torch with GPU support via CUDA, please follow these steps:

> **Note**: *newer pytorch installations [may](https://stackoverflow.com/a/77069523) (unverified) not need Toolkit (and possibly cuDNN) installation anymore.*

1. **Install NVIDIA CUDA Toolkit**:
For example, to install Toolkit 11.8 please
- Visit [NVIDIA CUDA Toolkit Archive](https://developer.nvidia.com/cuda-11-8-0-download-archive).
- Select version 11.
- Download and install the software.

2. **Install NVIDIA cuDNN**:
For example, to install cuDNN 8.7.0 for CUDA 11.x please
- Visit [NVIDIA cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive).
- Click on "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
- Download and install the software.

3. **Install ffmpeg**:

You can download an installer for your OS from the [ffmpeg Website](https://ffmpeg.org/download.html).

Or use a package manager:

- **On Ubuntu or Debian**:
```bash
sudo apt update && sudo apt install ffmpeg
```

- **On Arch Linux**:
```bash
sudo pacman -S ffmpeg
```

- **On MacOS using Homebrew** ([https://brew.sh/](https://brew.sh/)):
```bash
brew install ffmpeg
```

- **On Windows using Chocolatey** ([https://chocolatey.org/](https://chocolatey.org/)):
```bash
choco install ffmpeg
```

- **On Windows using Scoop** ([https://scoop.sh/](https://scoop.sh/)):
```bash
scoop install ffmpeg
```

4. **Install PyTorch with CUDA support**:
```bash
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
```

5. **Fix to resolve compatibility issues**:
If you run into library compatibility issues, try setting these libraries to fixed versions:

```bash
pip install networkx==2.8.8
pip install typing_extensions==4.8.0
pip install fsspec==2023.6.0
pip install imageio==2.31.6
pip install numpy==1.24.3
pip install requests==2.31.0
```
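
After the steps above, a quick check can confirm whether torch actually sees the GPU. The following is a minimal sketch, not part of the README: the function name `describe_torch_device` is hypothetical, while `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls. It also handles the case where torch is not installed at all.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report which device torch would use (hypothetical helper, for illustration)."""
    # Avoid an ImportError if torch is missing from the current environment.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    # is_available() is False for CPU-only builds or when no usable GPU is found.
    if torch.cuda.is_available():
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports `cpu` after installing the `+cu118` wheels, the CPU-only build of torch is probably still the one being imported in that environment.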

## 💖 Acknowledgements

Huge shoutout to the team behind [Coqui AI](https://coqui.ai/) for being the first to give us local high-quality synthesis at realtime speed, and even a clonable voice!

## Contribution

Contributions are always welcome (e.g. PR to add a new engine).

## License

While the source of this library is under MIT, some libraries it depends on are not.
A lot of external engine providers currently DO NOT ALLOW commercial use together with their free plans.
Please read and respect the licenses of the different engine providers.

[CoquiEngine](https://coqui.ai/cpml)
- non-commercial for free plan, commercial paid plans available

[ElevenlabsEngine](https://help.elevenlabs.io/hc/en-us/articles/13313564601361-Can-I-publish-the-content-I-generate-on-the-platform-)
- non-commercial for free plan, commercial for every paid plan

[AzureEngine](https://learn.microsoft.com/en-us/answers/questions/1192398/can-i-use-azure-text-to-speech-for-commercial-usag)
- non-commercial for free tier, commercial for standard tier upwards

SystemEngine:
- GNU Lesser General Public License (LGPL) version 3.0
- commercial use allowed
## License Information

### ❗ Important Note:
While the source of this library is under the MIT License, many of the engines it depends on are not. External engine providers often restrict commercial use in their free plans. This means the engines can be used for noncommercial projects, but commercial usage requires a paid plan.

### Engine Licenses Summary:

#### CoquiEngine
- **License**: Open-source only for noncommercial projects.
- **Commercial Use**: Requires a paid plan.
- **Details**: [CoquiEngine License](https://coqui.ai/cpml)

#### ElevenlabsEngine
- **License**: Open-source only for noncommercial projects.
- **Commercial Use**: Available with every paid plan.
- **Details**: [ElevenlabsEngine License](https://help.elevenlabs.io/hc/en-us/articles/13313564601361-Can-I-publish-the-content-I-generate-on-the-platform-)

#### AzureEngine
- **License**: Open-source only for noncommercial projects.
- **Commercial Use**: Available from the standard tier upwards.
- **Details**: [AzureEngine License](https://learn.microsoft.com/en-us/answers/questions/1192398/can-i-use-azure-text-to-speech-for-commercial-usag)

#### SystemEngine
- **License**: Mozilla Public License 2.0 and GNU Lesser General Public License (LGPL) version 3.0.
- **Commercial Use**: Allowed under this license.
- **Details**: [SystemEngine License](https://github.com/nateshmbhat/pyttsx3/blob/master/LICENSE)

**Disclaimer**: This is a summarization of the licenses as understood at the time of writing. It is not legal advice. Please read and respect the licenses of the different engine providers yourself if you plan to use them in a project.

## Author

Expand Down
14 changes: 8 additions & 6 deletions RealtimeTTS/engines/coqui_engine.py
Expand Up @@ -277,7 +277,7 @@ def postprocess_wave(chunk):

if command == 'shutdown':
logging.info('Shutdown command received. Exiting worker process.')
conn.send({'status': 'shutdown'})
conn.send(('shutdown', 'shutdown'))
break # This exits the loop, effectively stopping the worker process.

elif command == 'synthesize':
Expand Down Expand Up @@ -333,7 +333,7 @@ def postprocess_wave(chunk):

except KeyboardInterrupt:
logging.info('Keyboard interrupt received. Exiting worker process.')
conn.send({'shutdown': 'shutdown'})
conn.send(('shutdown', 'shutdown'))

except Exception as e:
logging.error(f"General synthesis error: {e} occurred trying to synthesize text {text}")
Expand Down Expand Up @@ -447,6 +447,9 @@ def synthesize(self,

while not 'finished' in status:
if 'shutdown' in status or 'error' in status:
if 'error' in status:
logging.error(f'Error synthesizing text: {text}')
logging.error(f'Error: {result}')
return False
self.queue.put(result)
status, result = self.parent_synthesize_pipe.recv()
Expand Down Expand Up @@ -550,8 +553,8 @@ def shutdown(self):

# Wait for the worker process to acknowledge the shutdown
try:
response = self.parent_synthesize_pipe.recv()
if response.get('status') == 'shutdown':
status, _ = self.parent_synthesize_pipe.recv()
if 'shutdown' in status:
logging.info('Worker process acknowledged shutdown')
except EOFError:
# Pipe was closed, meaning the process is already down
Expand All @@ -566,5 +569,4 @@ def shutdown(self):

# Wait for the process to terminate
self.synthesize_process.join()
logging.info('Worker process has been terminated')

logging.info('Worker process has been terminated')
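
The coqui_engine.py changes above make the worker answer every command with a two-element `(status, result)` tuple, so each `recv()` on the parent side can be unpacked the same way. The sketch below illustrates that message shape under stated assumptions: the `worker` and `run_demo` functions, the `("command", data)` request format, and the word-by-word "synthesis" are all hypothetical, and a thread stands in for the real worker process to keep the example short.

```python
import threading
from multiprocessing import Pipe


def worker(conn):
    """Hypothetical worker: replies to each command with a (status, result) tuple."""
    while True:
        command, data = conn.recv()
        if command == "shutdown":
            # Same tuple shape as every other reply, so the parent can unpack it.
            conn.send(("shutdown", "shutdown"))
            break
        if command == "synthesize":
            # Stand-in for audio chunks: send one "chunk" per word.
            for chunk in data.split():
                conn.send(("success", chunk))
            conn.send(("finished", ""))


def run_demo():
    parent_conn, child_conn = Pipe()
    thread = threading.Thread(target=worker, args=(child_conn,))
    thread.start()

    # Collect chunks until the worker reports that it is finished.
    parent_conn.send(("synthesize", "hello world"))
    chunks = []
    status, result = parent_conn.recv()
    while "finished" not in status:
        if "shutdown" in status or "error" in status:
            break
        chunks.append(result)
        status, result = parent_conn.recv()

    # Request shutdown and wait for the acknowledgement tuple.
    parent_conn.send(("shutdown", None))
    status, _ = parent_conn.recv()
    thread.join()
    return chunks, status


if __name__ == "__main__":
    print(run_demo())
```

Because every reply has exactly two elements, a statement such as `status, _ = conn.recv()` works for the shutdown acknowledgement as well; a one-key dict like the old `{'status': 'shutdown'}` could not be unpacked into two names.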
2 changes: 2 additions & 0 deletions RealtimeTTS/stream_player.py
Expand Up @@ -228,6 +228,8 @@ def stop(self, immediate: bool = False):

if immediate:
self.immediate_stop.set()
while self.playback_active:
time.sleep(0.1)
return

self.playback_active = False
Expand Down
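
The two lines added to `stop()` make an immediate stop block until the playback thread has cleared `playback_active`. A minimal sketch of that pattern follows, assuming a background thread that polls the `immediate_stop` event and resets the flag on exit; the `MiniPlayer` class and its `start`/`_run` methods are hypothetical and only mirror the attribute names seen in the diff.

```python
import threading
import time


class MiniPlayer:
    """Hypothetical stand-in for a stream player, for illustration only."""

    def __init__(self):
        self.immediate_stop = threading.Event()
        self.playback_active = False
        self._thread = None

    def start(self):
        self.playback_active = True
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Pretend to play audio until an immediate stop is requested.
        while not self.immediate_stop.is_set():
            time.sleep(0.01)
        # The playback thread itself marks playback as finished.
        self.playback_active = False

    def stop(self, immediate: bool = False):
        if immediate:
            self.immediate_stop.set()
            # Wait until the playback thread has actually wound down.
            while self.playback_active:
                time.sleep(0.1)
            return
        self.playback_active = False


if __name__ == "__main__":
    player = MiniPlayer()
    player.start()
    player.stop(immediate=True)
    print(player.playback_active)
```

Without the wait loop, `stop(immediate=True)` would return while the playback thread might still be running, which is the kind of race a caller could hit when starting a new playback right away.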