
v0.3.0

@KoljaB KoljaB released this 29 Nov 17:15
· 327 commits to master since this release
e7fe147

More languages

For example, Chinese is now supported.

Technical background:
v0.3.0 uses a new stream2sentence version that implements the stanza tokenizer in addition to nltk. This allows sentence splitting for far more languages than the nltk tokenizer, which is specialized for English and supports only a few other languages. Downsides of stanza: it requires downloading a fairly large model, it consumes VRAM (~2 GB minimum), and it is not the fastest, even when running on a GPU. Still, I think it is a very capable model. If anybody knows a more lightweight model or library that handles sentence tokenization well, let me know.

To use the new tokenizer, call the ´play´ or ´play_async´ method with the new parameter ´tokenizer="stanza"´ and provide the language code (for example ´language="zh"´). Also adjust the ´minimum_sentence_length´, ´minimum_first_fragment_length´ and ´context_size´ parameters to the average word length of the desired language.

For example (Chinese):

self.stream.play_async(
    minimum_sentence_length=2,
    minimum_first_fragment_length=2,
    tokenizer="stanza",
    language="zh",
    context_size=2,
)

Example implementations here and here.
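
To illustrate why the length parameters need adjusting, here is a minimal sketch of a length-gated splitter. It is not the stream2sentence or stanza algorithm, just a toy stand-in: the function name ´split_with_min_length´ and its regex are assumptions made for this example. It shows that a character threshold suited to English holds back short Chinese sentences, while a small threshold releases them one at a time.

```python
import re

# Toy illustration only: NOT how stream2sentence or stanza split text.
# Sentence-ending punctuation for English and Chinese (assumed set).
_PIECE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_with_min_length(text, minimum_sentence_length):
    """Split at sentence punctuation, merging pieces until the buffer
    holds at least minimum_sentence_length characters."""
    sentences = []
    buffer = ""
    for piece in _PIECE.findall(text):
        buffer += piece
        if len(buffer.strip()) >= minimum_sentence_length:
            sentences.append(buffer.strip())
            buffer = ""
    if buffer.strip():
        # Whatever is left over is emitted as the final sentence.
        sentences.append(buffer.strip())
    return sentences


text = "你好。我很好。谢谢你！"
print(split_with_min_length(text, 2))   # ['你好。', '我很好。', '谢谢你！']
print(split_with_min_length(text, 10))  # ['你好。我很好。谢谢你！']
```

With the larger threshold, nothing is emitted until all three sentences have arrived, which in a real-time setting would delay the start of playback.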

Fallback engines

Fallback is now supported for the azure, coqui and system engines (elevenlabs coming soon). This improves reliability in real-time scenarios by switching to an alternate engine if one fails.

To use the fallback mechanism, pass a list of engines to the TextToAudioStream constructor instead of a single engine. If synthesis with the first engine in the list throws an exception or returns a result that indicates an unsuccessful synthesis, the next engine in the list is tried.

For example:

engines = [
    AzureEngine(azure_speech_key, azure_speech_region),
    coqui_engine,
    system_engine,
]
stream = TextToAudioStream(engines)

Example implementation here.
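
The fallback behaviour described above can be sketched in a few lines. This is only an illustration of the pattern, not the code inside TextToAudioStream: ´FakeEngine´, ´EngineError´ and ´synthesize_with_fallback´ are hypothetical names invented for this example, and an empty byte string is assumed to stand for "a result hinting at an unsuccessful synthesis".

```python
class EngineError(Exception):
    """Raised by the hypothetical engines below when synthesis fails."""


class FakeEngine:
    """Stand-in for a TTS engine (hypothetical, for illustration only)."""

    def __init__(self, name, fail=False, empty=False):
        self.name = name
        self.fail = fail
        self.empty = empty

    def synthesize(self, text):
        if self.fail:
            raise EngineError(f"{self.name} is unavailable")
        if self.empty:
            return b""  # assumed marker for an unsuccessful synthesis
        return f"{self.name}:{text}".encode("utf-8")


def synthesize_with_fallback(engines, text):
    """Try each engine in order and return (engine_name, audio) from
    the first one that neither raises nor returns an empty result."""
    errors = []
    for engine in engines:
        try:
            audio = engine.synthesize(text)
        except Exception as exc:
            errors.append((engine.name, repr(exc)))
            continue
        if audio:
            return engine.name, audio
        errors.append((engine.name, "empty result"))
    raise RuntimeError(f"all engines failed: {errors}")


engines = [
    FakeEngine("azure", fail=True),
    FakeEngine("coqui", empty=True),
    FakeEngine("system"),
]
print(synthesize_with_fallback(engines, "hi"))  # ('system', b'system:hi')
```

The order of the list is the priority order, so the preferred engine goes first and the most dependable one last.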

Audio file saving feature

Use the ´output_wavfile´ parameter of the ´play´ and ´play_async´ methods. The audio is saved to a WAV file while it is synthesized and played in real time, so the live synthesis can be played back later.

For example:

filename = "synthesis_" + engine.engine_name
stream.play(output_wavfile=f"{filename}.wav")

See also the usage here.
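
For readers curious how saving can run alongside playback, here is a minimal sketch using only the Python standard library. It does not reproduce the library's implementation: ´tone_chunks´ and ´play_and_save´ are hypothetical helpers, the audio format (16 kHz, mono, 16-bit PCM) is an assumption, and the generated tone merely stands in for chunks coming from a TTS engine. Each chunk is handed to a playback callback and written to the WAV file in the same loop.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # assumed format: 16 kHz, mono, 16-bit PCM


def tone_chunks(num_chunks=5, frames_per_chunk=1600, frequency=440.0):
    """Yield PCM chunks of a sine tone, standing in for engine output."""
    for chunk_index in range(num_chunks):
        start = chunk_index * frames_per_chunk
        samples = (
            int(12000 * math.sin(2 * math.pi * frequency * (start + i) / SAMPLE_RATE))
            for i in range(frames_per_chunk)
        )
        yield struct.pack(f"<{frames_per_chunk}h", *samples)


def play_and_save(chunks, output_wavfile, play_chunk=lambda chunk: None):
    """Pass every chunk to play_chunk and append it to a WAV file.
    Returns the number of frames written."""
    frames_written = 0
    with wave.open(output_wavfile, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            play_chunk(chunk)           # real code would feed an audio device here
            wav_file.writeframes(chunk)
            frames_written += len(chunk) // 2
    return frames_written


if __name__ == "__main__":
    frames = play_and_save(tone_chunks(), "synthesis_demo.wav")
    print(frames / SAMPLE_RATE, "seconds saved")
```

Writing inside the playback loop means the file is complete as soon as playback ends, with no second synthesis pass.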