possible future tts backends #183

danielw97 · 2024-01-08T19:51:29Z

danielw97
Jan 8, 2024

Hi Aedocw et al,
With the unfortunate news of Coqui shutting down, I thought it might be a good idea to discuss possible future TTS implementations, depending of course on aedocw's time, the community's submissions, and everyone's willingness to continue supporting this great tool.
Personally, I hope that the Coqui team will choose to leave the great resources they've given us available, which would hopefully mean that this tool can continue to work with these libraries.
However, as we all know, the world of TTS and machine learning moves quickly.
Ideally there will be a possibility for community contributions to those repositories, and who knows what the future holds as far as building on the work the Coqui team has done.
The main open source tool I've come across that is somewhat comparable to XTTS in quality is StyleTTS2, which now has an API, although hopefully it is still being refined by the community and the original researchers.
For something that sounds more synthetic but is quite fast, there is Piper (https://github.com/rhasspy/piper).
I believe it is still under active development, as it is used in Home Assistant and elsewhere.
I don't see a ton of discussion online about TTS systems such as these, so I thought I would make a post here, which I hope is acceptable as it directly affects the possible future of the work that has been done.

aedocw · 2024-01-08T20:29:12Z

aedocw
Jan 8, 2024
Maintainer

This is a great discussion to kick off, thanks for starting it!

One thing to keep in mind - Coqui shutting down as a business just means they will not be actively developing or supporting Coqui TTS. The code isn't going anywhere, it will just eventually stop being updated. There's also a good chance it gets picked up by a community fork and has another life, though there may be some license considerations around that. For me, I have no concerns about the license since it only matters if you're going to try to use it commercially. I will not be trying to sell anything with epub2tts so I'm safe :)

At some point, the place where the XTTS and VITS models are hosted might go away. If that happens, for this project I can just host them somewhere and adjust the code to pull the model from there if the user does not already have it.
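
A minimal sketch of that fallback, not epub2tts's actual code: download the model from a mirror only when no local copy exists. The `MIRROR_URL` value, the `ensure_model` name, and the injectable `fetch` parameter are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve

# Hypothetical mirror location; not a real epub2tts or Coqui URL.
MIRROR_URL = "https://example.com/models/xtts_v2/model.pth"


def ensure_model(local_path: Path, url: str = MIRROR_URL,
                 fetch: Callable[[str, str], object] = urlretrieve) -> Path:
    """Return local_path, downloading from url only if the file is missing."""
    if local_path.exists():
        return local_path  # the user already has the model; no network access
    local_path.parent.mkdir(parents=True, exist_ok=True)
    tmp = local_path.with_name(local_path.name + ".part")
    fetch(url, str(tmp))     # download under a temporary name first
    tmp.replace(local_path)  # rename so a partial download is never used as a model
    return local_path
```

Passing `fetch` as a parameter keeps the function testable without network access; in normal use the default `urlretrieve` does the download.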

I have been checking on StyleTTS2 occasionally and think it has a lot of promise. I haven't looked at it in a little while, but I'm fully in favor of adding it as an option in epub2tts when it is stable and the quality is "good enough" (I have no idea what good enough means in this context though, so if someone else thinks it's good enough before I do and does the work to bring it in, I'll be happy to support that).

Ultimately I think epub2tts will need to become more modular, to make it easy for folks to choose which engine they want to include. Otherwise the requirements and installation process are going to get bigger and bigger. For instance, using something like Piper requires fewer resources, and if someone only ever wants to use Piper it would be nice if their install footprint did not include Coqui TTS.
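
One way such modularity could look, sketched under assumptions rather than taken from epub2tts: a small registry that imports an engine's module only when that engine is selected, so an unused backend never has to be installed. The module paths (`epub2tts_engines.xtts`, `epub2tts_engines.piper`) and the `synthesize` entry point are hypothetical names.

```python
import importlib
from typing import Callable, Dict, Tuple

# Engine name -> (module to import, attribute inside that module).
# These module paths are hypothetical, not epub2tts's real layout.
ENGINES: Dict[str, Tuple[str, str]] = {
    "xtts": ("epub2tts_engines.xtts", "synthesize"),
    "piper": ("epub2tts_engines.piper", "synthesize"),
}


def load_engine(name: str) -> Callable:
    """Import the chosen engine's module on demand and return its entry point."""
    try:
        module_name, attr = ENGINES[name]
    except KeyError:
        raise ValueError(f"unknown engine {name!r}; choose from {sorted(ENGINES)}")
    try:
        # Only the selected backend is imported, so others need not be installed.
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(f"engine {name!r} is not installed") from exc
    return getattr(module, attr)
```

With a layout like this, each backend's dependencies could be declared as an optional packaging extra, so a Piper-only install would not pull in Coqui TTS.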


aedocw · 2024-01-11T03:32:24Z

aedocw
Jan 11, 2024
Maintainer

Has anyone spent any time working with RVC? I hear about it all the time in the context of speaker cloning, but I can't say I've ever seen good examples to run locally (everything I see has been something like "go run this on hf"). Wondering if anyone knows how applicable it would be here, or has pointers to their favorite run-local example.

1 reply

danielw97 Jan 11, 2024
Author

Hi,
I'm not terribly familiar with it either, although I believe it's mostly used for creating AI music, particularly where synthesizing vocals with a specific timbre or quality is concerned.
I might be off here, although in the main repo I see lots of references to uvr5, which performs music separation, as well as other models it needs.
I do however see finetuning mentioned, so maybe music is just the predominant use we're seeing currently and it has more general TTS possibilities as well.

aedocw · 2024-02-15T14:31:24Z

aedocw
Feb 15, 2024
Maintainer

This project has some really good sounding samples, and is under very active development:
https://github.com/myshell-ai/OpenVoice
https://research.myshell.ai/open-voice

When I get some spare time I'll try it out and see if it is worth pursuing. In particular with XTTSv2 I have been noticing repeats much more frequently than I would like, even with repeat penalty turned up pretty high. It would be interesting to see if OpenVoice does not suffer from that for long text.

One other nice thing: OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker.


aedocw · 2024-02-22T22:44:25Z

aedocw
Feb 22, 2024
Maintainer

A project for creating an audiobook on Hugging Face using StyleTTS 2 - https://github.com/duplaja/epub-to-audiobook-hf


aedocw · 2024-02-27T15:15:42Z

aedocw
Feb 27, 2024
Maintainer

Another user pointed out this project. It looks like it produces really great audiobooks with multiple voices for different speakers, and it has multiple TTS backends including StyleTTS2.

https://github.com/DrewThomasson/VoxNovel


aedocw · 2024-03-20T03:38:11Z

aedocw
Mar 20, 2024
Maintainer

Adding Microsoft Edge TTS per this issue: #220


danielw97 · 2024-05-23T13:34:23Z

danielw97
May 23, 2024
Author

For anyone who may be interested, I've managed to get epub2tts working with LocalAI's TTS backend (https://localai.io/features/text-to-audio).
It uses the OpenAI API, so I only needed to set the OPENAI_BASE_URL environment variable to point to my LocalAI instance and specify the engine as openai.
It uses Piper by default, however I currently have to define the model manually in epub2tts, like this: "en-gb-alan-low.onnx"
I'm doing some more testing, although currently it generates extremely quickly on Linux with decent quality.
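
To illustrate what pointing an OpenAI-style client at LocalAI amounts to, here is a rough sketch that builds (but does not send) a speech request against whatever OPENAI_BASE_URL is set to. The `/audio/speech` path follows OpenAI's text-to-speech API; the `build_speech_request` helper, the fallback URL `http://localhost:8080/v1`, and the assumption that LocalAI accepts this payload without a `voice` field (which OpenAI's own API expects) are not confirmed by the post.

```python
import json
import os
from urllib.request import Request


def build_speech_request(text: str, model: str = "en-gb-alan-low.onnx") -> Request:
    """Build an OpenAI-style text-to-speech request for the configured server."""
    # OPENAI_BASE_URL redirects the request to a LocalAI instance;
    # the fallback below is an assumed local address, not a documented default.
    base = os.environ.get("OPENAI_BASE_URL", "http://localhost:8080/v1").rstrip("/")
    payload = {"model": model, "input": text}  # model name taken from the post
    return Request(
        f"{base}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` against a running instance would be expected to return audio bytes; that step is left out here because it depends on the local setup.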

