Replies: 7 comments 1 reply
-
This is a great discussion to kick off, thanks for starting it! One thing to keep in mind: Coqui shutting down as a business just means they will no longer be actively developing or supporting Coqui TTS. The code isn't going anywhere, it will just eventually stop being updated. There's also a good chance it gets picked up by a community fork and has another life, though there may be some license considerations around that. For me, I have no concerns about the license since it only matters if you're going to try to use it commercially. I will not be trying to sell anything with epub2tts so I'm safe :)

At some point, the place where the XTTS and VITS models are hosted might go away. If that happens, for this project I can just host them somewhere and adjust the code to pull the model from there if the user does not already have it.

I have been checking on StyleTTS2 occasionally and think it has a lot of promise. I haven't looked at it in a little while, but I'm fully in favor of adding it as an option in epub2tts when it is stable and the quality is "good enough" (I have no idea what good enough means in this context though, so if someone else thinks it's good enough before I do and does the work to bring it in, I'll be happy to support that).

Ultimately I think epub2tts will need to become more modular, to make it easy for folks to choose which engine they want to include. Otherwise the requirements and installation process are going to get bigger and bigger. For instance, something like piper requires fewer resources, and if someone only ever wants to use piper it would be nice if their install footprint did not include Coqui TTS.
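To make the modularity idea above a bit more concrete, here is a minimal sketch of an engine registry that only imports a TTS backend when the user actually selects it. This is not how epub2tts is currently structured; the names `EngineSpec`, `register`, `available_engines`, and `get_engine` are hypothetical, and the top-level module names `TTS` (Coqui) and `piper` are assumptions about what those packages install.

```python
import importlib.util
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EngineSpec:
    """Describes one optional TTS backend (hypothetical structure)."""
    name: str
    module: str                    # top-level module the backend needs
    factory: Callable[[], object]  # performs the heavy import lazily


ENGINES: Dict[str, EngineSpec] = {}


def register(name: str, module: str):
    """Decorator that records a backend without importing it."""
    def decorator(factory: Callable[[], object]):
        ENGINES[name] = EngineSpec(name, module, factory)
        return factory
    return decorator


def is_installed(module: str) -> bool:
    # find_spec looks the module up without importing it.
    # Only pass top-level module names here.
    return importlib.util.find_spec(module) is not None


def available_engines() -> List[str]:
    """Names of registered backends whose dependency is installed."""
    return sorted(n for n, s in ENGINES.items() if is_installed(s.module))


def get_engine(name: str) -> object:
    """Load a backend on demand, with a clear error if it is missing."""
    if name not in ENGINES:
        raise KeyError(f"unknown engine {name!r}; known: {sorted(ENGINES)}")
    spec = ENGINES[name]
    if not is_installed(spec.module):
        raise RuntimeError(
            f"engine {name!r} needs the {spec.module!r} module, "
            "which is not installed"
        )
    return spec.factory()


# Assumption: Coqui TTS installs a top-level module named "TTS".
@register("coqui", module="TTS")
def _load_coqui():
    from TTS.api import TTS  # imported only when this engine is chosen
    return TTS


# Assumption: piper's Python package installs a module named "piper".
@register("piper", module="piper")
def _load_piper():
    import piper
    return piper
```

With something along these lines, a piper-only install would never import Coqui TTS, and `available_engines()` could drive the choices offered on the command line. Splitting the dependencies themselves into optional install groups would be a separate packaging step.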
-
Has anyone spent any time working with RVC? I hear about it all the time in the context of speaker cloning, but I can't say I've ever seen good examples to run locally (everything I see has been something like "go run this on hf"). Wondering if anyone knows how applicable it would be here, or has pointers to their favorite run-local example.
-
This project has some really good sounding samples, and is under very active development: When I get some spare time I'll try it out and see if it is worth pursuing. In particular with XTTSv2 I have been noticing repeats much more frequently than I would like, even with repeat penalty turned up pretty high. It would be interesting to see if OpenVoice does not suffer from that for long text. One other nice thing:
-
Create an audiobook on Hugging Face using StyleTTS 2 - https://github.com/duplaja/epub-to-audiobook-hf
-
Another user pointed out this project. It looks like it produces really great audiobooks with multiple voices for different speakers, and has multiple TTS backends including StyleTTS2.
-
Adding Microsoft Edge TTS per this issue: #220
-
For anyone who may be interested, I've managed to get epub2tts working with LocalAI's TTS backend (https://localai.io/features/text-to-audio)
-
Hi Aedocw et al,
With the unfortunate news of Coqui shutting down, I thought it might be an idea to discuss possible future TTS implementations, depending of course on aedocw's time and on the community's submissions and willingness to continue supporting this great tool.
Personally, I hope that the Coqui team will choose to leave the great resources they've given us available, which will hopefully mean that this tool can continue to work with these libraries.
However, as we all know, the world of TTS and machine learning moves quickly.
Ideally there will be a possibility for community contributions to those repositories, and who knows what the future holds as far as building on the work the Coqui team has done.
The main open source tool I've come across that is somewhat comparable to XTTS in quality is StyleTTS2, which now has an API, although hopefully it is still being refined by the community and the original researchers.
For something with a more synthetic quality, although quite fast, there is piper (https://github.com/rhasspy/piper).
I believe this is still under active development, as it is used in Home Assistant etc.
I don't see a ton of discussion online about TTS systems such as these, so I thought I would make a post here, which I hope is acceptable as it directly affects the possible future of the work that has been done.