Replies: 8 comments 9 replies
-
You could try StyleTTS2, which is faster, but it only works for English: https://github.com/sidharthrajaram/StyleTTS2. And you still only get CPU-speed inference with no Metal speedup.
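A minimal sketch of what that repo's pip package looks like in use, assuming the `styletts2` package and its `StyleTTS2().inference()` API as shown in that README (the text and output filename here are placeholders):

```python
# pip install styletts2   (CPU-only works on a Mac, just slowly)
from styletts2 import tts

# With no checkpoint/config paths given, the default model is downloaded and cached.
my_tts = tts.StyleTTS2()

# English-only synthesis; writes the result to a wav file.
my_tts.inference(
    "Hello from StyleTTS2 running on CPU.",
    output_wav_file="styletts2_test.wav",
)
```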
-
I mean, XTTS should be able to run on your computer, though. Get Miniconda, install it, and it should just work. Yes, it will be slow. At the moment, none of the text-to-speech models that run locally have Metal speedup support, so if you're running on something like an M1 or M2 you only get CPU inference. It's going to be slow, but it'll work, I guess.
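For reference, a rough sketch of running XTTS v2 on CPU through the Coqui TTS Python API (the file paths are placeholders, and the model weights are downloaded on first run):

```python
# In a fresh conda env: pip install TTS
from TTS.api import TTS

# Loads XTTS v2; on a Mac this runs CPU-only, so expect it to be slow.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# XTTS clones the voice from a short reference clip (speaker_wav is a placeholder path).
tts.tts_to_file(
    text="This is a test of XTTS running on CPU.",
    speaker_wav="reference_speaker.wav",
    language="en",
    file_path="xtts_output.wav",
)
```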
-
XTTS only needs about 4 GB of RAM to run, though. I can verify that, given that I've run it on virtual machines with only 4 GB of CPU RAM just fine.
-
But yeah, welcome to the world of text-to-speech on Mac. Anything that works will be super slow, and some models don't work at all unless you run them in Docker. Take piper-tts, for instance: it's crazy fast for multilingual, Siri-like voices with no voice cloning, BUT Piper does not run natively on M1; you have to use an x86 Docker environment or something similar to run it.
-
I suppose you could run a crappy text-to-speech without voice cloning and then run Coqui's voice conversion on all the generated output files. That would technically give you fast voice cloning, like this Hugging Face space does: https://huggingface.co/spaces/drewThomasson/Voice-Conversion
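A rough sketch of that idea, assuming the Coqui TTS FreeVC voice-conversion model (all file names are placeholders):

```python
# pip install TTS
from TTS.api import TTS

# FreeVC re-voices source_wav to sound like the speaker in target_wav; CPU-only on a Mac.
vc = TTS("voice_conversion_models/multilingual/vctk/freevc24")

vc.voice_conversion_to_file(
    source_wav="cheap_tts_output.wav",    # placeholder: output of any fast, non-cloning TTS
    target_wav="original_speaker.wav",    # placeholder: reference clip of the voice to clone
    file_path="cloned_output.wav",
)
```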
-
Or just make it run in a Google Colab for free GPU :/
-
So here's my project, Voxlingua. The user uploads a YouTube video link and selects a target language, and gets back the video with translated audio in the original speaker's voice (voice cloning). This is something you've done, I'm sure. I've divided it into six parts (Python files): video_processor, speech_recognition, text_translator, text_to_speech, voice_cloning.py, and audio_video_sync.py, and the voice cloning part is where I'm stuck. I already have the translated audio (generated with gTTS) and the translated transcription (with MarianMT); now I need to clone the voice from the original audio and produce a voice-cloned translated audio. For this I've tried Coqui XTTS, OpenVoice, and also F5-TTS (which was released recently and is great, but it only supports English and Chinese). It's very hard for me to run these locally on my Mac. Can you please help me out?
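For context, the translation and plain TTS parts I already have look roughly like this (a sketch with example model and language names, not my real code), and the gap is the voice-cloning step at the end:

```python
# pip install transformers sentencepiece gTTS
from transformers import MarianMTModel, MarianTokenizer
from gtts import gTTS

# text_translator step: source -> target language with MarianMT
# (the model name is an example; pick the Helsinki-NLP pair for your language combination)
model_name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["This is the original transcription."], return_tensors="pt", padding=True)
translated_ids = model.generate(**batch)
translated_text = tokenizer.decode(translated_ids[0], skip_special_tokens=True)

# text_to_speech step: plain (uncloned) translated audio with gTTS
gTTS(translated_text, lang="es").save("translated_gtts.mp3")

# voice_cloning step (the missing piece): this audio plus a clip of the original speaker
# would then go through a voice-conversion model, e.g. the Coqui FreeVC call sketched above
# (the mp3 may need converting to wav first).
```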
-
Also, I have a doubt (I'm new to the ML/TTS space, don't judge): since multiple users upload videos, Coqui (or whatever model I use for voice cloning) has to run inference on the backend for every request, right? So I basically need ongoing inference power too; it's not like I train this pretrained model once on a GPU and it just keeps working without one. What can I do in this particular situation, given that I'm GPU poor?
-
Hey drew, I'm currently working on a project that includes voice cloning, just like yours. I'm finding it very difficult to run Coqui XTTS on my device since I'm GPU poor (M2 MacBook Air, 8 GB RAM :/). I was wondering what alternatives I have, or whether there are any small fine-tuned versions of Coqui built on MLX that I could run. I also have a few doubts regarding the TTS model and thought it best to ask you, since you've been working on a similar project (found you on HF). Please let me know where I can message you so you could help me with this. Thanks in advance! :)