Update README.md
add translation and requirements info
dynamiccreator authored Oct 4, 2024
1 parent ead1c18 commit 2723e75
Showing 1 changed file with 6 additions and 14 deletions.
This script reads any text file using voice cloning. It automatically splits the …

Additionally, you can use an LLM via an API to translate the text into a different language and read that translation instead. This all works in real time, with a small lead time at the beginning, on a GTX 1050 with just 4 GB of VRAM (it uses xtts-v2, and 4 GB only suffices if everything else is closed, so I recommend at least 6 GB of VRAM to be on the safe side).
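The real-time behavior comes from splitting the text into sentences and synthesizing one sentence while the next ones are already queued. A minimal sketch of that producer/consumer pipeline is below; `speak` is a hypothetical stand-in for the actual xtts-v2 call, and a simple regex split stands in for the script's real sentence tokenizer to keep the sketch dependency-free:

```python
import re
import threading
from queue import Queue

def split_sentences(text):
    # The script uses a proper sentence tokenizer; a regex split on
    # sentence-ending punctuation is enough for this illustration.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def speak(sentence):
    # Placeholder for the xtts-v2 synthesis + playback step.
    print(f"[TTS] {sentence}")

def reader_pipeline(text):
    q = Queue()

    def worker():
        while True:
            sentence = q.get()
            if sentence is None:  # sentinel: no more sentences
                break
            speak(sentence)
            q.task_done()

    t = threading.Thread(target=worker)
    t.start()
    for s in split_sentences(text):
        q.put(s)  # sentence N is spoken while N+1 is already queued
    q.put(None)
    t.join()

reader_pipeline("Hello there. This is a demo! Does it split questions?")
```

Because synthesis happens in a worker thread, only the first sentence contributes to the lead time; everything after it is prepared while earlier audio plays.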

For the translation I'm using https://huggingface.co/mradermacher/Llama-3.2-3B-Instruct-uncensored-GGUF, as it is a fast and suitable model, reaching up to 20 tokens/s on an AMD 7950X CPU with llama.cpp. To make the translation work, I use the Dolphin prompt. Some models refuse to translate or return the wrong form; in that case the translation is repeated until the output contains text between the tags "<translation>" and "</translation>".
You can also use ChatGPT or any other service, as long as you provide the correct address and API key.
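The retry-until-tagged logic described above could look roughly like this. This is a sketch, not the script's actual code: `complete` is an assumed callable that sends a prompt to whichever LLM backend you configured (llama.cpp server, OpenAI, etc.) and returns its reply, and the prompt wording is illustrative:

```python
import re

TAG_RE = re.compile(r"<translation>(.*?)</translation>", re.DOTALL)

def translate(complete, sentence, target_lang, max_tries=5):
    """Ask the LLM for a translation; retry until the reply contains
    the expected <translation>...</translation> form."""
    prompt = (
        f"Translate the following text to {target_lang}. "
        f"Wrap the result in <translation> tags.\n\n{sentence}"
    )
    for _ in range(max_tries):
        reply = complete(prompt)
        match = TAG_RE.search(reply)
        if match:  # the model complied with the required form
            return match.group(1).strip()
    # Give up after max_tries and fall back to the original text,
    # so playback is never blocked by a stubborn model.
    return sentence
```

Requiring the tags makes refusals and chatty answers easy to detect: anything without a `<translation>` block is simply discarded and the request is retried.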

# Installation

Make sure all required Python packages are installed:

```python
import os
import torch
from TTS.api import TTS  # Coqui TTS
from pydub import AudioSegment
import simpleaudio as sa
import nltk
import threading
from queue import Queue
import string
import random
import openai
import re
import argparse
```
```
pip install -r requirements.txt
```

For real-time usage you will need an NVIDIA GPU, at least a GTX 1050, so you must install CUDA on your machine.

