- NVIDIA GPU: Required for optimal performance.
- Python Environment: Python 3.8 or newer installed.
- Internet Connection: To download the necessary model files.
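A quick way to confirm the Python prerequisite is met (a minimal sketch; the 3.8 floor comes from the list above):

```python
import sys

def python_ok(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

if __name__ == "__main__":
    print("Python version OK:", python_ok())
```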
Follow these steps on the PC/server where the TTS processing will happen.
- Clone the repository
- git clone https://github.com/daswer123/xtts-api-server
- cd xtts-api-server
- Create a virtual environment
- python -m venv venv
- venv\Scripts\activate (Windows) or source venv/bin/activate (Linux/macOS)
- Install dependencies
- pip install -r requirements.txt
- pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
- Launch the server to confirm it works
- python -m xtts_api_server
- Once the server starts successfully, stop it (Ctrl+C).
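Before launching, a short sketch like this can confirm that the key packages installed above are importable (the module names are the usual ones for these packages; adjust if your environment differs):

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    for mod in ("torch", "torchaudio", "xtts_api_server"):
        print(f"{mod}: {'found' if is_installed(mod) else 'MISSING'}")
```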
- Download all of the model files from the Pyrater/TARS page on Hugging Face:
- config.json
- vocab.json
- model.pth
- etc...
- Organize the Files
- Create a directory named tars inside the XTTS models directory. For example:
- mkdir -p /xtts-api-server/xtts_models/tars
- Place the downloaded files into the tars directory.
- Place reference.wav in the speakers folder and rename it to TARS.wav
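The layout above can be sanity-checked with a short script. The required file names follow the download list above; the folder path in the usage example is an assumption matching the example mkdir command, so adjust it to your setup:

```python
from pathlib import Path

# File names from the Hugging Face download list above.
REQUIRED = ("config.json", "vocab.json", "model.pth")

def missing_model_files(model_dir):
    """Return the required files that are absent from the model directory."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED if not (model_dir / name).is_file()]

if __name__ == "__main__":
    # Assumed path; replace with your actual xtts_models/tars directory.
    missing = missing_model_files("xtts_models/tars")
    print("All model files present" if not missing else f"Missing: {missing}")
```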
Start the server using the following command:
- python -m xtts_api_server --listen --deepspeed --lowvram --model-folder "D:/AI_Tools/xtts-api-server/xtts_models" --model-source local --version tars
- Replace the --model-folder path if your XTTS models directory is located elsewhere.
Test the TARS Voice Model
- With the server running, use the XTTS API server's interface or provided scripts to input text.
- Verify that the audio output emulates TARS's voice.
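The test can also be driven from a script using only the standard library. This is a sketch: the /tts_to_audio/ endpoint, the 8020 port, and the request field names are the server's usual defaults, but verify them against your installed version (the server's /docs page lists the live API):

```python
import json
import urllib.request

def build_payload(text, speaker="TARS", language="en"):
    """Build the JSON body for a synthesis request (field names assumed)."""
    return {"text": text, "speaker_wav": speaker, "language": language}

def synthesize(text, out_path="tars_output.wav",
               url="http://localhost:8020/tts_to_audio/"):
    """POST the text to the running server and save the returned WAV."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

if __name__ == "__main__":
    # With the server running, this saves TARS speech to tars_output.wav:
    # synthesize("Humor setting at seventy-five percent.")
    print(build_payload("Hello there."))
```

Listen to the resulting WAV to confirm the output matches TARS's voice.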
Additional Resources
- XTTS API Server GitHub Repository
- Local Voice Cloning Using XTTS API Server - Video Tutorial