- Raspberry Pi: For obvious reasons.
- USB Microphone: For user input.
- Speaker: For TARS output.
- Open a terminal on your Raspberry Pi.
- Clone the TARS-AI repository:
git clone https://github.com/pyrater/TARS-AI.git
- Navigate to the cloned directory:
cd TARS-AI
The following system dependencies are required for Selenium-based automation, audio processing, and audio format handling.
- Update Your System: Ensure your package lists and installed software are up to date:
sudo apt update && sudo apt upgrade -y
- Install Chromium: Chromium is the browser required for Selenium-based web automation:
sudo apt install -y chromium-browser
- Install Chromedriver for Selenium: Chromedriver allows Selenium to control Chromium:
sudo apt install -y chromium-chromedriver
- Install SoX and Format Support Libraries: SoX is a command-line tool for processing audio files:
sudo apt install -y sox libsox-fmt-all
- Install PortAudio Development Libraries: PortAudio is a cross-platform audio input/output library:
sudo apt install -y portaudio19-dev
- Verify Installations: Confirm that the installed packages are functioning:
- Check Chromium version:
chromium-browser --version
- Check Chromedriver version:
chromedriver --version
- Check SoX version:
sox --version
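To go one step further, you can confirm that Selenium can actually drive Chromium through Chromedriver. This is only a minimal sketch: it assumes the selenium Python package is installed (it is pulled in later via the project's requirements.txt, or with pip install selenium) and that the Chromium binary lives at /usr/bin/chromium-browser.

```python
# Minimal headless check that Selenium + Chromedriver + Chromium work together.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")                      # no display needed
options.binary_location = "/usr/bin/chromium-browser"   # assumed Chromium path
driver = webdriver.Chrome(options=options)              # chromedriver is found on PATH
driver.get("https://example.com")
print(driver.title)                                     # expect "Example Domain"
driver.quit()
```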
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install the required dependencies under src/:
pip install -r requirements.txt
- Connect your microphone to the Raspberry Pi via USB.
- Connect your speaker to the Raspberry Pi using the audio output or Bluetooth.
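Before moving on, it is worth checking that the Pi can actually see the microphone and speaker. The sketch below uses PyAudio, which builds against the PortAudio libraries installed above; run pip install pyaudio first if it is not already available in your environment.

```python
# List every audio device PortAudio can see; the USB mic should show up as an
# input and the speaker (or Bluetooth sink) as an output.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    roles = []
    if info["maxInputChannels"] > 0:
        roles.append("input")
    if info["maxOutputChannels"] > 0:
        roles.append("output")
    print(f"{i}: {info['name']} ({', '.join(roles)})")
pa.terminate()
```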
Create a .env file at the root of your repository based on the pre-existing .env.template file to store the API keys for your LLM and TTS services.
.env Template:
Add the following lines to your .env file. Replace your-actual-api-key with your actual API key for the desired service:
# LLM
OPENAI_API_KEY="your-actual-openai-api-key"
OOBA_API_KEY="your-actual-ooba-api-key"
TABBY_API_KEY="your-actual-tabby-api-key"
# TTS
AZURE_API_KEY="your-actual-azure-api-key"
- Set up an OpenAI API Key (very small cost) - OpenAI API Key
- Set up an Azure Speech API Key (FREE) - Azure Speech API Key
- Make sure to create a Free Azure account - Free Azure Signup
- Follow all the steps in the video up to "Install Azure speech Python package".
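Once the keys are in place, you can quickly confirm they are readable. This sketch assumes the python-dotenv package is installed; it is only an illustration, and TARS-AI's own loading code may differ.

```python
# Read the .env file back and report which keys are set.
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the current working directory (the repo root)
for key in ("OPENAI_API_KEY", "OOBA_API_KEY", "TABBY_API_KEY", "AZURE_API_KEY"):
    print(key, "->", "set" if os.getenv(key) else "missing")
```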
- Create a config.ini file in the src/ folder based on the pre-existing config.ini.template file.
- Locate the [LLM] section and update the parameters (for OpenAI):
[LLM]
# Large Language Model configuration (ooba/OAI or tabby)
llm_backend = openai                 # Set this to `openai` if using OpenAI models.
base_url = https://api.openai.com    # The URL for the OpenAI API.
openai_model = gpt-4o-mini           # Specify the OpenAI model to use (e.g., gpt-4o-mini or another supported model).
- Locate the [TTS] section and update the parameters:
[TTS]
# Text-to-Speech configuration
ttsoption = azure        # TTS backend option: [azure, local, xttsv2, TARS]
azure_region = eastus    # Azure region for Azure TTS (e.g., eastus)
...
tts_voice = en-US-Steffan:DragonHDLatestNeural   # Name of the cloned voice to use (e.g., TARS2)
- tts_voice: You can find other voices available with Azure here.
- If en-US-Steffan:DragonHDLatestNeural gives you an error, try en-US-SteffanNeural.
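Before launching the app, you can sanity-check both sections independently. The two sketches below are illustrations rather than part of TARS-AI: the first assumes the requests package and exercises the OpenAI key and model named in [LLM]; the second assumes the azure-cognitiveservices-speech package and exercises the Azure key, region, and the fallback voice mentioned above.

```python
# Hedged check of the [LLM] values: send one minimal chat completion request.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Reply with one short line, as TARS."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If that prints a reply, the [LLM] side is working. For the [TTS] side:

```python
# Hedged check of the [TTS] values: synthesize one sentence through Azure.
import os
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["AZURE_API_KEY"], region="eastus"
)
speech_config.speech_synthesis_voice_name = "en-US-SteffanNeural"
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Azure TTS is configured.").get()
print(result.reason)  # expect ResultReason.SynthesizingAudioCompleted
```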
- Navigate to the src/ folder within the repository:
cd src/
- Start the application:
python app.py
- The program should now be running and ready to interact using your microphone and speaker.
The TTS server must run on your GPU-enabled PC due to its computational requirements.
- Ensure Python 3.9-3.12 is installed on your PC.
- Install CUDA and cuDNN compatible with your NVIDIA GPU - CUDA Installation
- Install PyTorch compatible with your CUDA and cuDNN versions - PyTorch Installation
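A quick way to confirm that the CUDA, cuDNN, and PyTorch versions actually line up before installing anything else:

```python
# Run on the GPU-enabled PC; PyTorch should report CUDA as available.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```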
Run the following command on your GPU-enabled PC to clone the XTTS API Server repository:
git clone https://github.com/daswer123/xtts-api-server.git
Follow the installation guide for your operating system:
- Windows:
- Create and activate a virtual environment:
python -m venv venv
venv\Scripts\activate
- Install xtts-api-server:
pip install xtts-api-server
For more details, refer to the official XTTS API Server Installation Guide.
- Download the TARS-short.wav and TARS-long.wav files from the TARS-AI repository under src/tts/wakewords/VoiceClones. These are the different voices you can use for TARS.
- Place them in the speakers/ directory within the XTTS project folder. If the directory does not exist, create it.
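If you prefer to script that copy, the sketch below does it; the source and destination paths are assumptions based on where the two repositories were cloned, so adjust them to your layout.

```python
# Copy the TARS voice-clone WAVs into the XTTS speakers/ folder.
import shutil
from pathlib import Path

src_dir = Path("TARS-AI/src/tts/wakewords/VoiceClones")  # assumed clone location
dst_dir = Path("xtts-api-server/speakers")               # assumed clone location
dst_dir.mkdir(parents=True, exist_ok=True)
for name in ("TARS-short.wav", "TARS-long.wav"):
    shutil.copy(src_dir / name, dst_dir / name)
    print("Copied", name)
```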
- Open a terminal in the xtts-api-server project directory.
- Activate your virtual environment if it is not already active.
- Start the XTTS API Server:
python -m xtts_api_server --listen --port 8020
- Once the server is running, open a browser and navigate to:
http://localhost:8020/docs
- This will open the API's Swagger documentation interface, which you can use to test the server and its endpoints.
- Locate the GET /speakers endpoint in the API documentation.
- Click "Try it out" and then "Execute" to test the endpoint.
- Ensure the response includes the TARS-Short and TARS-Long speaker files, with entries similar to:
[
  {
    "name": "TARS-Long",
    "voice_id": "TARS-Long",
    "preview_url": "http://localhost:8020/sample/TARS-Long.wav"
  },
  {
    "name": "TARS-Short",
    "voice_id": "TARS-Short",
    "preview_url": "http://localhost:8020/sample/TARS-Short.wav"
  }
]
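The same check can be scripted instead of using the Swagger UI. This assumes the requests package; the field names follow the sample response above.

```python
# Query the running XTTS server for its registered speakers.
import requests

speakers = requests.get("http://localhost:8020/speakers", timeout=10).json()
for speaker in speakers:
    print(speaker["name"], "->", speaker["preview_url"])
```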
- Locate the POST /tts_to_audio endpoint in the API documentation.
- Click "Try it out" and input the following JSON in the Request Body:
{ "text": "Hello, this is TARS speaking.", "speaker_wav": "TARS-Short", "language": "en" }
- Click "Execute" to send the request.
- Check the response for a generated audio file. You should see a download field where you can download and listen to the audio output.
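The same request can also be scripted, saving the returned audio straight to disk. This assumes the requests package and that, as in the Swagger test above, the endpoint returns the WAV bytes in the response body.

```python
# Ask the XTTS server to synthesize a line with the TARS-Short voice and save it.
import requests

resp = requests.post(
    "http://localhost:8020/tts_to_audio",
    json={
        "text": "Hello, this is TARS speaking.",
        "speaker_wav": "TARS-Short",
        "language": "en",
    },
    timeout=120,
)
resp.raise_for_status()
with open("tars_test.wav", "wb") as f:
    f.write(resp.content)
print("Saved tars_test.wav")
```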