Minimum Hardware Prerequisites

  1. Raspberry Pi: For obvious reasons.
  2. USB Microphone: For user input.
  3. Speaker: For TARS output.

Environment Setup Guide (IN DEVELOPMENT)

1. Set Up the TARS-AI Repository on Raspberry Pi

A. Clone the Repository

  1. Open a terminal on your Raspberry Pi.
  2. Clone the TARS-AI repository:
    git clone https://github.com/pyrater/TARS-AI.git
  3. Navigate to the cloned directory:
    cd TARS-AI

B. Install System-Level Dependencies

These dependencies are required for various operations, including Selenium-based automation, audio processing, and format handling.

  1. Update Your System: Ensure your package lists and installed software are up to date:

    sudo apt update
    sudo apt upgrade -y
  2. Install Chromium: Chromium is the browser required for Selenium-based web automation:

    sudo apt install -y chromium-browser
  3. Install Chromedriver for Selenium: Chromedriver allows Selenium to control Chromium:

    sudo apt install -y chromium-chromedriver
  4. Install SoX and Format Support Libraries: SoX is a command-line tool for processing audio files.

    sudo apt install -y sox libsox-fmt-all
  5. Install PortAudio Development Libraries: PortAudio is a cross-platform audio input/output library.

    sudo apt install -y portaudio19-dev
  6. Verify Installations: Confirm that the installed packages are functioning:

    • Check Chromium version:
      chromium-browser --version
    • Check Chromedriver version:
      chromedriver --version
    • Check SoX version:
      sox --version
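
If you prefer a single command, steps 2-5 can be combined:

sudo apt install -y chromium-browser chromium-chromedriver sox libsox-fmt-all portaudio19-dev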

C. Set Up the Python Environment

  1. Create a virtual environment:
    python3 -m venv venv
  2. Activate the virtual environment:
    source venv/bin/activate
  3. Install the required dependencies listed in src/requirements.txt:
    pip install -r src/requirements.txt

D. Connect Hardware

  1. Connect your microphone to the Raspberry Pi via USB.
  2. Connect your speaker to the Raspberry Pi using the audio output or Bluetooth.
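
To confirm the Raspberry Pi sees both devices, you can list the ALSA capture and playback devices (these utilities ship with Raspberry Pi OS):

# Capture devices: your USB microphone should appear here
arecord -l
# Playback devices: your speaker should appear here
aplay -l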

E. Set the API Key in a .env File (Recommended for Secure Key Management)

Create a .env file at the root of your repository based on the pre-existing .env.template file to store your API keys for your LLM and TTS service.
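
For example, from the repository root:

cp .env.template .env
nano .env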

.env Template: Add the following lines to your .env file, replacing each placeholder value with the actual API key for the service(s) you use:

# LLM
OPENAI_API_KEY="your-actual-openai-api-key"
OOBA_API_KEY="your-actual-ooba-api-key"
TABBY_API_KEY="your-actual-tabby-api-key"

# TTS
AZURE_API_KEY="your-actual-azure-api-key"
  • Set up an OpenAI API Key (very small cost).
  • Set up an Azure Speech API Key (FREE).
    • Make sure to create a free Azure account first (Free Azure Signup).
    • Follow all the steps in the video up to "Install Azure Speech Python package".
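
To sanity-check that the keys are being picked up, you can load the file with python-dotenv (assuming it is installed by requirements.txt; swap in the variable for your chosen backend):

# Prints True if OPENAI_API_KEY was loaded from .env
python3 -c "from dotenv import load_dotenv; import os; load_dotenv(); print(bool(os.getenv('OPENAI_API_KEY')))"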

F. Set the config.ini Parameters

  1. Create a config.ini file in the src/ folder based on the pre-existing config.ini.template file (see the copy command after this list).

  2. Locate the [LLM] section and update the parameters (for OpenAI):

    [LLM] # Large Language Model configuration (ooba/OAI or tabby)
    llm_backend = openai
    # Set this to `openai` if using OpenAI models.
    base_url = https://api.openai.com
    # The URL for the OpenAI API.
    openai_model = gpt-4o-mini
    # Specify the OpenAI model to use (e.g., gpt-4o-mini or another supported model).
  3. Locate the [TTS] section and update the parameters:

    [TTS] # Text-to-Speech configuration 
    ttsoption = azure
    # TTS backend option: [azure, local, xttsv2, TARS]
    azure_region = eastus
    # Azure region for Azure TTS (e.g., eastus)
    ...
    tts_voice = en-US-Steffan:DragonHDLatestNeural
    # Name of the cloned voice to use (e.g., TARS2)
  • tts_voice: You can find other voices available with Azure in the Azure documentation.
    • If en-US-Steffan:DragonHDLatestNeural gives you an error, try en-US-SteffanNeural.
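
As noted in step 1, a quick way to create the file is to copy the template and then edit it (run from the repository root):

cp src/config.ini.template src/config.ini
nano src/config.ini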

G. Run the Program

  1. Navigate to the src/ folder within the repository:
    cd src/
  2. Start the application:
    python app.py
  3. The program should now be running and ready to interact using your microphone and speaker.
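
For reference, the full launch sequence from a fresh terminal (using the virtual environment created in step C) is:

cd TARS-AI
source venv/bin/activate
cd src/
python app.py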

(OPTIONAL) Set up XTTS Server

1. Prepare Your PC with NVIDIA GPU

The TTS server must run on your GPU-enabled PC due to its computational requirements.

  1. Ensure Python 3.9-3.12 is installed on your PC.
  2. Install CUDA and cuDNN compatible with your NVIDIA GPU (see the CUDA Installation guide).
  3. Install a PyTorch build compatible with your CUDA and cuDNN versions (see the PyTorch Installation guide; an example follows this list).
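
As an illustration only (the exact command depends on your CUDA version; copy the matching one from the PyTorch site), a CUDA 12.1 install and a quick GPU check might look like:

# Install the CUDA 12.1 build of PyTorch; swap cu121 for your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu121
# Should print True if PyTorch can see the GPU
python -c "import torch; print(torch.cuda.is_available())"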

2. Set Up XTTS API Server

A. Install XTTS API Server

Run the following command on your GPU-enabled PC to clone the XTTS API Server repository:

git clone https://github.com/daswer123/xtts-api-server.git

Follow the installation guide for your operating system:

  • Windows:
    1. Create and activate a virtual environment:
      python -m venv venv
      venv\Scripts\activate
    2. Install xtts-api-server:
      pip install xtts-api-server
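  • Linux/macOS (equivalent steps, for reference; same package):
    1. Create and activate a virtual environment:
      python -m venv venv
      source venv/bin/activate
    2. Install xtts-api-server:
      pip install xtts-api-server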

For more details, refer to the official XTTS API Server Installation Guide.

B. Add the TARS.wav Speaker File

  1. Download the TARS-Short.wav and TARS-Long.wav files from the TARS-AI repository under src/tts/wakewords/VoiceClones. These will be the different voices you can use for TARS.
  2. Place them in the speakers/ directory within the XTTS project folder. If the directory does not exist, create it (a sketch follows).
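
For example, on Linux or macOS (assuming the two .wav files are in your current directory; on Windows, copy them with Explorer):

# Create the speakers/ directory if it is missing, then copy the voice files in
mkdir -p speakers
cp TARS-Short.wav TARS-Long.wav speakers/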

C. Start the XTTS API Server

  1. Open a terminal in the xtts-api-server project directory.
  2. Activate your virtual environment if it is not already active (on Windows: venv\Scripts\activate).
  3. Start the XTTS API Server:
    python -m xtts_api_server --listen --port 8020
  4. Once the server is running, open a browser and navigate to:
    http://localhost:8020/docs
    
  5. This will open the API's Swagger documentation interface, which you can use to test the server and its endpoints.

D. Verify the Server

  1. Locate the GET /speakers endpoint in the API documentation.
  2. Click "Try it out" and then "Execute" to test the endpoint.
  3. Ensure the response includes the TARS-Short and TARS-Long speaker files, with entries similar to:
    [
      {
        "name": "TARS-Long",
        "voice_id": "TARS-Long",
        "preview_url": "http://localhost:8020/sample/TARS-Long.wav"
      },
      {
        "name": "TARS-Short",
        "voice_id": "TARS-Short",
        "preview_url": "http://localhost:8020/sample/TARS-Short.wav"
      }
    ]
  4. Locate the POST /tts_to_audio endpoint in the API documentation.
  5. Click "Try it out" and input the following JSON in the Request Body:
    {
        "text": "Hello, this is TARS speaking.",
        "speaker_wav": "TARS-Short",
        "language": "en"
    }
  6. Click "Execute" to send the request.
  7. Check the response for a generated audio file. You should see a download field where you can download and listen to the audio output.
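
If you prefer the command line, the same two checks can be run with curl against the endpoints documented above (assuming the server is listening on localhost:8020):

# List registered speakers; TARS-Short and TARS-Long should appear
curl http://localhost:8020/speakers

# Request synthesis and save the generated audio to a file
curl -X POST http://localhost:8020/tts_to_audio \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is TARS speaking.", "speaker_wav": "TARS-Short", "language": "en"}' \
  --output tars_test.wav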