Skip to content

Yet another self-hosted AI voice assistant. GlaDOS' blazing fast pipeline with a more realistic Kokoro TTS voice and vision and Metahuman integration.

License

Notifications You must be signed in to change notification settings

SuperMaximus1984/KokoDOS_MH

 
 

Repository files navigation

KokoDOS: AI voice assistant with vision and real-time webcam chat

Webcam chat demo (experimental):

test-video-chat-kokodos.mp4

Vision:

KokoDOS-v.mp4

KokoDOS is a project that transforms the excellent GlaDOS project into a more traditional AI voice assistant. By leveraging Kokoro-FastAPI, KokoDOS provides a realistic, real-time voice and video interaction experience. Additionally, you can share a section of your screen with the assistant and have a conversation about it.

How to use vision

You can ask the AI to summarize an article you are reading, describe a picture, translate, you name it. Press and hold V on your keyboard and move the overlay to a section of your screen that you want to share with the AI. Use mouse wheel to resize the overlay. Release V and ask your question!

Vision was tested on Windows only. Feel free to open issues if you run into problems on Linux or macOS. You need scrot to use vision on Linux.

How to configure LLM and TTS voice

Open kokodos_config.yml in the main directory and edit:

  1. model to set up your LLM, for example: model: "minicpm-v:latest"

  2. tts_voice to set up your Kokoro TTS voice, for example: tts_voice: "af_bella"

    • Available voices are:
      • af_bella
      • af_irulan
      • af_nicole
      • af_sarah
      • af_sky
      • am_adam
      • am_michael
      • bf_emma
      • bf_isabella
      • bm_george
      • bm_lewis

    You can combine voices to further customize your AI, for example: af_bella+af_nicole.

Installation

Steps are mostly the same as for GlaDOS. Before cloning this repo, install the following:

  1. minicpm-v:latest LLM via Ollama (Make sure that the LLM of your choice supports vision if you want to use it. If you don't care about vision, you can use any LLM.)
  2. Kokoro FastAPI using Docker

Windows Installation Process

  1. Open the Microsoft Store, search for python and install Python 3.12
  2. Download this repository, either:
    1. Download and unzip this repository somewhere in your home folder, or
    2. If you have Git set up, git clone this repository using https://github.com/kaminoer/KokoDOS.git
  3. In the repository folder, run the install_windows.bat, and wait until the installation in complete.
  4. Double click start_windows.bat to start KokoDOS!

macOS Installation Process

This is still experimental. Any issues can be addressed in the Discord server. If you create an issue related to this, you will be referred to the Discord server. Note: I was getting Segfaults! Please leave feedback!

  1. Download this repository, either:

    1. Download and unzip this repository somewhere in your home folder, or
    2. In a terminal, git clone this repository using https://github.com/kaminoer/KokoDOS.git
  2. In a terminal, go to the repository folder and run these commands:

      chmod +x install_mac.command
      chmod +x start_mac.command
    
  3. In the Finder, double click install_mac.command, and wait until the installation in complete.

  4. Double click start_mac.command to start KokoDOS!

Linux Installation Process

This is still experimental. Any issues can be addressed in the Discord server. If you create an issue related to this, you will be referred to the Discord server. This has been tested on Ubuntu 24.04.1 LTS

  1. Install the PortAudio library, if you don't yet have it installed:

      sudo apt update
      sudo apt install libportaudio2
    
  2. Download this repository, either:

    1. Download and unzip this repository somewhere in your home folder, or
    2. In a terminal, git clone this repository using https://github.com/kaminoer/KokoDOS.git
  3. In a terminal, go to the repository folder and run these commands:

      chmod +x install_ubuntu.sh
      chmod +x start_ubuntu.sh
    
  4. In the a terminal in the GLaODS folder, run ./install_ubuntu.sh, and wait until the installation in complete.

  5. Run ./start_ubuntu.sh to start KokoDOS!

Some caveats and plans for the future

  • At some point I'll rename the files, configs, and functions to reflect the name of this project (KokoDOS).
  • This project is using Kokoro's phonemization pipeline and TTS. There is still a lot of code cleanup to be done.
  • At least 12GB of VRAM is required to run KokoDOS smoothly and have a real-time conversation. Use a smaller LLM if you don't have enough VRAM.
  • The plan is to add some useful features to the voice assistant such as clipboard access, web access, possibly a vision LLM.

Below is the original GlaDOS readme.

dnhkng%2FGlaDOS | Trendshift

GLaDOS Personality Core

This is a project dedicated to building a real-life version of GLaDOS!

NEW: If you want to chat or join the community, Join our discord! If you want to support, sponsor the project here!

LocalGLaDOS.mp4

Update 3-1-2025 Got GLaDOS running on an 8Gb SBC!

glados_update.mov

This is really tricky, so only for hardcore geeks! Checkout the 'rock5b' branch, and my OpenAI API for the RK3588 NPU system Don't expect support for this, it's in active development, and requires lots of messing about in armbian linux etc.

Goals

This is a hardware and software project that will create an aware, interactive, and embodied GLaDOS.

This will entail:

  • Train GLaDOS voice generator
  • Generate a prompt that leads to a realistic "Personality Core"
  • Generate a medium- and long-term memory for GLaDOS (Probably a custom vector DB in a simpy Numpy array!)
  • Give GLaDOS vision via a VLM (either a full VLM for everything, or a 'vision module' using a tiny VLM the GLaDOS can function call!)
  • Create 3D-printable parts
  • Design the animatronics system

Software Architecture

The initial goals are to develop a low-latency platform, where GLaDOS can respond to voice interactions within 600ms.

To do this, the system constantly records data to a circular buffer, waiting for voice to be detected. When it's determined that the voice has stopped (including detection of normal pauses), it will be transcribed quickly. This is then passed to streaming local Large Language Model, where the streamed text is broken by sentence, and passed to a text-to-speech system. This means further sentences can be generated while the current is playing, reducing latency substantially.

Subgoals

  • The other aim of the project is to minimize dependencies, so this can run on constrained hardware. That means no PyTorch or other large packages.
  • As I want to fully understand the system, I have removed a large amount of redirection: which means extracting and rewriting code.

Hardware System

This will be based on servo- and stepper-motors. 3D printable STL will be provided to create GlaDOS's body, and she will be given a set of animations to express herself. The vision system will allow her to track and turn toward people and things of interest.

Installation Instruction

Try this simplified process, but be aware it's still in the experimental stage! For all operating systems, you'll first need to install Ollama to run the LLM.

Install Drivers in necessary

If you are an Nvidia system with CUDA, make sure you install the necessary drivers and CUDA, info here: https://onnxruntime.ai/docs/install/

If you are using another accelerator (ROCm, DirectML etc.), after following the instructions below for you platform, follow up with installing the best onnxruntime version for your system.

Set up a local LLM server:

  1. Download and install Ollama for your operating system.
  2. Once installed, download a small 2B model for testing, at a terminal or command prompt use: ollama pull llama3.2

Note: You can use any OpenAI or Ollama compatible server, local or cloud based. Just edit the glados_config.yaml and update the completion_url, model and the api_key if necessary.

Windows Installation Process

  1. Open the Microsoft Store, search for python and install Python 3.12
  2. Download this repository, either:
    1. Download and unzip this repository somewhere in your home folder, or
    2. If you have Git set up, git clone this repository using git clone github.com/dnhkng/glados.git
  3. In the repository folder, run the install_windows.bat, and wait until the installation in complete.
  4. Double click start_windows.bat to start GLaDOS!

macOS Installation Process

This is still experimental. Any issues can be addressed in the Discord server. If you create an issue related to this, you will be referred to the Discord server. Note: I was getting Segfaults! Please leave feedback!

  1. Download this repository, either:

    1. Download and unzip this repository somewhere in your home folder, or
    2. In a terminal, git clone this repository using git clone github.com/dnhkng/glados.git
  2. In a terminal, go to the repository folder and run these commands:

      chmod +x install_mac.command
      chmod +x start_mac.command
    
  3. In the Finder, double click install_mac.command, and wait until the installation in complete.

  4. Double click start_mac.command to start GLaDOS!

Linux Installation Process

This is still experimental. Any issues can be addressed in the Discord server. If you create an issue related to this, you will be referred to the Discord server. This has been tested on Ubuntu 24.04.1 LTS

  1. Install the PortAudio library, if you don't yet have it installed:

      sudo apt update
      sudo apt install libportaudio2
    
  2. Download this repository, either:

    1. Download and unzip this repository somewhere in your home folder, or
    2. In a terminal, git clone this repository using git clone github.com/dnhkng/glados.git
  3. In a terminal, go to the repository folder and run these commands:

      chmod +x install_ubuntu.sh
      chmod +x start_ubuntu.sh
    
  4. In the a terminal in the GLaODS folder, run ./install_ubuntu.sh, and wait until the installation in complete.

  5. Run ./start_ubuntu.sh to start GLaDOS!

Changing the LLM Model

To use other models, use the command: ollama pull {modelname} and then add {modelname} to glados_config.yaml as the model. You can find more models here!

Common Issues

  1. If you find you are getting stuck in loops, as GLaDOS is hearing herself speak, you have two options:
    1. Solve this by upgrading your hardware. You need to you either headphone, so she can't physically hear herself speak, or a conference-style room microphone/speaker. These have hardware sound cancellation, and prevent these loops.
    2. Disable voice interruption. This means neither you nor GLaDOS can interrupt when GLaDOS is speaking. To accomplish this, edit the glados_config.yaml, and change interruptible: to false.
  2. If you want to the the Text UI, you should use the glados-ui.py file instead of glado.py

Testing the submodules

You can test the systems by exploring the 'demo.ipynb'.

Star History

Star History Chart

About

Yet another self-hosted AI voice assistant. GlaDOS' blazing fast pipeline with a more realistic Kokoro TTS voice and vision and Metahuman integration.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 81.2%
  • Shell 7.4%
  • Jupyter Notebook 6.1%
  • Batchfile 3.5%
  • Dockerfile 1.8%