causeri3/marvin-the-paranoid-android

Marvin the Paranoid Robot

Description

From video input to audio output, via object detection (YOLOv8, ONNX format), an LLM (ChatGPT, via the API) and text-to-speech (fastspeech2-en-ljspeech). You can use a webcam, movie files or YouTube videos as input. Compatible with Mac and Windows, and probably Linux.

github_sub_low.mov

Dependencies

Python

python==3.9

GPU

If you have all CUDA dependencies installed and want to use your GPU, you can substitute onnxruntime with onnxruntime-gpu in requirements.txt.
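To check whether the GPU build is actually picked up, you can ask ONNX Runtime which execution providers it sees. The sketch below is not taken from this repository: `choose_providers` is a hypothetical helper, and it assumes you want CUDA first with a CPU fallback.

```python
def choose_providers(available):
    """Order ONNX Runtime execution providers: CUDA first, CPU as fallback.

    Hypothetical helper for illustration, not part of this repository.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    # Always fall back to the CPU provider if nothing preferred is available.
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort
        available = ort.get_available_providers()
    except ImportError:
        # onnxruntime is not installed in this environment.
        available = []
    print(choose_providers(available))
```

The returned list can be passed as the `providers` argument of `onnxruntime.InferenceSession`. If only `CPUExecutionProvider` shows up after installing onnxruntime-gpu, the CUDA dependencies are probably not being found.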

Got it running with:

Python packages

You can install them via pip install -r requirements.txt

Usage

You need an OpenAI API key (token) to run it:

  • webcam: python yolo-chat-tts/main.py -ok <your key>
  • local video: python yolo-chat-tts/main.py -ok <your key> -vp "path/to/your/video.mov"
  • youtube: python yolo-chat-tts/main.py -ok <your key> -y "https://www.youtube.com/watch?v=uhkdUdXTUuc"
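The three invocations above differ only in which input source is selected. As a rough sketch of how such a command line could be parsed (this is not the actual `main.py`): the short flags `-ok`, `-vp` and `-y` come from the examples above, while the long option names, the `resolve_source` helper, the webcam index `0` and the mutual exclusion of `-vp` and `-y` are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Short flags match the usage examples; long names are hypothetical.
    parser = argparse.ArgumentParser(description="Video in, cynical audio out")
    parser.add_argument("-ok", "--openai-key", required=True,
                        help="OpenAI API key")
    # Assumption: a local file and a YouTube URL cannot be combined.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-vp", "--video-path", help="path to a local video")
    source.add_argument("-y", "--youtube", help="YouTube video URL")
    return parser


def resolve_source(args):
    """Return a (kind, value) pair describing the chosen input source."""
    if args.youtube:
        return ("youtube", args.youtube)
    if args.video_path:
        return ("file", args.video_path)
    # Neither flag given: fall back to the webcam (device index 0 assumed).
    return ("webcam", 0)


if __name__ == "__main__":
    demo = build_parser().parse_args(["-ok", "<your key>"])
    print(resolve_source(demo))
```

Run `python yolo-chat-tts/main.py --help` for the real option names and defaults.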

Args

See all arguments: python yolo-chat-tts/main.py --help

You can

  • choose between multiple camera devices
  • pick the interval between the cynical comments
  • choose whether the detected objects are drawn in your video or just written to the logs
  • choose a threshold for confidence
  • choose a threshold for IoU
  • choose the model size
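For context on the two thresholds: in a typical YOLO post-processing step, the confidence threshold drops low-scoring detections, and the IoU threshold controls how much two boxes may overlap before the weaker one is suppressed (non-maximum suppression). The following is a minimal, class-agnostic sketch in plain Python; it is not this repository's code, and the function names, box format `(x1, y1, x2, y2)` and default values of 0.5 are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.5, iou_threshold=0.5):
    """Keep confident boxes, then greedily suppress heavy overlaps.

    detections: list of (box, score) pairs. Threshold defaults are
    illustrative, not the project's defaults.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8),
            ((20, 20, 30, 30), 0.7), ((0, 0, 5, 5), 0.2)]
    print(filter_detections(dets))
```

Raising the confidence threshold gives fewer but more reliable detections; lowering the IoU threshold suppresses overlapping boxes more aggressively.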

Thanks to Tien Luong Ngoc & Ibai Gorordo; I took a bunch of useful code from your linked repositories.
