Vision-Based Agent

Important: This repository is meant to be used for educational purposes only!

This repository contains a demo of an automatic LLM/VLM-based agent that visits Google News (specifically, the Technology > AI page), scrolls through it, and looks for interesting articles. It clicks the links it selects, extracts the full article as plain text from the opened page, then returns to Google News, scrolls down, and continues this routine.

The agent uses only small local models for this (a rough sketch of how such a loop could be wired together follows the list):

  • Quantized llama-3.2-vision (11B, via Ollama)
  • Quantized llama-3.1 (3B, via Ollama)
  • Florence-2-base
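
As a rough illustration only (this is not the repository's code), the browse-and-read loop might look something like the sketch below, assuming Playwright's sync API for browser control and the ollama Python client for the vision model; the prompt, URL handling, and function name are hypothetical:

# Illustrative sketch only -- not this repository's actual implementation.
import ollama
from playwright.sync_api import sync_playwright

def read_headlines(screenshot_path: str) -> str:
    # Ask the locally served vision model which articles are visible on screen.
    response = ollama.chat(
        model="llama-3.2-vision",  # the model pulled in the Installation step
        messages=[{
            "role": "user",
            "content": "List the article headlines visible in this screenshot.",
            "images": [screenshot_path],  # the ollama client accepts image file paths
        }],
    )
    return response["message"]["content"]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://news.google.com")  # the real agent targets the Technology > AI page
    page.screenshot(path="news.png")      # capture the current viewport
    print(read_headlines("news.png"))     # let the vision model describe what it sees
    page.mouse.wheel(0, 800)              # scroll down, then repeat the routine
    browser.close()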

Installation

  1. Install Ollama and pull the required models:

     ollama pull llama-3.1
     ollama pull llama-3.2-vision

  2. Create a new virtual environment (recommended) and clone this repo into it.
  3. From the root directory of this repo, install using:

     pip install -e .
     playwright install
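
Note that the playwright install step downloads the browser binaries Playwright needs to drive the page; without it, the agent cannot open Google News.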

Running

Run the agent using the vba command:

usage: vba [-h] [-s SCROLLS] [-o OUTPUT_FILE] [--debug]

options:
  -h, --help            show this help message and exit
  -s SCROLLS, --scrolls SCROLLS
                        Number of mouse-scrolls to perform (non-negative integer, default = 1)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output filename (defaults to `output_[run-time].json`)
  --debug               Turn on debug mode
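
For example, an invocation that performs three scrolls and writes the results to a custom file (the filename here is just an illustration) would be:

vba -s 3 -o ai_news.json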
