Vision-Based Agent

Important: This repository is meant to be used for educational purposes only!

This repository contains a demo of an automatic LLM/VLM-based agent that visits Google News (specifically, the Technology > AI page), scrolls through it, and looks for interesting articles. It clicks the links it selects, extracts the full article as plain text from the opened page, then returns to Google News, scrolls down, and continues this routine.

The agent uses only small local models for this (a rough sketch of how such a loop could be wired together follows the list):

  • Quantized llama-3.2-vision (11B, via Ollama)
  • Quantized llama-3.1 (3B, via Ollama)
  • Florence-2-base
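
As a rough illustration only (this is not the repository's code), the browse-and-read loop might look something like the sketch below, assuming Playwright's sync API for browser control and the ollama Python client for the vision model; the prompt, URL handling, and function name are hypothetical:

# Illustrative sketch only -- not this repository's actual implementation.
import ollama
from playwright.sync_api import sync_playwright

def read_headlines(screenshot_path: str) -> str:
    # Ask the locally served vision model which articles are visible on screen.
    response = ollama.chat(
        model="llama-3.2-vision",  # the model pulled in the Installation step
        messages=[{
            "role": "user",
            "content": "List the article headlines visible in this screenshot.",
            "images": [screenshot_path],  # the ollama client accepts image file paths
        }],
    )
    return response["message"]["content"]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://news.google.com")  # the real agent targets the Technology > AI page
    page.screenshot(path="news.png")      # capture the current viewport
    print(read_headlines("news.png"))     # let the vision model describe what it sees
    page.mouse.wheel(0, 800)              # scroll down, then repeat the routine
    browser.close()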

Installation

  1. Install Ollama and pull the required models:

     ollama pull llama-3.1
     ollama pull llama-3.2-vision

  2. Create a new virtual environment (recommended) and clone this repo into it.
  3. From the root directory of this repo, install using:

     pip install -e .
     playwright install
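
Note that the playwright install step downloads the browser binaries Playwright needs to drive the page; without it, the agent cannot open Google News.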

Running

Run the agent using the vba command:

usage: vba [-h] [-s SCROLLS] [-o OUTPUT_FILE] [--debug]

options:
  -h, --help            show this help message and exit
  -s SCROLLS, --scrolls SCROLLS
                        Number of mouse-scrolls to perform (non-negative integer, default = 1)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output filename (defaults to `output_[run-time].json`)
  --debug               Turn on debug mode
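
For example, an invocation that performs three scrolls and writes the results to a custom file (the filename here is just an illustration) would be:

vba -s 3 -o ai_news.json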
