Skip to content

An elegant Python application that generates detailed image captions using Meta's Llama 3.2 90B Vision model through the OpenRouter API.

License

Notifications You must be signed in to change notification settings

PierrunoYT/llama-image-captioner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Llama Image Captioner

An elegant Python application that generates detailed image captions using Meta's Llama 3.2 90B Vision model through the OpenRouter API.

Features

  • 🖼️ Simple drag-and-drop image upload interface
  • 🔄 Choose between short and detailed captions
  • 🤖 Powered by Meta-Llama 3.2 90B Vision Instruct model
  • 🌐 Easy-to-use Gradio web interface
  • ⚡ Fast and accurate image analysis

Prerequisites

  • Python 3.7 or higher
  • OpenRouter API key (get it from OpenRouter)
  • Internet connection

Installation

1. Clone the Repository

git clone https://github.com/PierrunoYT/llama-image-captioner.git
cd llama-image-captioner

2. Set Up Python Environment

Choose your operating system:

Windows

# Create a virtual environment
python -m venv venv

# Activate the environment
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

macOS/Linux

# Create a virtual environment
python3 -m venv venv

# Activate the environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configure API Keys

You can set up your environment variables in two ways:

Option 1: Using .env file (Recommended)

  1. Copy the example environment file:
    cp .env.example .env
  2. Edit the .env file and replace the values with your actual configuration:
    OPENROUTER_API_KEY=your_api_key_here
    YOUR_SITE_URL=http://localhost:7860
    YOUR_APP_NAME=Llama Image Captioner
    

Option 2: Using System Environment Variables

Windows (Command Prompt)

setx OPENROUTER_API_KEY "your_api_key_here"
setx YOUR_SITE_URL "http://localhost:7860"
setx YOUR_APP_NAME "Llama Image Captioner"

Windows (PowerShell)

[System.Environment]::SetEnvironmentVariable('OPENROUTER_API_KEY', 'your_api_key_here', 'User')
[System.Environment]::SetEnvironmentVariable('YOUR_SITE_URL', 'http://localhost:7860', 'User')
[System.Environment]::SetEnvironmentVariable('YOUR_APP_NAME', 'Llama Image Captioner', 'User')

macOS/Linux

Add these lines to your ~/.bashrc, ~/.zshrc, or equivalent:

export OPENROUTER_API_KEY="your_api_key_here"
export YOUR_SITE_URL="http://localhost:7860"
export YOUR_APP_NAME="Llama Image Captioner"

Then reload your shell configuration:

source ~/.bashrc  # or source ~/.zshrc

Running the Application

  1. Make sure your virtual environment is activated:

    • Windows: venv\Scripts\activate
    • macOS/Linux: source venv/bin/activate
  2. Start the application:

python ImageCaption.py
  1. Open your web browser and navigate to:

Usage

  1. Upload an image using one of these methods:

    • Drag and drop an image into the upload area
    • Click the upload area to select an image from your files
    • Paste an image from your clipboard
  2. Select caption length:

    • Short: Brief, concise description
    • Long: Detailed analysis of the image
  3. Click "Submit" and wait for the caption to be generated

Troubleshooting

Common Issues

  1. API Key Error

    • Ensure you've set the environment variables correctly
    • Restart your terminal/command prompt after setting environment variables
    • Check if your API key is valid
  2. Import Errors

    • Verify that your virtual environment is activated
    • Reinstall dependencies: pip install -r requirements.txt
  3. Connection Issues

    • Check your internet connection
    • Verify that OpenRouter API is accessible from your network

Contributing

  1. Fork the repository
  2. Create your feature branch: git checkout -b feature/AmazingFeature
  3. Commit your changes: git commit -m 'Add some AmazingFeature'
  4. Push to the branch: git push origin feature/AmazingFeature
  5. Open a Pull Request

License

License: MIT

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2025 PierrunoYT

Links

About

An elegant Python application that generates detailed image captions using Meta's Llama 3.2 90B Vision model through the OpenRouter API.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages