Skip to content

Latest commit

 

History

History
166 lines (121 loc) · 4.27 KB

README.md

File metadata and controls

166 lines (121 loc) · 4.27 KB

Llama Image Captioner

An elegant Python application that generates detailed image captions using Meta's Llama 3.2 90B Vision model through the OpenRouter API.

Features

  • 🖼️ Simple drag-and-drop image upload interface
  • 🔄 Choose between short and detailed captions
  • 🤖 Powered by Meta-Llama 3.2 90B Vision Instruct model
  • 🌐 Easy-to-use Gradio web interface
  • ⚡ Fast and accurate image analysis

Prerequisites

  • Python 3.7 or higher
  • OpenRouter API key (get it from OpenRouter)
  • Internet connection

Installation

1. Clone the Repository

git clone https://github.com/PierrunoYT/llama-image-captioner.git
cd llama-image-captioner

2. Set Up Python Environment

Choose your operating system:

Windows

# Create a virtual environment
python -m venv venv

# Activate the environment
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

macOS/Linux

# Create a virtual environment
python3 -m venv venv

# Activate the environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configure API Keys

You can set up your environment variables in two ways:

Option 1: Using .env file (Recommended)

  1. Copy the example environment file:
    cp .env.example .env
  2. Edit the .env file and replace the values with your actual configuration:
    OPENROUTER_API_KEY=your_api_key_here
    YOUR_SITE_URL=http://localhost:7860
    YOUR_APP_NAME=Llama Image Captioner
    

Option 2: Using System Environment Variables

Windows (Command Prompt)

setx OPENROUTER_API_KEY "your_api_key_here"
setx YOUR_SITE_URL "http://localhost:7860"
setx YOUR_APP_NAME "Llama Image Captioner"

Windows (PowerShell)

[System.Environment]::SetEnvironmentVariable('OPENROUTER_API_KEY', 'your_api_key_here', 'User')
[System.Environment]::SetEnvironmentVariable('YOUR_SITE_URL', 'http://localhost:7860', 'User')
[System.Environment]::SetEnvironmentVariable('YOUR_APP_NAME', 'Llama Image Captioner', 'User')

macOS/Linux

Add these lines to your ~/.bashrc, ~/.zshrc, or equivalent:

export OPENROUTER_API_KEY="your_api_key_here"
export YOUR_SITE_URL="http://localhost:7860"
export YOUR_APP_NAME="Llama Image Captioner"

Then reload your shell configuration:

source ~/.bashrc  # or source ~/.zshrc

Running the Application

  1. Make sure your virtual environment is activated:

    • Windows: venv\Scripts\activate
    • macOS/Linux: source venv/bin/activate
  2. Start the application:

python ImageCaption.py
  1. Open your web browser and navigate to:

Usage

  1. Upload an image using one of these methods:

    • Drag and drop an image into the upload area
    • Click the upload area to select an image from your files
    • Paste an image from your clipboard
  2. Select caption length:

    • Short: Brief, concise description
    • Long: Detailed analysis of the image
  3. Click "Submit" and wait for the caption to be generated

Troubleshooting

Common Issues

  1. API Key Error

    • Ensure you've set the environment variables correctly
    • Restart your terminal/command prompt after setting environment variables
    • Check if your API key is valid
  2. Import Errors

    • Verify that your virtual environment is activated
    • Reinstall dependencies: pip install -r requirements.txt
  3. Connection Issues

    • Check your internet connection
    • Verify that OpenRouter API is accessible from your network

Contributing

  1. Fork the repository
  2. Create your feature branch: git checkout -b feature/AmazingFeature
  3. Commit your changes: git commit -m 'Add some AmazingFeature'
  4. Push to the branch: git push origin feature/AmazingFeature
  5. Open a Pull Request

License

License: MIT

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2025 PierrunoYT

Links