Batch Process LLM Text Completions Using a Data Frame

dylanpieper/batchLLM

batchLLM

Batch process large language model (LLM) text completions by looping across the rows of a data frame column. The input is a data frame column, processed row by row, and the output is a new column containing the text completions.
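
As a rough illustration of that input/output shape (not the package's internals), the base R sketch below loops over one column and stores the results in a new column. `fake_complete()` is a hypothetical stand-in for a real LLM call, and the example data frame is made up.

```r
# Hypothetical stand-in for an LLM call; a real call would hit an API.
fake_complete <- function(text, prompt) {
  paste0("[", prompt, "] ", toupper(text))
}

# Made-up input data with a single text column.
df <- data.frame(
  statement = c("the earth is round", "the moon is made of cheese"),
  stringsAsFactors = FALSE
)

# One completion per row of the input column, stored as a new column.
df$completion <- vapply(
  df$statement,
  fake_complete,
  character(1),
  prompt = "classify",
  USE.NAMES = FALSE
)

print(df)
```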

🚀 Features

  • Supports multiple LLMs: OpenAI's GPT, Anthropic's Claude, and Google's Gemini
  • Automatic logging of batches and metadata
  • Side-by-side comparison of outputs from different LLMs
  • User-friendly Shiny App Addin
  • Resumable batch processing
  • Flexible configuration options
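
To make "resumable batch processing" concrete, here is a minimal base R sketch of the general idea: save progress after each batch and skip rows that are already done. It is an assumption about how such a feature could work, not a description of how batchLLM implements it. `fake_complete()`, `process_resumable()`, and the progress file are all hypothetical.

```r
# Hypothetical stand-in for an LLM call.
fake_complete <- function(text) toupper(text)

# Process `x` in batches, saving progress to `path` after each batch so an
# interrupted run can pick up where it left off.
process_resumable <- function(x, batch_size, path) {
  out <- if (file.exists(path)) readRDS(path) else rep(NA_character_, length(x))
  todo <- which(is.na(out))
  batches <- split(todo, ceiling(seq_along(todo) / batch_size))
  for (idx in batches) {
    out[idx] <- vapply(x[idx], fake_complete, character(1), USE.NAMES = FALSE)
    saveRDS(out, path)
  }
  out
}

progress_file <- tempfile(fileext = ".rds")
statements <- c("a", "b", "c", "d", "e")

# Simulate an interrupted run: only the first two rows were completed.
saveRDS(c("A", "B", NA, NA, NA), progress_file)

# Resuming processes only the three remaining rows.
result <- process_resumable(statements, batch_size = 2, path = progress_file)
print(result)
```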

📦 Installation

Production (CRAN):

install.packages("batchLLM")

Development (GitHub):

install.packages("devtools")
devtools::install_github("dylanpieper/batchLLM")

🛠️ Usage

library(batchLLM)

# Set up your API keys
Sys.setenv(OPENAI_API_KEY = "your_openai_api_key")
Sys.setenv(ANTHROPIC_API_KEY = "your_anthropic_api_key")
Sys.setenv(GEMINI_API_KEY = "your_gemini_api_key")

# Configure LLMs
llm_configs <- list(
  list(LLM = "openai", model = "gpt-4o-mini"),
  list(LLM = "anthropic", model = "claude-3-haiku-20240307"),
  list(LLM = "google", model = "1.5-flash")
)

# Process data: run each configuration in turn and keep the data frame returned by the last call
beliefs <- lapply(llm_configs, function(config) {
  batchLLM(
    df = beliefs,
    col = statement,
    prompt = "classify as a fact or misinformation in one word",
    LLM = config$LLM,
    model = config$model,
    max_tokens = 100,
    batch_size = 10,
    batch_delay = "1min",
    case_convert = "lower",
    sanitize = TRUE
  )
})[[length(llm_configs)]]

print(beliefs)
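
Before starting a long run, it can help to confirm that the keys are actually set. The sketch below uses only base R. The environment variable names come from the usage example above, and `missing_keys()` is a hypothetical helper, not part of the package.

```r
# Environment variable names taken from the usage example above.
required_keys <- c("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

# Hypothetical helper: return the names of any keys that are unset or empty.
missing_keys <- function(keys) {
  keys[!nzchar(Sys.getenv(keys))]
}

unset <- missing_keys(required_keys)
if (length(unset) > 0) {
  message("Missing API keys: ", paste(unset, collapse = ", "))
} else {
  message("All API keys are set.")
}
```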

🤖 Supported LLMs

LLM        Models
OpenAI     gpt-4, gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
Anthropic  claude-3-5-sonnet-20240620, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307, claude-2.1, claude-2.0
Google     1.5-pro, 1.5-flash, 1.0-pro
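
If you build configurations programmatically, a small lookup can catch typos in model names before any request is sent. This is a sketch only: the model names are copied from the table above and may lag behind the package, and `is_supported()` is a hypothetical helper.

```r
# Model names copied from the table above; the list may lag behind the package.
supported_models <- list(
  openai = c("gpt-4", "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo"),
  anthropic = c(
    "claude-3-5-sonnet-20240620", "claude-3-opus-20240229",
    "claude-3-sonnet-20240229", "claude-3-haiku-20240307",
    "claude-2.1", "claude-2.0"
  ),
  google = c("1.5-pro", "1.5-flash", "1.0-pro")
)

# Hypothetical helper: TRUE if the provider/model pair appears in the table.
# An unknown provider yields FALSE because `%in% NULL` is always FALSE.
is_supported <- function(LLM, model) {
  model %in% supported_models[[LLM]]
}

is_supported("openai", "gpt-4o-mini")
is_supported("google", "gpt-4")
```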

🧰 Additional Tools

  • scrape_metadata(): Retrieve metadata from processed batches
  • get_batches(): Subset generated output from processed batches
  • batchLLM_shiny(): Shiny Addin for interactive use within RStudio IDE

🌟 Use Cases

  • Sentiment analysis
  • Thematic analysis
  • Classification
  • Labeling or tagging
  • Language translation
  • Refactoring variables

⚠️ Considerations

  • Be aware of your API rate limits
  • Check model accessibility with your API key
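
When tuning `batch_size` and `batch_delay` against a rate limit, a quick estimate of the total waiting time is useful. The sketch below assumes one delay between consecutive batches and ignores request latency. Whether the package also waits after the final batch is not stated here, so treat the result as approximate. `estimate_wait_minutes()` is a hypothetical helper.

```r
# Rough estimate of wall-clock time spent waiting between batches.
# Assumes one delay between consecutive batches and ignores request latency.
estimate_wait_minutes <- function(n_rows, batch_size, delay_minutes) {
  n_batches <- ceiling(n_rows / batch_size)
  max(n_batches - 1, 0) * delay_minutes
}

# 1000 rows in batches of 10 gives 100 batches and 99 delays of 1 minute each.
estimate_wait_minutes(n_rows = 1000, batch_size = 10, delay_minutes = 1)
# 99
```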

🤝 Contributing

Contributions are welcome! Here are some feature ideas:

  • Function to analyze agreement between models
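
As a starting point for the agreement idea, simple percent agreement between two output columns could look like the sketch below. The data and the `percent_agreement()` helper are made up for illustration. A chance-corrected statistic such as Cohen's kappa would be a natural next step.

```r
# Made-up labels from two hypothetical model output columns.
results <- data.frame(
  model_a = c("fact", "misinformation", "fact", "fact"),
  model_b = c("fact", "misinformation", "misinformation", "fact"),
  stringsAsFactors = FALSE
)

# Share of rows (as a percentage) where both models gave the same label.
percent_agreement <- function(a, b) {
  mean(a == b) * 100
}

# 3 of 4 rows match.
percent_agreement(results$model_a, results$model_b)
# 75
```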

📄 License

This project is licensed under the MIT License.

👨‍💻 Developer's Note

My work on a complex classification problem inspired me to create this tool. I needed to categorize thousands of unique offense descriptions in court data, and I later tested it by classifying drug metabolites into their drug categories in toxicology data. The original function evolved significantly, and today it powers this Shiny app, designed to streamline and scale the use of LLMs across various datasets. I hope this tool proves as valuable to you as it has in my own projects.
