Batch process large language model (LLM) text completions by looping across the rows of a data frame column. This package is designed to optimize text processing tasks by utilizing data frames and column rows as the input and a new column with the text completions as the output.
- Supports multiple LLMs: OpenAI's GPT, Anthropic's Claude, and Google's Gemini
- Automatic logging of batches and metadata
- Side-by-side comparison of outputs from different LLMs
- User-friendly Shiny App Addin
- Resumable batch processing
- Flexible configuration options
Production (CRAN):
install.packages("batchLLM")
Development (GitHub):
install.packages("devtools")
devtools::install_github("dylanpieper/batchLLM")
library(batchLLM)
# Set up your API keys
Sys.setenv(OPENAI_API_KEY = "your_openai_api_key")
Sys.setenv(ANTHROPIC_API_KEY = "your_anthropic_api_key")
Sys.setenv(GEMINI_API_KEY = "your_gemini_api_key")
# Configure LLMs
llm_configs <- list(
list(LLM = "openai", model = "gpt-4o-mini"),
list(LLM = "anthropic", model = "claude-3-haiku-20240307"),
list(LLM = "google", model = "1.5-flash")
)
# Process data
beliefs <- lapply(llm_configs, function(config) {
batchLLM(
df = beliefs,
col = statement,
prompt = "classify as a fact or misinformation in one word",
LLM = config$LLM,
model = config$model,
max_tokens = 100,
batch_size = 10,
batch_delay = "1min",
case_convert = "lower",
sanitize = TRUE
)
})[[length(llm_configs)]]
print(beliefs)
LLM | Models |
---|---|
OpenAI | gpt-4, gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
Anthropic | claude-3-5-sonnet-20240620, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307, claude-2.1, claude-2.0 |
1.5-pro, 1.5-flash, 1.0-pro |
scrape_metadata()
: Retrieve metadata from processed batchesget_batches()
: Subset generated output from processed batchesbatchLLM_shiny()
: Shiny Addin for interactive use within RStudio IDE
- Sentiment analysis
- Thematic analysis
- Classification
- Labeling or tagging
- Language translation
- Refactoring variables
- Be aware of your API rate limits
- Check model accessibility with your API key
Contributions are welcome! Here are some features ideas:
- Function to analyze agreement between models
This project is licensed under the MIT License.
My work on a complex classification problem inspired me to create this tool. I was challenged with categorizing thousands of unique offense descriptions in court data, and later, I tested the functionality to classify drug metabolites to their drug categories in toxicology data. The original function evolved significantly, and today, it powers this Shiny app designed to streamline and scale the use of LLMs across various datasets. I hope this tool proves as valuable to you as it has in my own projects.