Releases · google/litmus
v0.1.0
Litmus 0.1.0
Introducing Litmus: A Comprehensive Testing and Evaluation Tool for LLM-Powered Applications
We're excited to announce the release of Litmus 0.1.0, a powerful and versatile tool designed to streamline the testing and evaluation of applications powered by Large Language Models (LLMs). Litmus helps GenAI developers build robust and reliable LLM applications by providing a framework for automated testing, detailed result analysis, and AI-powered evaluation.
Key Features:
- Flexible Test Templates: Define and manage test templates to specify the structure and parameters of your tests, enabling customization and reusability for both "Test Runs" and "Test Missions."
- Automated Test Execution: Submit test runs using templates and provide test data, automating the execution process and freeing you from manual effort.
- User-Friendly Web Interface: Interact with the Litmus platform through an intuitive and visually appealing web interface, simplifying test creation, submission, and analysis.
- Detailed Results: View the status, progress, and detailed results of your test runs, including requests, responses, and LLM assessments.
- Advanced Filtering: Filter responses from test runs based on specific JSON paths, focusing your analysis on particular aspects of the LLM's output.
- Performance Monitoring: Track the performance of your LLM responses and identify areas for improvement by using AI-powered evaluation.
- LLM Evaluation with Customizable Prompts: Leverage LLMs to compare actual responses with expected (golden) responses, utilizing customizable prompts to tailor the evaluation to your specific needs.
- Proxy Service for Enhanced LLM Monitoring: Optionally deploy a proxy service to capture and analyze LLM interactions in greater detail, gaining insights into usage patterns, debugging issues, and optimizing performance.
- Cloud Integration: Leverage the power of Google Cloud Platform (Firestore, Cloud Run, BigQuery) for efficient data storage, execution, and analysis, benefiting from the scalability and reliability of cloud services.
- Quick Deployment: Deploy Litmus using the provided CLI tool (`litmus deploy`) for a streamlined setup.
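The Advanced Filtering feature above selects portions of a response by JSON path. As a minimal sketch of the idea (this is illustrative only, not the Litmus API; the response structure and function name are assumptions):

```python
# Conceptual sketch: resolving a dotted JSON path against a nested LLM
# response. Illustrates the filtering idea only; not the Litmus API.

def get_path(data, path):
    """Resolve a dotted path like 'candidates.0.text' against nested data."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segments index into lists
        elif isinstance(current, dict):
            current = current[part]       # other segments are dict keys
        else:
            raise KeyError(f"cannot descend into {type(current).__name__} at {part!r}")
    return current

# A hypothetical LLM response payload.
response = {
    "candidates": [{"text": "Paris is the capital of France.", "score": 0.92}],
    "usage": {"input_tokens": 12, "output_tokens": 9},
}

print(get_path(response, "candidates.0.text"))    # just the answer text
print(get_path(response, "usage.output_tokens"))  # just the token count
```

Restricting analysis to a path like `candidates.0.text` keeps assertions focused on the model's answer rather than on surrounding metadata.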
Use Cases:
Litmus is ideal for:
- Evaluating chatbot and dialogue system performance: Assess how naturally and effectively your AI interacts in a conversational setting using "Test Missions."
- Testing task-oriented agents: Verify if your AI can successfully complete tasks like booking appointments, ordering food, or providing customer support.
- Evaluating multi-step reasoning and problem-solving abilities: Assess your AI's capacity to break down complex goals into manageable steps and execute them through its interactions.
- Unit testing individual model functionalities: Verify the accuracy and consistency of specific model capabilities, such as question answering, text summarization, or translation using "Test Runs."
- Regression testing after model updates: Ensure that changes to your model haven't introduced unintended consequences or broken existing functionalities.
- Benchmarking model performance against different datasets: Compare your model's performance on various test sets to identify strengths and weaknesses.
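Several of these use cases come down to comparing an actual response against a golden response with a customizable evaluation prompt. As a conceptual sketch (the template text and names here are illustrative assumptions, not Litmus defaults):

```python
# Conceptual sketch: building a customizable prompt that asks a judge LLM to
# grade an actual response against a golden response. The template and
# function names are illustrative assumptions, not the Litmus defaults.

EVAL_PROMPT_TEMPLATE = """You are grading an LLM response against a reference answer.

Question: {question}
Golden response: {golden}
Actual response: {actual}

Rate the actual response from 1 (wrong) to 5 (equivalent to the golden
response), then briefly justify the score."""

def build_eval_prompt(question, golden, actual, template=EVAL_PROMPT_TEMPLATE):
    """Fill the template; pass a custom template to tailor the evaluation."""
    return template.format(question=question, golden=golden, actual=actual)

prompt = build_eval_prompt(
    question="What is the capital of France?",
    golden="Paris.",
    actual="The capital of France is Paris.",
)
print(prompt)
```

Because the template is just an input, teams can swap in rubrics suited to their domain (factuality, tone, task completion) without changing the comparison machinery.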
Getting Started:
To start using Litmus, follow our comprehensive Getting Started Guide.
Documentation:
Explore our detailed documentation covering various aspects of Litmus:
- What is Litmus?
- Getting Started
- Manual Setup
- API Reference
- CLI Usage
- Proxy Usage
- Contribution Guide
- FAQ
- Known Issues
We're committed to continuously improving Litmus and welcome your contributions!
Join us in building a better future for GenAI development with Litmus!