Releases: google/litmus

v0.1.0

18 Sep 07:33
d6afcd9

Litmus 0.1.0

Introducing Litmus: A Comprehensive Testing and Evaluation Tool for LLM-Powered Applications

We're excited to announce the release of Litmus 0.1.0, a powerful and versatile tool designed to streamline the testing and evaluation of Large Language Models (LLMs). Litmus helps GenAI developers build robust and reliable LLM applications by providing a framework for automated testing, detailed result analysis, and AI-powered evaluation.

Key Features:

  • Flexible Test Templates: Define and manage test templates to specify the structure and parameters of your tests, enabling customization and reusability for both "Test Runs" and "Test Missions."
  • Automated Test Execution: Submit test runs from a template plus your test data; Litmus then executes them automatically, with no manual effort.
  • User-Friendly Web Interface: Interact with the Litmus platform through an intuitive and visually appealing web interface, simplifying test creation, submission, and analysis.
  • Detailed Results: View the status, progress, and detailed results of your test runs, including requests, responses, and LLM assessments.
  • Advanced Filtering: Filter responses from test runs based on specific JSON paths, focusing your analysis on specific aspects of the LLM's output.
  • Performance Monitoring: Track the performance of your LLM responses and identify areas for improvement by using AI-powered evaluation.
  • LLM Evaluation with Customizable Prompts: Leverage LLMs to compare actual responses with expected (golden) responses, utilizing customizable prompts to tailor the evaluation to your specific needs.
  • Proxy Service for Enhanced LLM Monitoring: Optionally deploy a proxy service to capture and analyze LLM interactions in greater detail, gaining insights into usage patterns, debugging issues, and optimizing performance.
  • Cloud Integration: Leverage the power of Google Cloud Platform (Firestore, Cloud Run, BigQuery) for efficient data storage, execution, and analysis, benefiting from the scalability and reliability of cloud services.
  • Quick Deployment: Deploy Litmus using the provided CLI tool (litmus deploy) for a streamlined setup.
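
To make the filtering and golden-response features above more concrete, here is a minimal Python sketch of the two ideas: pulling one field out of a JSON response by path, then filling a customizable evaluation prompt with the expected and actual answers. It is not Litmus's implementation or API. The helper names (`extract_path`, `build_eval_prompt`), the dotted path syntax, the response shape, and the prompt wording are all assumptions made for illustration.

```python
from typing import Any


def extract_path(document: Any, path: str) -> Any:
    """Follow a dotted path (hypothetical syntax) through nested dicts and lists."""
    current = document
    for key in path.split("."):
        if isinstance(current, list):
            # Numeric path segments index into lists.
            current = current[int(key)]
        else:
            current = current[key]
    return current


def build_eval_prompt(template: str, expected: str, actual: str) -> str:
    """Fill a user-supplied prompt template with the golden and actual responses."""
    return template.format(expected=expected, actual=actual)


# Made-up response payload; real LLM responses will have a different shape.
response = {"candidates": [{"content": {"text": "Paris"}, "finish_reason": "STOP"}]}

# Focus the analysis on a single field of the response.
answer = extract_path(response, "candidates.0.content.text")

# Hypothetical evaluation prompt; the wording is entirely up to the user.
template = (
    "Compare the actual answer with the golden answer.\n"
    "Golden: {expected}\n"
    "Actual: {actual}\n"
    "Reply with SAME or DIFFERENT."
)
prompt = build_eval_prompt(template, expected="Paris", actual=answer)
print(prompt)
```

In a real setup, the resulting prompt would be sent to an evaluator LLM; that call is omitted here because it depends on the model and client library in use.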

Use Cases:

Litmus is ideal for:

  • Evaluating chatbot and dialogue system performance: Assess how naturally and effectively your AI interacts in a conversational setting using "Test Missions."
  • Testing task-oriented agents: Verify if your AI can successfully complete tasks like booking appointments, ordering food, or providing customer support.
  • Evaluating multi-step reasoning and problem-solving abilities: Assess your AI's capacity to break down complex goals into manageable steps and execute them through its interactions.
  • Unit testing individual model functionalities: Verify the accuracy and consistency of specific model capabilities, such as question answering, text summarization, or translation, using "Test Runs."
  • Regression testing after model updates: Ensure that changes to your model haven't introduced unintended consequences or broken existing functionality.
  • Benchmarking model performance against different datasets: Compare your model's performance on various test sets to identify strengths and weaknesses.
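
As an illustration of the regression-testing use case, the following Python sketch compares per-test-case scores from a baseline run against scores from a run made after a model update, and flags cases that got worse. It is an assumption-laden example, not part of Litmus. The function name `find_regressions`, the score dictionaries, the test-case IDs, and the tolerance value are all hypothetical.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return IDs of test cases whose score dropped by more than `tolerance`.

    A test case that is missing from the candidate run is also flagged.
    """
    regressions = []
    for case_id, old_score in baseline.items():
        new_score = candidate.get(case_id)
        if new_score is None or new_score < old_score - tolerance:
            regressions.append(case_id)
    return sorted(regressions)


# Made-up evaluation scores (0.0 to 1.0) keyed by hypothetical test-case IDs.
baseline_scores = {"qa-1": 0.90, "qa-2": 0.80, "sum-1": 0.70}
updated_scores = {"qa-1": 0.92, "qa-2": 0.60, "sum-1": 0.68}

regressed = find_regressions(baseline_scores, updated_scores)
print(regressed)  # ['qa-2']
```

The tolerance absorbs small run-to-run variation in LLM-based scoring, so that only meaningful drops are reported; the right threshold depends on how noisy the evaluation is.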

Getting Started:

To start using Litmus, follow our comprehensive Getting Started Guide.

Documentation:

Explore our detailed documentation, which covers various aspects of Litmus.

We're committed to continuously improving Litmus and welcome your contributions!

Join us in building a better future for GenAI development with Litmus!