Vodalus v1.0.0 - AI-Powered Dataset Generation and Annotation Tool

@severian42 released this 10 Jul 15:26 (commit 9ee50a3)

I am excited to announce the release of Vodalus v1.0.0, a comprehensive toolkit for AI-assisted dataset generation, annotation, and management.

What's New

  • Intuitive Gradio UI: A user-friendly interface for dataset manipulation, configuration, and generation.
  • AI-Powered Dataset Generation: Leverage local or remote LLMs to create synthetic datasets based on Wikipedia content.
  • Advanced Annotation Tools: Customizable quality scales, tag categories, and free-text fields for precise data labeling.
  • Flexible Dataset Editing: Seamlessly navigate, edit, and preview dataset entries with built-in JSON/Markdown conversion.
  • AI Assistant Integration: Chat with an AI helper for annotation guidance and quality checking.
  • Configurable Workflow: Easily modify system messages, prompts, and topics for tailored dataset generation.
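
To make the annotation options above more concrete, here is a minimal Python sketch of how a quality scale, tag categories, and free-text fields could be represented and checked. The names `AnnotationConfig` and `validate_annotation`, the default values, and the annotation layout are all hypothetical illustrations and are not taken from the Vodalus codebase.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationConfig:
    """Hypothetical annotation schema; not Vodalus's actual config format."""
    # Maps each allowed rating to a human-readable label.
    quality_scale: dict = field(
        default_factory=lambda: {1: "poor", 2: "fair", 3: "good"}
    )
    # Maps each tag category to the tags allowed in it.
    tag_categories: dict = field(
        default_factory=lambda: {"topic": ["science", "history"]}
    )
    # Names of the free-text fields an annotator may fill in.
    free_text_fields: list = field(default_factory=lambda: ["notes"])


def validate_annotation(config, annotation):
    """Return a list of problems found in one annotation (empty if valid)."""
    problems = []
    quality = annotation.get("quality")
    if quality not in config.quality_scale:
        problems.append(f"quality {quality!r} is not on the scale")
    for category, tags in annotation.get("tags", {}).items():
        allowed = config.tag_categories.get(category)
        if allowed is None:
            problems.append(f"unknown tag category {category!r}")
            continue
        for tag in tags:
            if tag not in allowed:
                problems.append(f"tag {tag!r} is not allowed in {category!r}")
    for name in annotation.get("free_text", {}):
        if name not in config.free_text_fields:
            problems.append(f"unknown free-text field {name!r}")
    return problems
```

Keeping the schema as plain data means it could be saved and reloaded as JSON alongside a dataset, though how Vodalus actually stores its annotation settings is described in the README rather than here.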

Key Features

  1. Dataset Editor:
    • Load and navigate JSONL datasets
    • Convert between JSON and Markdown formats
    • Annotate entries with quality ratings, tags, and notes

  2. Annotation Configuration:
    • Customize quality scales and tag categories
    • Define free-text annotation fields

  3. Dataset Configuration:
    • Edit system messages, prompts, and topics
    • Save and load configurations

  4. Dataset Generation:
    • Generate entries using local LLMs
    • Configure workers and generation parameters

  5. AI Assistant:
    • Get real-time help with annotation and quality control
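
For readers unfamiliar with the JSONL and Markdown handling mentioned under Dataset Editor, the following is a rough Python sketch of the general idea, using only the standard library. The function names (`load_jsonl`, `entry_to_markdown`, `markdown_to_entry`) and the "one `##` heading per field" Markdown layout are assumptions made for illustration; the editor's real conversion logic may differ.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    entries = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


def entry_to_markdown(entry):
    """Render one entry as Markdown with a '## key' heading per field."""
    sections = []
    for key, value in entry.items():
        # Non-string values are serialized as JSON text.
        text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        sections.append(f"## {key}\n\n{text}")
    return "\n\n".join(sections)


def markdown_to_entry(markdown):
    """Invert entry_to_markdown; every value comes back as a string."""
    entry = {}
    key = None
    lines = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new heading closes the previous field, if any.
            if key is not None:
                entry[key] = "\n".join(lines).strip()
            key = line[3:].strip()
            lines = []
        else:
            lines.append(line)
    if key is not None:
        entry[key] = "\n".join(lines).strip()
    return entry
```

This simple scheme only round-trips cleanly when field values are strings that contain no lines starting with `## `; a more robust converter would need to escape such lines or use a different delimiter.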

Getting Started

  1. Clone the repository
  2. Install dependencies: pip install -r requirements.txt
  3. Launch the Gradio UI: python app.py

For detailed instructions, please refer to the README.md file.

Feedback and Contributions

We welcome your feedback and contributions! Please open an issue or submit a pull request if you have any suggestions or improvements.

What's Next

We're already working on exciting new features for the next release, including:

  • Integration with more LLM providers
  • Enhanced data visualization tools
  • Automated quality control pipelines

Stay tuned for updates, and happy dataset building!