Vodalus v1.0.0 - AI-Powered Dataset Generation and Annotation Tool
I am excited to announce the release of Vodalus v1.0.0, a comprehensive toolkit for AI-assisted dataset generation, annotation, and management.
What's New
- Intuitive Gradio UI: A user-friendly interface for dataset manipulation, configuration, and generation.
- AI-Powered Dataset Generation: Leverage local or remote LLMs to create synthetic datasets based on Wikipedia content.
- Advanced Annotation Tools: Customizable quality scales, tag categories, and free-text fields for precise data labeling.
- Flexible Dataset Editing: Seamlessly navigate, edit, and preview dataset entries with built-in JSON/Markdown conversion.
- AI Assistant Integration: Chat with an AI helper for annotation guidance and quality checking.
- Configurable Workflow: Easily modify system messages, prompts, and topics for tailored dataset generation.
Key Features
-
Dataset Editor:
- Load and navigate JSONL datasets
- Convert between JSON and Markdown formats
- Annotate entries with quality ratings, tags, and notes
-
Annotation Configuration:
- Customize quality scales and tag categories
- Define free-text annotation fields
-
Dataset Configuration:
- Edit system messages, prompts, and topics
- Save and load configurations
-
Dataset Generation:
- Generate entries using local LLMs
- Configure workers and generation parameters
-
AI Assistant:
- Get real-time help with annotation and quality control
Getting Started
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Launch the Gradio UI:
python app.py
For detailed instructions, please refer to the README.md file.
Feedback and Contributions
We welcome your feedback and contributions! Please open an issue or submit a pull request if you have any suggestions or improvements.
What's Next
We're already working on exciting new features for the next release, including:
- Integration with more LLM providers
- Enhanced data visualization tools
- Automated quality control pipelines
Stay tuned for updates, and happy dataset building!