Full Changelog: https://github.com/MagnusS0/HuginnHears/commits/v0.1.0-alpha
First release of Huginn Hears
Transcribe audio and summarize text entirely on your own machine using state-of-the-art models.
Huginn Hears is optimized for Norwegian using Nb-Whisper for transcribing, but can also handle English.
Features
- Audio Transcription: Utilizes the Nb-Whisper speech-to-text model within Faster-Whisper's CTranslate2 engine, offering high accuracy fine-tuned for Norwegian.
- Text Summarization: Supports any large language model compatible with Llama.cpp for flexible, efficient summarization.
- Prompt Compression: Features LLMLingua-2 for effective prompt compression, reducing summary generation time.
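The three features chain into one flow: transcribe, optionally compress the transcript, then summarize. The sketch below illustrates only that flow and is not Huginn Hears' actual code. The function name `summarize_meeting` and the three callables are hypothetical; in a real setup they would wrap Faster-Whisper, LLMLingua-2, and a Llama.cpp model respectively.

```python
from typing import Callable, Optional


def summarize_meeting(
    audio_path: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
    compress: Optional[Callable[[str], str]] = None,
) -> str:
    """Run transcription, optional prompt compression, then summarization.

    All three callables are hypothetical stand-ins for the real backends.
    """
    transcript = transcribe(audio_path)
    # Compression is optional: a shorter prompt gives the language model
    # fewer tokens to process, which is where the time saving comes from.
    prompt = compress(transcript) if compress is not None else transcript
    return summarize(prompt)


if __name__ == "__main__":
    # Toy stand-ins so the flow can be run without downloading any models.
    def fake_transcribe(path: str) -> str:
        return "we agreed to ship the alpha release on Friday"

    def fake_compress(text: str) -> str:
        # Keeps every other word; a real compressor uses a trained model
        # to decide which tokens to keep.
        return " ".join(text.split()[::2])

    def fake_summarize(text: str) -> str:
        return f"Summary: {text}"

    print(summarize_meeting("meeting.wav", fake_transcribe, fake_summarize, fake_compress))
```

Passing the backends in as callables keeps the compression step optional and lets each stage be swapped or tested on its own.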
System Requirements
- OS: Windows 10/11
- Install from source for use on other systems.
- Disk Space: At least 10 GB of free space for the installation and models (about 1 GB without the models).
- RAM: 8 GB is enough for the lowest settings.
- GPU: Not supported in the installed version. For better performance, install from source and use one of the hardware acceleration backends supported by llama.cpp to speed up inference.
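If you want to confirm the disk requirement before installing, a small check along these lines may help. It is a minimal sketch using only the Python standard library; the helper name `has_enough_disk` is hypothetical, and it treats 1 GB as 1024³ bytes, which is an assumption about how the 10 GB figure is counted.

```python
import shutil

REQUIRED_FREE_GB = 10  # minimum free space from the requirements list


def has_enough_disk(path: str = ".", required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    # Assumption: 1 GB is counted as 1024**3 bytes here.
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    print("Enough free disk space:", has_enough_disk())
```

Point `path` at the drive where you plan to install, since the models are the bulk of the download.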
Quick Start Guide
1. Download and Install:
- Choose one of the install options for Huginn Hears and proceed with the download. (Simplest is the .msi installer)
- Important: Huginn Hears requires the Microsoft Visual C++ Redistributable Package (MSVC) for proper operation. Ensure that MSVC is installed on your computer:
- If you do not have MSVC installed, you can download it directly from Microsoft's official download page.
- Alternatively, select the version of Huginn Hears that includes MSVC (labelled as "incl-msvc") from the provided download options.
2. Launch Huginn Hears from the location you installed it.
3. Select an audio file from a meeting to transcribe.
- During the first run, Huginn Hears will automatically download the necessary models (approx. 6-8GB) for transcription and summarization. This initial setup requires an internet connection, but once downloaded, Huginn Hears can operate entirely offline.
4. Choose whether to summarize the transcript directly or to compress it first with LLMLingua-2.
5. Receive a comprehensive summary of your meeting.
Disclaimer: Running on CPU alone is not going to be fast. For speed, I recommend installing from source and using one of llama.cpp's hardware acceleration backends to speed up inference.
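As a rough illustration of what GPU offloading looks like after a source install, the sketch below assumes the llama-cpp-python bindings are used and were built with a GPU backend; whether Huginn Hears loads its model this way is an assumption. The helper names `llama_kwargs` and `load_model` and the default context size are hypothetical.

```python
def llama_kwargs(model_path: str, use_gpu: bool, n_ctx: int = 4096) -> dict:
    """Build keyword arguments for llama_cpp.Llama.

    n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU;
    0 keeps every layer on the CPU. The n_ctx default is an assumption.
    """
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,
        "n_gpu_layers": -1 if use_gpu else 0,
    }


def load_model(model_path: str, use_gpu: bool):
    # Imported lazily so this file can be read without the package installed.
    from llama_cpp import Llama

    return Llama(**llama_kwargs(model_path, use_gpu))
```

Note that `n_gpu_layers` only has an effect when the package was compiled with a hardware acceleration backend; with a CPU-only build the model still runs on the CPU.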
Production Use
Currently, Huginn Hears excels as a proof of concept for individual users looking to enhance their productivity with local audio transcription and text summarization. The goal of this version was to make it run on a laptop.
However, I've begun exploring a server setup for model management, which would improve performance for multiple users by eliminating the need to load the models repeatedly. It would also let you run the models on a machine with better-suited hardware while keeping the security of running everything locally. Teams could use a centralized, powerful server in their own office for the heavy lifting, while the application itself keeps running on their individual machines.
If you would like to contribute, feel free to reach out.