Huginn Hears is a Python application designed to transcribe speech and summarize it in Norwegian. It is meant to be used locally and run on a single machine. The application is built using the Streamlit framework for the user interface, Faster-Whisper for speech-to-text transcription, llmlingua-2 for compressing the transcribed text, and llama-cpp-python for summarization. The main goal is to let users with little technical knowledge test state-of-the-art models locally on their own computer, taking advantage of the amazing open-source projects out there and bundling it all into a simple installer.
- Transcribes speech into text.
- Summarizes the transcribed text.
- Supports both English and Norwegian languages.
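For orientation, the processing chain looks roughly like the sketch below. This is a minimal illustration under stated assumptions, not the project's actual code; the model names, file paths, and prompt are placeholders.

```python
# Minimal sketch of the transcribe -> compress -> summarize pipeline.
# Model names, paths, and the prompt are illustrative placeholders.
from faster_whisper import WhisperModel
from llmlingua import PromptCompressor
from llama_cpp import Llama

# 1. Speech-to-text with Faster-Whisper (Huginn Hears uses NB-Whisper models).
whisper = WhisperModel("small", device="cpu", compute_type="int8")
segments, _info = whisper.transcribe("meeting.wav", language="no")
transcript = " ".join(segment.text for segment in segments)

# 2. Compress the transcript with LLMLingua-2 so it fits the LLM context window.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
compressed = compressor.compress_prompt(transcript, rate=0.5)["compressed_prompt"]

# 3. Summarize locally with a quantized GGUF model via llama-cpp-python.
llm = Llama(model_path="models/example.Q4_K_M.gguf", n_ctx=4096)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": f"Oppsummer denne teksten:\n{compressed}"}]
)
print(result["choices"][0]["message"]["content"])
```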
Demo video: Demo.HuginnHears.mp4
This project uses Poetry for dependency management. Before installing the project dependencies, it's essential to set certain environment variables required by llama-cpp-python.
llama-cpp-python requires specific environment variables to be set on your system to function correctly. Follow the instructions in its repository to find the correct variables for your system: https://github.com/abetlen/llama-cpp-python
Examples
# Linux and Mac
export CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
# Windows
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
Solutions from llama-cpp-python
Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'
If you run into issues where it complains it can't find 'nmake' or CMAKE_C_COMPILER, you can extract w64devkit as mentioned in the llama.cpp repo and add those manually to CMAKE_ARGS before running pip install:
$env:CMAKE_GENERATOR = "MinGW Makefiles"
$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"
See the above instructions and set CMAKE_ARGS to the BLAS backend you want to use.
Detailed macOS Metal GPU install documentation is available in the llama-cpp-python repository at docs/install/macos.md
M1 Mac Performance Issue
Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports the arm64 architecture. For example:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
Otherwise, the installation will build the x86 version of llama.cpp, which will be 10x slower on an Apple Silicon (M1) Mac.
M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`
Try installing with:
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
After setting up the required environment variables, you can proceed to install the project. Ensure you have Poetry installed on your system. Then, run the following command in the project root directory:
poetry install
This will install all the necessary dependencies as defined in the pyproject.toml file.
To run the application, use the following command:
streamlit run streamlit_app/app.py
This will start the Streamlit server, and the application will be accessible at localhost:8501.
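For context, a Streamlit front-end for this kind of pipeline typically follows a pattern like the hypothetical sketch below; it is not the project's actual streamlit_app/app.py, and `transcribe_and_summarize` is a placeholder for the pipeline described above.

```python
# Hypothetical sketch of a Streamlit front-end; the real streamlit_app/app.py differs.
import streamlit as st

def transcribe_and_summarize(audio_bytes: bytes) -> str:
    # Placeholder for the Faster-Whisper -> LLMLingua-2 -> llama-cpp-python chain.
    return "sammendrag ..."

st.title("Huginn Hears")
uploaded = st.file_uploader("Upload an audio recording", type=["wav", "mp3", "m4a"])
if uploaded is not None and st.button("Transcribe and summarize"):
    with st.spinner("Working..."):
        summary = transcribe_and_summarize(uploaded.read())
    st.subheader("Summary")
    st.write(summary)
```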
To build the project into an executable, use the setup.py script with cx_Freeze.
NB: Make sure you installed llama-cpp-python with static linking, then run:
python setup.py build
This will create an executable in the build directory.
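For reference, a cx_Freeze setup script for this kind of app generally looks something like the sketch below. It is not the project's actual setup.py; the package list, options, and the "run_app.py" launcher are assumptions for illustration.

```python
# Illustrative cx_Freeze setup script, not the project's actual setup.py.
# "run_app.py" is a hypothetical launcher that bootstraps Streamlit.
from cx_Freeze import setup, Executable

build_exe_options = {
    # Packages that cx_Freeze's static analysis may miss for this stack.
    "packages": ["streamlit", "faster_whisper", "llmlingua", "llama_cpp"],
    # Ship the Streamlit app source next to the frozen executable.
    "include_files": ["streamlit_app/"],
}

setup(
    name="huginn-hears",
    version="0.1.0",
    description="Local speech transcription and summarization in Norwegian",
    options={"build_exe": build_exe_options},
    executables=[Executable("run_app.py", target_name="huginn-hears")],
)
```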
This project builds on a lot of great work done by others. The following projects were used:
- Faster-Whisper for speech-to-text transcription.
- llmlingua-2 for prompt compression.
- llama-cpp-python to run LLMs locally and on CPUs.
- cx_Freeze for building executables.
- Langchain for controlling the prompt-response flows.
- Streamlit for building the UI.
Big thanks to all the contributors to these open-source projects!
In addition, the following models were used:
- Nasjonalbiblioteket AI Lab NB-Whisper.
- Microsoft LLMLingua-2.
- TheBloke for all sorts of quantized models.
You can read more about these models in these papers:
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
- Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges
This project is licensed under the Apache 2.0 License. See the LICENSE file for more information.