Audio to Text Conversion

CPSC 440 - Computer Systems Architecture
Jerry Liu, Sean Del Castillo

Introduction

This program records audio from a microphone device, writes it to a WAV file, and then uses OpenAI Whisper to convert the audio to text.

OpenAI Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Training on a dataset of this size and variety improves robustness to accents, background noise, and technical terminology. It also supports transcription in multiple languages as well as translation from those languages into English.

A C++ port of the Whisper module is included in this repo under the MIT license. The inference implementation was written by Georgi Gerganov, and its repo can be found here: https://github.com/ggerganov/whisper.cpp
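To illustrate the "write to WAV" step, here is a minimal sketch that stores mono 16-bit PCM samples at 16 kHz using only Python's standard wave module. That format is what the whisper.cpp documentation asks for as input. The names write_wav and sine_tone are hypothetical and are not taken from recording.py; the generated tone merely stands in for microphone input.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # Hz; assumed because the whisper.cpp docs ask for 16 kHz input


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write floats in [-1.0, 1.0] as mono 16-bit PCM (hypothetical helper)."""
    # Clamp each sample, then scale it to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def sine_tone(seconds, freq=440.0, sample_rate=SAMPLE_RATE):
    """Generate a test tone that stands in for microphone input."""
    count = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(count)]
```

Calling write_wav("tone.wav", sine_tone(1.0)) would produce a one-second file in this format.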

Hardware Prerequisite

The hardware driver depends on the GPIOZero library (https://github.com/gpiozero/gpiozero). Hardware must support this library and have at least 11 GPIO pins.

Installation and Setup

To install all required Python packages: pip install -r requirements.txt

The sounddevice library needs PortAudio, which isn't bundled on Linux: sudo apt install libportaudio2

To install libav, please see this page for general requirements and platform-specific installation instructions: https://wiki.libav.org/Platform

You might also need to install ffmpeg, which is an open-source command-line tool for transcoding multimedia files: sudo apt install ffmpeg

Make sure to create two subdirectories, ./recordings and ./transcriptions, to store output files.
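If you would rather create the directories from Python than by hand, a minimal sketch could look like the following. The directory names come from the sentence above; the helper name ensure_output_dirs is hypothetical and is not part of driver.py.

```python
from pathlib import Path


def ensure_output_dirs(base=".", names=("recordings", "transcriptions")):
    """Create the output directories under base if they are missing."""
    created = []
    for name in names:
        directory = Path(base) / name
        # exist_ok=True makes the call safe to repeat on every start-up.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling ensure_output_dirs() from the repo root before starting the driver is enough, and calling it again later does no harm.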

Getting Started

Start driver.py: python3 driver.py
Press and hold the recording_button to record
Release the button to start the transcription process
Recorded .wav files are stored in /recordings, and the matching transcriptions are stored in /transcripts
To exit, raise a KeyboardInterrupt (Ctrl+C) while the program is in the listening state
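The press, hold, and release flow above can be pictured as a small state machine. The sketch below is only an illustration: the event names are hypothetical, and it assumes that the program returns to the listening state once a transcription finishes. It does not reproduce the actual logic in driver.py.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.LISTENING, "button_pressed"): State.RECORDING,
    (State.RECORDING, "button_released"): State.TRANSCRIBING,
    # Assumption: the program goes back to listening after transcribing.
    (State.TRANSCRIBING, "transcription_done"): State.LISTENING,
}


def next_state(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Walk through one full record-and-transcribe cycle.
state = State.LISTENING
history = [state]
for event in ("button_pressed", "button_released", "transcription_done"):
    state = next_state(state, event)
    history.append(state)
```

Keeping the transitions in a table means a stray event, such as a button press during transcription, is simply ignored.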

LED Legend

[ ] Unlit [*] Blinking [X] Lit
     [ ] <- Recording LED
     [ ] <- Caution LED
     [ ] <- Graph LEDs
     [ ] <-/ /
     [ ] <--/
     [ ] <- Power LED

LED Reporting

Listening

     [ ] <- Recording LED
     [ ] <- Caution LED
     [ ] <- Graph LEDs
     [ ] <-/ /
     [ ] <--/
     [X] <- Power LED

Recording

     [X] <- Recording LED
     [ ] <- Caution LED
     [ ] <- Graph LEDs
     [ ] <-/ /
     [ ] <--/
     [X] <- Power LED

Transcribing

     [*] <- Recording LED
     [*] <- Caution LED
     [ ] <- Graph LEDs
     [ ] <-/ /
     [ ] <--/
     [ ] <- Power LED
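
The three diagrams above can also be written as a lookup table. This is a sketch for illustration only; the names SYMBOLS, LED_PATTERNS, and render are hypothetical and are not taken from driver.py. The Graph LEDs are left out because they depend on the number of recordings rather than on the program state.

```python
# Symbols match the LED Legend: unlit, blinking, lit.
SYMBOLS = {"off": "[ ]", "blink": "[*]", "on": "[X]"}

# Program state -> mode of each status LED, copied from the diagrams above.
LED_PATTERNS = {
    "listening":    {"Recording": "off",   "Caution": "off",   "Power": "on"},
    "recording":    {"Recording": "on",    "Caution": "off",   "Power": "on"},
    "transcribing": {"Recording": "blink", "Caution": "blink", "Power": "off"},
}


def render(state):
    """Return the status LEDs for a state as legend-style text lines."""
    pattern = LED_PATTERNS[state]
    return [f"{SYMBOLS[mode]} <- {name} LED" for name, mode in pattern.items()]
```

For example, print("\n".join(render("recording"))) prints the Recording and Power LEDs as lit and the Caution LED as unlit.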

Graph LED states

     [ ] <- Recording LED
     [X] <- Caution LED
     [X] <- Graph LEDs
     [X] <-/ /
     [X] <--/
     [ ] <- Power LED

The three Graph LEDs are pulse-width modulated (PWM) with a maximum of 20 values. They track how many recordings are in /recordings. If all three Graph LEDs are fully lit and the Caution LED is lit, the graph LEDs are tracking the maximum number of files they can display. Recording and transcribing are unaffected by this maximum.
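
One way to picture the graph is as a function from a file count to three brightness levels. The sketch below rests on assumptions that this README does not confirm: it treats the 20 values as 20 brightness steps per LED (so three LEDs would cover 60 files) and fills the LEDs one after another. The function name graph_levels and its parameters are hypothetical. Brightness is expressed as a float between 0 and 1, which is the range gpiozero's PWMLED accepts for its value.

```python
def graph_levels(file_count, steps_per_led=20, led_count=3):
    """Map a recording count to per-LED brightness levels and a caution flag.

    Assumption: each LED has steps_per_led brightness steps and the LEDs
    fill up in order. The real driver may divide the range differently.
    """
    capacity = steps_per_led * led_count
    clamped = max(0, min(file_count, capacity))
    levels = []
    for index in range(led_count):
        # Steps that fall on this LED once the earlier LEDs are full.
        steps = max(0, min(steps_per_led, clamped - index * steps_per_led))
        levels.append(steps / steps_per_led)
    # Caution turns on once the graph cannot show any more files.
    caution = file_count >= capacity
    return levels, caution
```

Under these assumptions, 25 recordings would light the first LED fully and the second at a quarter brightness, with the Caution LED still off.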
