Easy Transcription

Easy Transcription is an easy-to-use interface to Deepgram's voice-to-text service, using the Whisper model. It provides simple, high-accuracy transcription of audio files with speaker diarization enabled. You can also quickly label different speakers and copy the finished transcript wherever you need it.

Setup

Clone the project to your local machine:

git clone git@github.com:Techming/ez-transcription.git

Make sure that you have Python 3.8 or later and pip installed. To check, run the following commands:

$ python3 --version
Python 3.8.5 # or anything above 3.8.0

$ pip3 --version
pip 20.1.1 from /PATH/TO/YOUR/PYTHON/INTERPRETER/python3.8/site-packages/pip (python 3.8)

Go to Deepgram, register an account, then create a new project and an API key.
In the project root, create a .env file that stores your API key under the name DEEPGRAM_API_KEY:
```
echo DEEPGRAM_API_KEY=<PUT-YOUR-API-KEY-HERE> > .env
```
For Mac users, navigate to the project root and run ./setup.sh
For Windows users, follow the steps below:
- First, create a Python virtual environment in the project root
```
python3 -m venv venv
```
- Then activate the virtual environment (in Windows Command Prompt, run venv\Scripts\activate instead of the command below)
```
source venv/bin/activate
```
- Finally, run pip install
```
pip install -r requirements.txt
```
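
To confirm that the .env file from the setup steps is readable before launching the app, a minimal standard-library sketch is shown below. How main.py actually loads the key is not described in this README, so the helper names `read_env_file` and `get_deepgram_key` are hypothetical and only illustrate the KEY=VALUE format.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_deepgram_key(path: str = ".env") -> str:
    """Return the key from the process environment or the .env file."""
    key = os.environ.get("DEEPGRAM_API_KEY") or read_env_file(path).get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set; add it to the .env file")
    return key


if __name__ == "__main__":
    # Print only the key length so the secret itself never reaches the terminal.
    print(f"Found a Deepgram API key of length {len(get_deepgram_key())}")
```

Running this from the project root raises a RuntimeError if the key is missing, which is easier to diagnose than a failed transcription request.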

Usage

Activate the virtual environment from the project root:
```
source venv/bin/activate
```
Run the command below
```
python main.py
```

A new browser window should pop up and that's it!

Options

main.py accepts several command-line arguments, as follows:

- --reload: Reload the server when source files change
- --host TEXT: Host of the app, default is 127.0.0.1
- --port INTEGER: Port of the app, default is 8000
- --help: Show this message and exit.
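
The options above can be mirrored with a small parser to see how the defaults behave. This is a sketch using the standard-library argparse module, not the code in main.py (whose CLI library is not stated here); `build_parser` and `parse_options` are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags and defaults listed in the README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--reload", action="store_true",
                        help="Reload the server when source files change")
    parser.add_argument("--host", type=str, default="127.0.0.1",
                        help="Host of the app, default is 127.0.0.1")
    parser.add_argument("--port", type=int, default=8000,
                        help="Port of the app, default is 8000")
    return parser  # argparse adds --help automatically


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    options = parse_options()
    print(f"host={options.host} port={options.port} reload={options.reload}")
```

For example, `python main.py --host 0.0.0.0 --port 9000` would bind to all interfaces on port 9000, while omitting both flags keeps the local defaults.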

FAQ

It takes too long to process the audio file

Processing typically runs 6x to 12x faster than real time, so it takes roughly 1/12 to 1/6 of the length of the audio file. For example, a 4-minute audio file can take from 20s to 40s. Note that the Whisper model generally takes longer to produce a transcript. Other factors, such as Deepgram traffic and the quality of the audio, can also add to the processing time.
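
The estimate above is simple arithmetic, sketched here for audio of any length. The function name `estimate_processing_seconds` is hypothetical, and the 6x and 12x factors come from the figures quoted in this FAQ rather than from any guarantee by Deepgram.

```python
from typing import Tuple


def estimate_processing_seconds(
    audio_seconds: float,
    fastest_speedup: float = 12.0,
    slowest_speedup: float = 6.0,
) -> Tuple[float, float]:
    """Return (best case, worst case) processing time in seconds."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / fastest_speedup, audio_seconds / slowest_speedup


if __name__ == "__main__":
    best, worst = estimate_processing_seconds(4 * 60)  # a 4-minute file
    print(f"Expect roughly {best:.0f}s to {worst:.0f}s")
```

If a file takes much longer than the worst-case figure, traffic or audio quality is the more likely cause than the file length itself.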

Speaker diarization isn't very accurate

This can happen if there are many people talking or if two speakers overlap. Unfortunately, this is a limitation of Deepgram. You are welcome to fine-tune Deepgram's parameters in main.py.
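
To give a sense of what such parameters look like, the sketch below builds (but does not send) a request for Deepgram's pre-recorded transcription endpoint. It assumes the `model`, `diarize`, and `punctuate` query parameters and the `Token` authorization scheme from Deepgram's public REST API. The parameters that main.py actually sets are not listed in this README, and `build_request` is a hypothetical helper.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Assumed endpoint for pre-recorded audio; check Deepgram's docs before relying on it.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_request(
    api_key: str,
    *,
    model: str = "whisper",
    diarize: bool = True,
    punctuate: bool = True,
    content_type: str = "audio/wav",
) -> Tuple[str, Dict[str, str]]:
    """Return the request URL and headers; sending the audio is left to the caller."""
    params = {
        "model": model,
        "diarize": str(diarize).lower(),      # query strings carry true/false as text
        "punctuate": str(punctuate).lower(),
    }
    url = f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": content_type,
    }
    return url, headers


if __name__ == "__main__":
    url, headers = build_request("<PUT-YOUR-API-KEY-HERE>")
    print(url)
```

Changing one parameter at a time and re-running the same audio file makes it easier to tell which setting affects the diarization result.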

Authors

@techming

Feedback

If you have any feedback, please reach out by opening an issue.

Contributing

Contributions are always welcome!
