An end-to-end, full-stack application that transcribes speech to text and performs downstream tasks such as text similarity, text summarization, and named entity recognition.
Things to set up:
Install Docker
Install Elasticsearch
Link for Tutorial: https://dylancastillo.co/elasticsearch-python/#what%E2%80%99s-elasticsearch
Install FastAPI
Link for Tutorial: https://fastapi.tiangolo.com/tutorial/first-steps/
Install Whisper AI
Link for Tutorial: https://medium.com/the-research-nest/how-to-setup-openais-whisper-model-on-windows-10-11-df001d5a350b
Install ffmpeg
Download the zip file from https://github.com/BtbN/FFmpeg-Builds/releases
Extract it and add the path to the bin folder to the Path variable under System variables
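To confirm that the PATH change took effect, the executable can be looked up from Python. This is a minimal sketch using only the standard library; the helper name find_tool is hypothetical and not part of this project. Run it from a newly opened terminal, since terminals that were already open keep the old PATH.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("ffmpeg")
    if path is None:
        print("ffmpeg not found on PATH; recheck the bin folder entry")
    else:
        print("ffmpeg found at", path)
```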
Install whisper-timestamped
https://github.com/linto-ai/whisper-timestamped
Install PyAnnote
https://github.com/pyannote/pyannote-audio
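
Once the steps above are done, a short check like the following can report which Python packages are importable in the current environment. It is a sketch, not part of this project: the helper is_installed is hypothetical, and the import names in PACKAGES (fastapi, elasticsearch, whisper, whisper_timestamped, pyannote.audio) are assumed to match what the linked pages install.

```python
import importlib.util

# Import names assumed for the tools listed above; adjust if yours differ.
PACKAGES = [
    "fastapi",
    "elasticsearch",
    "whisper",
    "whisper_timestamped",
    "pyannote.audio",
]


def is_installed(module_name):
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # ImportError covers a missing parent package (e.g. "pyannote").
        return False


if __name__ == "__main__":
    for name in PACKAGES:
        status = "OK" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

Any package reported as MISSING was most likely installed into a different Python environment or not installed at all.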