[Feature Request] Real-Time Speech-to-Text with Whisper Model 🎙️ #14

kadirnar · 2023-11-23T15:55:25Z

Implement real-time functionality for the Whisper model, enabling it to transcribe speech into text as the user speaks. 🎤

trivikramak · 2024-01-18T10:05:22Z

Is there any progress on this?

kadirnar · 2024-01-18T14:01:26Z

> Is there any progress on this?

I don't have enough time to develop this. That's why this feature is not currently being developed. I will add it later.

trivikramak · 2024-01-20T05:40:57Z

I'm trying to use tiny models for on-device (mobile) near-real-time speech-to-text.
Can you suggest some direction or pointers if we have to implement this?

kadirnar · 2024-01-20T13:35:14Z

You can use the distil-whisper models.
Models: https://huggingface.co/distil-whisper

trivikramak · 2024-01-20T14:12:56Z

Thanks for the reply. From what I have read, I understand the idea to be:

- chunking the audio at an arbitrarily chosen small fixed length (say 3 s)
- padding each chunk with silence to make it a 30 s chunk
- sending it to the Whisper model for inference

Is there a better approach? It seems very inefficient to run inference on 30 s chunks for real-time streaming transcription. Am I missing something?
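
For reference, the three steps listed above could be sketched roughly as follows. This is only an illustration under assumptions: the audio is taken to be 16 kHz mono float samples (the rate Whisper models expect), and `transcribe` is a hypothetical callable standing in for whatever model call is actually used; it is not an API of this repository.

```python
from typing import Callable, Iterator, List

SAMPLE_RATE = 16_000                      # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30                       # fixed input window of the Whisper encoder
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def split_into_chunks(samples: List[float], chunk_seconds: float = 3.0) -> Iterator[List[float]]:
    """Yield consecutive fixed-length chunks; the last one may be shorter."""
    step = int(chunk_seconds * SAMPLE_RATE)
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


def pad_to_window(chunk: List[float]) -> List[float]:
    """Right-pad a chunk with silence (zeros) up to the 30 s window."""
    if len(chunk) > WINDOW_SAMPLES:
        raise ValueError("chunk is longer than the 30 s window")
    return chunk + [0.0] * (WINDOW_SAMPLES - len(chunk))


def transcribe_stream(
    samples: List[float],
    transcribe: Callable[[List[float]], str],  # hypothetical model wrapper
    chunk_seconds: float = 3.0,
) -> List[str]:
    """Run the hypothetical `transcribe` callable once per padded chunk."""
    return [transcribe(pad_to_window(c)) for c in split_into_chunks(samples, chunk_seconds)]
```

The sketch makes the cost in question visible: every 3 s chunk still becomes a full 480,000-sample window before inference.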

kadirnar · 2024-01-20T18:11:22Z

I don't understand. If you want to use the Whisper model in real time, you can look at this library:

https://github.com/davabase/whisper_real_time

Nishant-Kumar-2002 · 2024-01-21T17:22:28Z

Hi,
I have worked with Whisper models for real-time transcription, but the catch is that in an HLS stream the video or audio is generated with a buffer of 6 seconds. So we can use ffmpeg and threading to cut out each clip and then transcribe it.
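
A minimal sketch of the ffmpeg-plus-threading idea, assuming ffmpeg's segment muxer is used to cut the stream into 6-second WAV clips and a worker thread transcribes them in order. The stream URL, the output file pattern, and the `transcribe` callable are all hypothetical placeholders; launching ffmpeg and detecting finished clip files is left out.

```python
import queue
import threading
from typing import Callable, List


def build_ffmpeg_segment_command(
    stream_url: str,
    out_pattern: str = "clip_%05d.wav",   # hypothetical output naming
    segment_seconds: int = 6,
) -> List[str]:
    """Build an ffmpeg command that writes fixed-length mono 16 kHz audio clips."""
    return [
        "ffmpeg", "-i", stream_url,
        "-vn",                             # drop the video track
        "-ac", "1",                        # mono
        "-ar", "16000",                    # 16 kHz, as Whisper expects
        "-f", "segment",                   # segment muxer
        "-segment_time", str(segment_seconds),
        out_pattern,
    ]


def transcription_worker(clips: "queue.Queue", transcribe: Callable[[str], str],
                         results: List[str]) -> None:
    """Consume clip paths until a None sentinel arrives."""
    while True:
        path = clips.get()
        if path is None:
            break
        results.append(transcribe(path))


def run_worker(paths: List[str], transcribe: Callable[[str], str]) -> List[str]:
    """Feed clip paths to a background worker and collect its results in order."""
    clips: "queue.Queue" = queue.Queue()
    results: List[str] = []
    worker = threading.Thread(target=transcription_worker, args=(clips, transcribe, results))
    worker.start()
    for path in paths:
        clips.put(path)
    clips.put(None)                        # sentinel: no more clips
    worker.join()
    return results
```

In a real setup the command would be started with `subprocess.Popen`, and each finished clip path would be put on the queue as it appears, so transcription runs while ffmpeg keeps cutting.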

kadirnar · 2024-01-21T17:26:59Z

> Hi, I have worked with Whisper models for real-time transcription, but the catch is that in an HLS stream the video or audio is generated with a buffer of 6 seconds. So we can use ffmpeg and threading to cut out each clip and then transcribe it.

Can you add the real-time feature?

Nishant-Kumar-2002 · 2024-01-21T17:55:35Z

If we are doing this on a streaming service, it takes a buffer time of 6 seconds.
In real time we can clip second by second.

kadirnar · 2024-01-23T20:25:09Z

> If we are doing this on a streaming service, it takes a buffer time of 6 seconds. In real time we can clip second by second.

I will research this issue.

kadirnar · 2024-01-24T11:53:51Z

Hi @Nishant-Kumar-2002 , can you review this code? This feature adds subtitles to the video.

Nishant-Kumar-2002 · 2024-01-24T15:06:17Z

Ok will check that.

Nishant-Kumar-2002 · 2024-01-24T16:51:24Z

Code looks good to me.

MilanaShhanukova · 2024-02-03T07:54:10Z

May I ask if the main idea is to implement real-time whisper to transcribe speech through the microphone or transcribe audio files in real-time to a file, so that we do not have to wait until the end of the audio?

kadirnar · 2024-02-03T12:44:10Z

> May I ask if the main idea is to implement real-time whisper to transcribe speech through the microphone or transcribe audio files in real-time to a file, so that we do not have to wait until the end of the audio?

I want to do the first thing you said.

fraschm1998 · 2024-04-12T02:14:06Z

Any update on this? Would love real-time transcription of speech through a mic

Nishant-Kumar-2002 · 2024-04-12T02:29:00Z

> Any update on this? Would love real-time transcription of speech through a mic

@kadirnar I would like to add this new feature.

kadirnar · 2024-04-15T08:35:54Z

@Nishant-Kumar-2002 Wonderful news 👍🏻 I'm waiting for the pull request.

kadirnar · 2024-05-02T16:48:05Z

I started coding. I will add this support over the weekend.

fraschm1998 · 2024-05-08T02:49:59Z

> I started coding. I will add this support over the weekend.

Awesome, looking forward to this! Thanks for your amazing work!

SeeknnDestroy · 2024-05-14T17:43:00Z

thanks for the awesome work @kadirnar! any eta on this?

kadirnar · 2024-05-14T18:45:49Z

> thanks for the awesome work @kadirnar! any eta on this?

There are a few problems with real-time. It may take a while to figure it out. I'm developing for Autopipeline.

kadirnar added the documentation, enhancement, and help wanted labels Nov 23, 2023

kadirnar self-assigned this Nov 23, 2023

kadirnar mentioned this issue Jan 24, 2024

🌠 Add AutoCaption Feature #56

Merged
