Issue with --batched: Sentences Are Displayed All at Once Instead of One by One #1235

Open
izyspania opened this issue Jan 30, 2025 · 14 comments


izyspania commented Jan 30, 2025

I’m encountering an issue when using the --batched option in faster-whisper. Without --batched, the transcription processes one sentence at a time, displaying them sequentially, which works well for my use case. However, when I enable --batched, many sentences are displayed in a single segment, making the output harder to follow and less readable.

Additionally, when I don’t use --batched, RAM usage spikes to full capacity (I have 32 GB) during the file loading phase, making my computer unresponsive for a few minutes. When I enable --batched, it runs smoothly, though I still have the issue of the output being harder to read. This issue doesn’t happen when using normal Whisper.

Thanks

@MahmoudAshraf97
Collaborator

@Purfview

Contributor

Purfview commented Feb 3, 2025

Here's the right place for your issues: https://github.com/Purfview/whisper-standalone-win

I’m encountering an issue when using the --batched option in faster-whisper. Without --batched, the transcription processes one sentence at a time, displaying them sequentially, which works well for my use case. However, when I enable --batched, many sentences are displayed in a single segment, making the output harder to follow and less readable.

I noticed the same behaviour, but what's displayed doesn't matter. Look at the subtitles. Use --sentence if you want the output split into sentences; you can try --standard too.

Additionally, when I don’t use --batched, RAM usage spikes to full capacity (I have 32 GB) during the file loading phase, making my computer unresponsive for a few minutes. When I enable --batched, it runs smoothly, though I still have the issue of the output being harder to read. This issue doesn’t happen when using normal Whisper.

What's the version, the command, and the file?

Author

izyspania commented Feb 3, 2025

Hi,

faster-whisper file.mp3 --language Romanian --device cuda --model large-v3-turbo --output_format srt --print_progress

This is what I'm using now, but I tried a few options combined with --batched, including --sentence and different models, and the results are the same. (I have not tried --standard yet.)

Example.
EDIT: the audio (a cut): https://jumpshare.com/s/iIVzYbygkLa2PsueAKEH
The full file is 570 MB. Without --batched it fills all the RAM and writes to disk for about 20 seconds, but once processing starts, RAM usage goes back to normal. This doesn't seem to happen with the original Whisper.

whisper (original) output:
1
00:00:00,000 --> 00:00:02,880
William Deal, 27

2
00:00:02,880 --> 00:00:04,840
Pista 2

3
00:00:04,840 --> 00:00:07,040
28

4
00:00:07,040 --> 00:00:15,700
Jenny părăsi hotelul înainte de ora 8 dimineața și, în drum spre destinație, schimbă trei taxiuri diferite.

5
00:00:16,440 --> 00:00:19,140
Era o șmecherie pe care o învățase de la Avrum.

6
00:00:20,120 --> 00:00:28,780
Plătea din timp taxiul, apoi sărea pe neașteptate din mașină, se srecura printre clădiri, lua un alt taxi, apoi repeta figura încă o dată.

faster-whisper (without --batched): most of the time the output is like the original Whisper's, but sometimes it looks exactly like the --batched output (in this example it matched the --batched output).

faster-whisper (with --batched):
00:00:00,000 --> 00:00:28,780
William Deal, 27. Pista 2. 28. Jenny părăsi hotelul înainte de ora 8 dimineața și, în drum spre destinație, schimbă trei taxiuri diferite. Era o șmecherie pe care o învățase de la Avrum. Plătea din timp taxiul, apoi sărea pe neașteptate din mașină, se srecura printre clădiri, lua un alt taxi, apoi repeta figura încă o dată.
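For anyone who already has a merged cue like the one above and only wants to break it up afterwards, here is a minimal post-processing sketch. It is not how --sentence works internally; it simply splits on sentence-ending punctuation and, as an assumption, shares the cue's time span between sentences in proportion to their character count, so the timings are approximate. The function name `split_cue` is hypothetical.

```python
import re


def split_cue(start: float, end: float, text: str) -> list[tuple[float, float, str]]:
    """Split one cue into per-sentence cues, sharing time by character count."""
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    total_chars = sum(len(s) for s in sentences)

    cues = []
    cursor = start
    for i, sentence in enumerate(sentences):
        if i == len(sentences) - 1:
            # Pin the last cue to the original end to avoid rounding drift.
            cue_end = end
        else:
            cue_end = cursor + (end - start) * len(sentence) / total_chars
        cues.append((round(cursor, 3), round(cue_end, 3), sentence))
        cursor = cue_end
    return cues


if __name__ == "__main__":
    merged = "William Deal, 27. Pista 2. 28."
    for cue_start, cue_end, sentence in split_cue(0.0, 28.78, merged):
        print(f"[{cue_start:.3f} --> {cue_end:.3f}] {sentence}")
```

Because the timing is estimated from text length rather than from the audio, word-level timestamps from the transcriber will give better boundaries when they are available.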

Contributor

Purfview commented Feb 3, 2025

tried a few options combined with --batched including --sentence and different models and the results are the same.

It's not possible that "the results are the same".

Full file has 570 MB

How long is it in time?

most of the time like original Whisper, but sometimes it acts exactly like when I'm using --batched

That's normal, expected behaviour. Vanilla Whisper can output the same too.

Author

izyspania commented Feb 3, 2025

That's normal, expected behaviour. Vanilla Whisper can output the same too.

True, but 90% of the time vanilla Whisper outputs what I showed you.

It's about 10 hours long.

Is there anything I can do to make the --batched output look like vanilla Whisper's? --sentence doesn't seem to do anything as far as I've tested; you can test the file I linked.

I guess I need more RAM to process files this long? With --batched it works great and is 3x faster, but I don't like the output, as I showed you.
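As a rough back-of-the-envelope check on the RAM question, the sketch below estimates how much memory the decoded waveform alone would take. It assumes the audio is decoded to 16 kHz mono float32 (the input rate Whisper models use) and that the whole file is held in memory at once; whether this tool does the latter is an assumption, not something confirmed in this thread.

```python
SAMPLE_RATE_HZ = 16_000  # Whisper models take 16 kHz mono input
BYTES_PER_SAMPLE = 4     # float32 samples (assumed in-memory representation)


def decoded_audio_bytes(duration_hours: float) -> int:
    """Approximate size of a fully decoded mono waveform, in bytes."""
    samples = int(duration_hours * 3600 * SAMPLE_RATE_HZ)
    return samples * BYTES_PER_SAMPLE


if __name__ == "__main__":
    size = decoded_audio_bytes(10)
    print(f"10 hours of decoded audio: {size / 1e9:.2f} GB")
```

Under these assumptions a 10-hour file decodes to roughly 2.3 GB, so the waveform by itself would not account for filling 32 GB; any extra usage would have to come from elsewhere in the loading path.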

Contributor

Purfview commented Feb 3, 2025

It's about 10 hours long.

Then it's this issue -> #1234

Is there anything I can do to make the --batched output look like vanilla Whisper's? --sentence doesn't seem to do anything as far as I've tested; you can test the file I linked.

I already told you what to do.
Share the srt you got with --batched --sentence.

Author

izyspania commented Feb 3, 2025

I just shared the output there, from test.mp3.

--batched --sentence (for this file, the output with --standard and without --batched is the same):

[00:00.000 --> 00:28.780] William Deal, 27. Pista 2. 28. Jenny părăsi hotelul înainte de ora 8 dimineața și, în drum spre destinație, schimbă trei taxiuri diferite. Era o șmecherie pe care o învățase de la Avrum. Plătea din timp taxiul, apoi sărea pe neașteptate din mașină, se srecura printre clădiri, lua un alt taxi, apoi repeta figura încă o dată.

Transcription speed: 21.36 audio seconds/s

This is from vanilla:

[00:00.000 --> 00:02.880] William Deal, 27
[00:02.880 --> 00:04.840] Pista 2
[00:04.840 --> 00:07.040] 28
[00:07.040 --> 00:15.700] Jenny părăsi hotelul înainte de ora 8 dimineața și, în drum spre destinație, schimbă trei taxiuri diferite.
[00:16.440 --> 00:19.120] Era o șmecherie pe care o învățase de la Avrum.
[00:20.120 --> 00:28.780] Plătea din timp taxiul, apoi sărea pe neașteptate din mașină, se strecura printre clădiri, lua un alt taxi, apoi repeta figura încă o dată.

Edit: more from the original file

1
00:00:00,000 --> 00:00:28,780
William Deal, 27. Pista 2. 28. Jenny părăsi hotelul înainte de ora 8 dimineața și, în drum spre destinație, schimbă trei taxiuri diferite. Era o șmecherie pe care o învățase de la Avrum. Plătea din timp taxiul, apoi sărea pe neașteptate din mașină, se srecura printre clădiri, lua un alt taxi, apoi repeta figura încă o dată.

2
00:00:28,780 --> 00:00:40,540
Așa cum procedase și când împărțea broșuri și ziarul conștiința Berlinului, ea nu controla să vadă dacă o urmărea cineva, ci prespunea că cineva chiar făcea acest lucru.

3
00:00:41,160 --> 00:00:51,720
În cele din urmă o apucă pe bulevardul Nei, care o colește pe la sud perimetrul orașului, o luă spre Arcul de Triunf, apoi se îndreptă spre Montparnasse.

4
00:00:52,480 --> 00:00:56,560
Mai merse puțin până ajunse la o cafenea de pe strada Long Camps.

5
00:00:56,560 --> 00:01:04,600
Cumpără ziarul de dimineață și se așeză la o masă în fundul sării, de unde putea să observe ușa și ceru o cafea.

And from vanilla:

1
00:00:00,000 --> 00:00:02,880
William Deal, 27

2
00:00:02,880 --> 00:00:04,840
Pista 2

3
00:00:04,840 --> 00:00:07,040
28

4
00:00:07,040 --> 00:00:15,700
Jenny părăsi hotelul înainte de ora 8 dimineața și, în drum spre destinație, schimbă trei taxiuri diferite.

5
00:00:16,440 --> 00:00:19,140
Era o șmecherie pe care o învățase de la Avrum.

6
00:00:20,120 --> 00:00:28,780
Plătea din timp taxiul, apoi sărea pe neașteptate din mașină, se srecura printre clădiri, lua un alt taxi, apoi repeta figura încă o dată.

7
00:00:28,780 --> 00:00:40,560
Așa cum procedase și când împărțea broșuri și ziarul conștiința Berlinului, ea nu controla să vadă dacă o urmărea cineva, ci prespunea că cineva chiar făcea acest lucru.

8
00:00:40,920 --> 00:00:51,740
În cele din urmă o apucă pe bulevardul Nei, care o colește pe la sud perimetrul orașului, o luă spre Arcul de Triunf, apoi se îndreptă spre Montparnasse.


Some more diff:

Vanilla:
28
00:03:19,140 --> 00:03:26,000
Aceste trupe de asalt au fost huliganii care au distrus magazine, au bătut, au asasinat oameni nevinovați

29
00:03:26,000 --> 00:03:35,100
și au fost promotorii antisemitismului lui Hitler, unul dintre principiile partidului nazis și ale celui de-al treilea rai.

Faster:
30
00:03:19,160 --> 00:03:35,140
Aceste trupe de asalt au fost huliganii care au distrus magazine, au bătut, au asasinat oameni nevinovați și au fost promotorii antisemitismului lui Hitler, unul dintre principiile partidului nazis și ale celui de-al treilea rai.

Contributor

Purfview commented Feb 3, 2025

Share the srt file, not the copy/pastes.

This is from vanilla:

I don't need it.

@izyspania
Author

The one from the 10-hour file or from the cut (test.mp3)?

Contributor

Purfview commented Feb 3, 2025

From any file, where you used --batched --sentence.

Author

izyspania commented Feb 3, 2025

test2.mp3 --language Romanian --device cuda --model large-v3-turbo --output_format srt --batched --sentence

Hmm, I did a small cut and tried again, and the output looks OK now. I'll do it again for the full 10-hour file and tell you the result (it will take about 20 minutes with --batched, I think). Maybe I only looked at the on-screen output when using --sentence instead of checking the file.
Edit: The things I pasted were from files, not from the output screen.

I had to rename the file to .txt

test2.txt

Contributor

Purfview commented Feb 3, 2025

Hmm, I did a small cut and tried again, and the output looks OK now. I'll do it again for the full 10-hour file...

No need, I told you that it's impossible.
You probably just mixed up some files.

Author

izyspania commented Feb 3, 2025

I'm still going to try it now to be sure. If it works with --batched --sentence, it fixes both of my problems. Thank you, and sorry for the post; I was sure it was a bug.

Edit: It seems to work with --batched --sentence. The file looks like vanilla output (the output screen does not). You saved me a lot of time, thanks again.
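Since the confusion here came from judging by the console output rather than the written subtitles, a small check on the SRT file itself can help. The sketch below parses the timestamp lines of SRT text and returns each cue's duration, so an unusually long cue (such as a 28-second one) stands out. The function name `cue_durations` and the sample text are illustrative; in practice the text would be read from the .srt file the tool produced.

```python
import re

# Matches an SRT timing line such as "00:00:28,780 --> 00:00:40,540".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def cue_durations(srt_text: str) -> list[float]:
    """Return the duration in seconds of every cue found in SRT text."""
    durations = []
    for match in TIMING.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        # Work in integer milliseconds to avoid floating-point drift.
        start_ms = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
        end_ms = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
        durations.append((end_ms - start_ms) / 1000)
    return durations


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:28,780\nFirst cue text.\n\n"
        "2\n00:00:28,780 --> 00:00:40,540\nSecond cue text.\n"
    )
    durations = cue_durations(sample)
    print(f"{len(durations)} cues, longest is {max(durations)} s")
```

Comparing the cue count and the longest duration between two runs gives a quick way to confirm which file came from which set of options.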

One more quick question, if you don't mind: is large-v3-turbo better than medium? It seems faster, but I'm not sure whether it's more accurate.

Contributor

Purfview commented Feb 3, 2025

One more quick question, if you don't mind: is large-v3-turbo better than medium?

For transcription, turbo should be better.


3 participants