
Detected language isn't exposed when the multilingual option is set #1233

Open
elwinar opened this issue Jan 29, 2025 · 0 comments
elwinar commented Jan 29, 2025

I'm using faster-whisper to transcribe videos that may have multiple spoken languages in their audio stream.

# `args` comes from an argparse parser; `compute_type` is set earlier
# in the script (e.g. "float16" on GPU, "int8" on CPU).
import faster_whisper

model = faster_whisper.WhisperModel(args.models_path, args.device, compute_type=compute_type, local_files_only=True)
segments, info = model.transcribe(
    args.stream_path,
    beam_size=5,
    vad_filter=True,
    multilingual=True,
    word_timestamps=True,
)

The only issue is that the detected language of each segment isn't exposed in the results, which is a shame given that the information is already available internally and would be tedious to recompute afterwards.
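To illustrate what consuming such a field could look like, here is a minimal sketch with a mock segment type. The `language` attribute below is an assumption (mirroring what the proof-of-concept fork adds), not part of the current faster-whisper API:

```python
from dataclasses import dataclass

# Hypothetical segment shape, assuming the per-segment detected
# language were exposed alongside the existing fields.
@dataclass
class Segment:
    start: float
    end: float
    text: str
    language: str  # assumed field, e.g. "en" or "fr"

def group_by_language(segments):
    """Collect segment texts per detected language."""
    groups = {}
    for seg in segments:
        groups.setdefault(seg.language, []).append(seg.text)
    return groups

segments = [
    Segment(0.0, 2.5, "Hello everyone", "en"),
    Segment(2.5, 5.0, "Bonjour à tous", "fr"),
    Segment(5.0, 7.0, "welcome back", "en"),
]
print(group_by_language(segments))
# → {'en': ['Hello everyone', 'welcome back'], 'fr': ['Bonjour à tous']}
```

Without an exposed field, the same grouping would require a second language-detection pass over each segment's audio, which is exactly the tedious recomputation mentioned above.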

I've put together a few lines in a fork as a proof of concept (aivetech@19cebaa). Would you be open to discussing adding this feature?
