
Detected language isn't exposed when the multilingual option is set #1233

Open
elwinar opened this issue Jan 29, 2025 · 0 comments
elwinar commented Jan 29, 2025

I'm using faster-whisper to transcribe videos that may have multiple spoken languages in their audio stream.

# `args` comes from an argparse parser; `compute_type` is set earlier
# in the script (e.g. "float16" on GPU, "int8" on CPU).
import faster_whisper

model = faster_whisper.WhisperModel(args.models_path, args.device, compute_type=compute_type, local_files_only=True)
segments, info = model.transcribe(
    args.stream_path,
    beam_size=5,
    vad_filter=True,
    multilingual=True,
    word_timestamps=True,
)

The only issue is that the detected language of each segment isn't exposed in the results, which is a shame given that the information is already available internally and would be tedious to recompute afterwards.
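To illustrate what consuming such a field could look like, here is a minimal sketch with a mock segment type. The `language` attribute below is an assumption (mirroring what the proof-of-concept fork adds), not part of the current faster-whisper API:

```python
from dataclasses import dataclass

# Hypothetical segment shape, assuming the per-segment detected
# language were exposed alongside the existing fields.
@dataclass
class Segment:
    start: float
    end: float
    text: str
    language: str  # assumed field, e.g. "en" or "fr"

def group_by_language(segments):
    """Collect segment texts per detected language."""
    groups = {}
    for seg in segments:
        groups.setdefault(seg.language, []).append(seg.text)
    return groups

segments = [
    Segment(0.0, 2.5, "Hello everyone", "en"),
    Segment(2.5, 5.0, "Bonjour à tous", "fr"),
    Segment(5.0, 7.0, "welcome back", "en"),
]
print(group_by_language(segments))
# → {'en': ['Hello everyone', 'welcome back'], 'fr': ['Bonjour à tous']}
```

Without an exposed field, the same grouping would require a second language-detection pass over each segment's audio, which is exactly the tedious recomputation mentioned above.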

I've put together a few lines in a fork as a proof of concept (aivetech@19cebaa). Would you be open to discussing adding this feature?
