Releases: SYSTRAN/faster-whisper

faster-whisper 1.1.0

21 Nov 17:19
97a4785

New Features

  • New batched inference that is 4x faster with comparable accuracy; refer to the README for usage instructions.
  • Support for the new large-v3-turbo model.
  • VAD filter is now 3x faster on CPU.
  • Feature extraction is now 3x faster.
  • Added log_progress to WhisperModel.transcribe to print transcription progress.
  • Added a multilingual option to transcription to allow transcribing multilingual audio. Note that large models already have code-switching capabilities, so this is mostly beneficial for medium models and smaller.
  • WhisperModel.detect_language now has the option to use the VAD filter, and offers improved language detection via language_detection_segments and language_detection_threshold.
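The thresholded detection can be pictured with a small standalone sketch: average per-segment language probabilities over several segments and fall back to a default when no language is confident enough. The function name, averaging scheme, and fallback are illustrative assumptions, not the library's actual implementation.

```python
from collections import defaultdict

def pick_language(segment_probs, threshold=0.5, fallback="en"):
    """Average per-segment language probabilities and return the best
    language, falling back when nothing clears the threshold.

    segment_probs: list of dicts mapping language code -> probability,
    one dict per analyzed segment (cf. language_detection_segments).
    """
    totals = defaultdict(float)
    for probs in segment_probs:
        for lang, p in probs.items():
            totals[lang] += p
    if not totals:
        return fallback, 0.0
    n = len(segment_probs)
    lang, score = max(((l, s / n) for l, s in totals.items()),
                      key=lambda x: x[1])
    return (lang, score) if score >= threshold else (fallback, score)
```

Running detection over multiple VAD-selected segments instead of a single 30-second window is what makes the threshold meaningful: a single noisy segment can no longer decide the language alone.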

Bug Fixes

  • Use correct feature padding for encoder input when chunk_length < 30s
  • Use correct seek value in output

Other Changes

  • Replaced NamedTuple with dataclass in Word, Segment, TranscriptionOptions, TranscriptionInfo, and VadOptions, which allows conversion to JSON without nesting. The _asdict() method is still available on the Word and Segment classes for backward compatibility but will be removed in the next release; use dataclasses.asdict() instead.
  • Added new tests for development
  • Updated benchmarks in the Readme
  • Use jiwer instead of evaluate in benchmarks
  • Filter out non_speech_tokens in suppressed tokens by @jordimas in #898
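The migration away from _asdict() is a one-line change. A minimal sketch with a stand-in Word dataclass (fields assumed to mirror the library's public attributes):

```python
from dataclasses import dataclass, asdict

@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float

w = Word(start=0.0, end=0.42, word="hello", probability=0.97)
# dataclasses.asdict() replaces the old NamedTuple _asdict() call
# and yields a plain, JSON-serializable dict.
d = asdict(w)
```

Word(**asdict(w)) round-trips cleanly, which is convenient when loading results back from JSON.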

Full Changelog: v1.0.3...v1.1.0

faster-whisper 1.0.3

01 Jul 10:05
c22db51

Upgrade the Silero VAD model to the latest V5 version (#884)

Silero-vad V5 release: https://github.com/snakers4/silero-vad/releases/tag/v5.0

  • The window_size_samples parameter is fixed at 512.
  • A single state variable is now used instead of the previous h and c variables.
  • Slightly changed internal logic: some context (part of the previous chunk) is now passed along with the current chunk.
  • The dimension of the state variable changed from 64 to 128.
  • Replaced the ONNX file with the V5 version.
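The new chunking behavior, fixed 512-sample windows with a slice of the previous chunk prepended as context, can be sketched as follows. The context length here is an arbitrary illustration, not Silero's actual value:

```python
def iter_vad_chunks(samples, window=512, context=64):
    """Yield fixed-size windows, each prefixed with the tail of the
    previous window (zeros before the first one), mimicking how V5
    carries context between chunks."""
    prev_tail = [0.0] * context
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        yield prev_tail + chunk
        prev_tail = chunk[-context:]
```

Carrying context avoids hard boundaries between windows, so speech that straddles a chunk edge is less likely to be missed.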

Other changes

  • Improve language detection when using clip_timestamps (#867)
  • Docker file improvements (#848)
  • Fix #839 incorrect clip_timestamps being used in model (#842)

faster-whisper 1.0.2

06 May 02:08
2f6913e
  • Add support for distil-large-v3 (#755)
    The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.

  • Benchmarks (#773)
    Introduces functionality to benchmark memory usage, Word Error Rate (WER), and speed in faster-whisper.

  • Support initializing more whisper model args (#807)

  • Small bug fixes:

    • Fix crash when audio is empty (#768)
    • Foolproof: Disable VAD if clip_timestamps is in use (#769)
    • Make faster_whisper.assets a valid Python package for distribution (#774)
    • Loosen tokenizers version constraint (#804)
    • CUDA version and updated installation instructions (#785)
  • New features from the original OpenAI Whisper project:

    • Feature/add hotwords (#731)
    • Improve language detection (#732)
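The WER figure reported by the benchmarks is word-level edit distance divided by the reference length. The benchmarks themselves use jiwer; the definition can be sketched in a few self-contained lines:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words,
    # kept to a single rolling row for O(len(hyp)) memory.
    dist = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dist[0] = dist[0], i
        for j, h in enumerate(hyp, 1):
            prev, dist[j] = dist[j], min(
                dist[j] + 1,        # deletion
                dist[j - 1] + 1,    # insertion
                prev + (r != h),    # substitution (free if words match)
            )
    return dist[len(hyp)] / max(len(ref), 1)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why benchmark tables occasionally show values above 100%.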

faster-whisper 1.0.1

01 Mar 10:46
a342b02
  • Bug fixes and performance improvements:
    • Update logic to get segment from features before encoding (#705)
    • Fix window end heuristic for hallucination_silence_threshold (#706)

faster-whisper 1.0.0

22 Feb 08:56
06d32bf
  • Support distil-whisper model (#557)
    Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling.
    For more detail: https://github.com/huggingface/distil-whisper

  • Upgrade ctranslate2 version to 4.0 to support CUDA 12 (#694)

  • Upgrade PyAV version to 11.* to support Python 3.12 (#679)

  • Small bug fixes

    • Fix illogical "Avoid computing higher temperatures on no_speech" behavior (#652)
    • Fix broken prompt_reset_on_temperature (#604)
    • Word timing tweaks (#616)
  • New improvements from the original OpenAI Whisper project

    • Skip silence around hallucinations (#646)
    • Prevent infinite loop for out-of-bound timestamps in clip_timestamps (#697)

faster-whisper 0.10.1

22 Feb 12:08

Fix the broken tag v0.10.0

faster-whisper 0.10.0

22 Feb 11:55
  • Support "large-v3" model with
    • The ability to load feature_size/num_mels and other parameters from preprocessor_config.json
    • A new language token for Cantonese (yue)
  • Update CTranslate2 requirement to include the latest version 3.22.0
  • Update tokenizers requirement to include the latest version 0.15
  • Change the hub to fetch models from Systran organization
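Reading the mel-filter count from preprocessor_config.json can be sketched as follows. The key name follows the Hugging Face preprocessor-config convention, and the default of 80 (large-v3 uses 128) is an assumption for illustration:

```python
import json
from pathlib import Path

def load_feature_size(model_dir, default=80):
    """Read feature_size from preprocessor_config.json if the model
    directory ships one; otherwise fall back to the classic 80 mels."""
    config_path = Path(model_dir) / "preprocessor_config.json"
    if not config_path.is_file():
        return default
    config = json.loads(config_path.read_text())
    return config.get("feature_size", default)
```

Reading the value from the model directory rather than hardcoding it is what lets a single loader serve both 80-mel and 128-mel (large-v3) checkpoints.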

faster-whisper 0.9.0

18 Sep 14:34
  • Add function faster_whisper.available_models() to list the available model sizes
  • Add model property supported_languages to list the languages accepted by the model
  • Improve error message for invalid task and language parameters
  • Update tokenizers requirement to include the latest version 0.14

faster-whisper 0.8.0

04 Sep 10:01

Expose new transcription options

Some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:

  • repetition_penalty to penalize the score of previously generated tokens (set > 1 to penalize)
  • no_repeat_ngram_size to prevent repetitions of ngrams with this size
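Both options implement standard decoding heuristics. A self-contained sketch of the general idea (not CTranslate2's internal code):

```python
def would_repeat_ngram(tokens, candidate, n):
    """Return True if appending `candidate` would recreate an n-gram
    already present in `tokens` -- the condition no_repeat_ngram_size
    blocks during decoding."""
    if n <= 0 or len(tokens) < n - 1:
        return False
    new_ngram = tuple(tokens[len(tokens) - (n - 1):] + [candidate])
    existing = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in existing

def apply_repetition_penalty(score, penalty):
    """Standard repetition-penalty formulation: divide positive scores,
    multiply negative ones, so penalty > 1 always lowers the score of
    a previously generated token."""
    return score / penalty if score > 0 else score * penalty
```

With penalty = 1.0 both transforms are no-ops, which matches the default behavior of leaving repetition unpenalized.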

Some values that were previously hardcoded in the transcription method:

  • prompt_reset_on_temperature to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)
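The behavior now exposed by this option can be sketched as a tiny guard: once a fallback decode runs above the threshold temperature, the previously generated text is no longer trusted as conditioning. Names here are illustrative:

```python
def build_prompt(previous_text, temperature,
                 prompt_reset_on_temperature=0.5):
    """Drop the previous-text prompt once the fallback temperature
    exceeds the threshold, since text sampled at high temperature is
    unreliable as conditioning for the next window."""
    if temperature > prompt_reset_on_temperature:
        return ""
    return previous_text
```

Lowering the threshold makes transcription more conservative about propagating possibly hallucinated text; raising it keeps more context at the risk of compounding errors.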

Other changes

  • Fix a possible memory leak when decoding audio with PyAV by forcing the garbage collector to run
  • Add property duration_after_vad in the returned TranscriptionInfo object
  • Add "large" alias for the "large-v2" model
  • Log a warning when the model is English-only but the language parameter is set to something else

faster-whisper 0.7.1

24 Jul 09:20
  • Fix a bug related to no_speech_threshold: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered non-speech
  • Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability