Releases: SYSTRAN/faster-whisper

faster-whisper 1.1.0

21 Nov 17:19
97a4785

New Features

  • New batched inference that is 4x faster with comparable accuracy; refer to the README for usage instructions.
  • Support for the new large-v3-turbo model.
  • VAD filter is now 3x faster on CPU.
  • Feature extraction is now 3x faster.
  • Added log_progress to WhisperModel.transcribe to print transcription progress.
  • Added a multilingual option to transcription to allow transcribing multilingual audio. Note that large models already have code-switching capabilities, so this is mostly beneficial for medium models and smaller.
  • WhisperModel.detect_language now has the option to use the VAD filter, and offers improved language detection via language_detection_segments and language_detection_threshold.
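The thresholded detection can be pictured with a small standalone sketch: average per-segment language probabilities over several segments and fall back to a default when no language is confident enough. The function name, averaging scheme, and fallback are illustrative assumptions, not the library's actual implementation.

```python
from collections import defaultdict

def pick_language(segment_probs, threshold=0.5, fallback="en"):
    """Average per-segment language probabilities and return the best
    language, falling back when nothing clears the threshold.

    segment_probs: list of dicts mapping language code -> probability,
    one dict per analyzed segment (cf. language_detection_segments).
    """
    totals = defaultdict(float)
    for probs in segment_probs:
        for lang, p in probs.items():
            totals[lang] += p
    if not totals:
        return fallback, 0.0
    n = len(segment_probs)
    lang, score = max(((l, s / n) for l, s in totals.items()),
                      key=lambda x: x[1])
    return (lang, score) if score >= threshold else (fallback, score)
```

Running detection over multiple VAD-selected segments instead of a single 30-second window is what makes the threshold meaningful: a single noisy segment can no longer decide the language alone.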

Bug Fixes

  • Use correct feature padding for encoder input when chunk_length < 30s
  • Use correct seek value in output

Other Changes

  • Replaced NamedTuple with dataclass in Word, Segment, TranscriptionOptions, TranscriptionInfo, and VadOptions, which allows conversion to JSON without nesting. The _asdict() method is still available on the Word and Segment classes for backward compatibility but will be removed in the next release; use dataclasses.asdict() instead.
  • Added new tests for development
  • Updated benchmarks in the Readme
  • Use jiwer instead of evaluate in benchmarks
  • Filter out non_speech_tokens in suppressed tokens by @jordimas in #898
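The migration away from _asdict() is a one-line change. A minimal sketch with a stand-in Word dataclass (fields assumed to mirror the library's public attributes):

```python
from dataclasses import dataclass, asdict

@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float

w = Word(start=0.0, end=0.42, word="hello", probability=0.97)
# dataclasses.asdict() replaces the old NamedTuple _asdict() call
# and yields a plain, JSON-serializable dict.
d = asdict(w)
```

Word(**asdict(w)) round-trips cleanly, which is convenient when loading results back from JSON.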

Full Changelog: v1.0.3...v1.1.0

faster-whisper 1.0.3

01 Jul 10:05
c22db51

Upgrade the Silero VAD model to the latest V5 version (#884)

Silero-vad V5 release: https://github.com/snakers4/silero-vad/releases/tag/v5.0

  • The window_size_samples parameter is fixed at 512.
  • A single state variable is now used instead of the previous h and c variables.
  • Slightly changed internal logic: some context (part of the previous chunk) is now passed along with the current chunk.
  • The dimension of the state variable changed from 64 to 128.
  • Replaced the ONNX file with the V5 version.
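The new chunking behavior, fixed 512-sample windows with a slice of the previous chunk prepended as context, can be sketched as follows. The context length here is an arbitrary illustration, not Silero's actual value:

```python
def iter_vad_chunks(samples, window=512, context=64):
    """Yield fixed-size windows, each prefixed with the tail of the
    previous window (zeros before the first one), mimicking how V5
    carries context between chunks."""
    prev_tail = [0.0] * context
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        yield prev_tail + chunk
        prev_tail = chunk[-context:]
```

Carrying context avoids hard boundaries between windows, so speech that straddles a chunk edge is less likely to be missed.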

Other changes

  • Improve language detection when using clip_timestamps (#867)
  • Docker file improvements (#848)
  • Fix #839 incorrect clip_timestamps being used in model (#842)

faster-whisper 1.0.2

06 May 02:08
2f6913e
  • Add support for distil-large-v3 (#755)
    The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.

  • Benchmarks (#773)
    Introduces functionality to benchmark memory usage, Word Error Rate (WER), and speed in faster-whisper.

  • Support initializing more whisper model args (#807)

  • Small bug fixes:

    • Fix crash when audio is empty (#768)
    • Foolproof: Disable VAD if clip_timestamps is in use (#769)
    • Make faster_whisper.assets a valid Python package for distribution (#774)
    • Loosen tokenizers version constraint (#804)
    • CUDA version and updated installation instructions (#785)
  • New features from the original OpenAI Whisper project:

    • Feature/add hotwords (#731)
    • Improve language detection (#732)
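The WER figure reported by the benchmarks is word-level edit distance divided by the reference length. The benchmarks themselves use jiwer; the definition can be sketched in a few self-contained lines:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words,
    # kept to a single rolling row for O(len(hyp)) memory.
    dist = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dist[0] = dist[0], i
        for j, h in enumerate(hyp, 1):
            prev, dist[j] = dist[j], min(
                dist[j] + 1,        # deletion
                dist[j - 1] + 1,    # insertion
                prev + (r != h),    # substitution (free if words match)
            )
    return dist[len(hyp)] / max(len(ref), 1)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why benchmark tables occasionally show values above 100%.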

faster-whisper 1.0.1

01 Mar 10:46
a342b02
  • Bug fixes and performance improvements:
    • Update logic to get segment from features before encoding (#705)
    • Fix window end heuristic for hallucination_silence_threshold (#706)

faster-whisper 1.0.0

22 Feb 08:56
06d32bf
  • Support distil-whisper model (#557)
    Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling.
    For more detail: https://github.com/huggingface/distil-whisper

  • Upgrade ctranslate2 version to 4.0 to support CUDA 12 (#694)

  • Upgrade PyAV version to 11.* to support Python 3.12 (#679)

  • Small bug fixes

    • Fix illogical "Avoid computing higher temperatures on no_speech" behavior (#652)
    • Fix broken prompt_reset_on_temperature (#604)
    • Word timing tweaks (#616)
  • New improvements from the original OpenAI Whisper project

    • Skip silence around hallucinations (#646)
    • Prevent infinite loop for out-of-bound timestamps in clip_timestamps (#697)

faster-whisper 0.10.1

22 Feb 12:08

Fix the broken tag v0.10.0

faster-whisper 0.10.0

22 Feb 11:55
  • Support "large-v3" model with
    • The ability to load feature_size/num_mels and other parameters from preprocessor_config.json
    • A new language token for Cantonese (yue)
  • Update CTranslate2 requirement to include the latest version 3.22.0
  • Update tokenizers requirement to include the latest version 0.15
  • Change the hub to fetch models from Systran organization
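Reading the mel-filter count from preprocessor_config.json can be sketched as follows. The key name follows the Hugging Face preprocessor-config convention, and the default of 80 (large-v3 uses 128) is an assumption for illustration:

```python
import json
from pathlib import Path

def load_feature_size(model_dir, default=80):
    """Read feature_size from preprocessor_config.json if the model
    directory ships one; otherwise fall back to the classic 80 mels."""
    config_path = Path(model_dir) / "preprocessor_config.json"
    if not config_path.is_file():
        return default
    config = json.loads(config_path.read_text())
    return config.get("feature_size", default)
```

Reading the value from the model directory rather than hardcoding it is what lets a single loader serve both 80-mel and 128-mel (large-v3) checkpoints.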

faster-whisper 0.9.0

18 Sep 14:34
  • Add function faster_whisper.available_models() to list the available model sizes
  • Add model property supported_languages to list the languages accepted by the model
  • Improve error message for invalid task and language parameters
  • Update tokenizers requirement to include the latest version 0.14

faster-whisper 0.8.0

04 Sep 10:01

Expose new transcription options

Some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:

  • repetition_penalty to penalize the score of previously generated tokens (set > 1 to penalize)
  • no_repeat_ngram_size to prevent repetitions of ngrams with this size
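Both options implement standard decoding heuristics. A self-contained sketch of the general idea (not CTranslate2's internal code):

```python
def would_repeat_ngram(tokens, candidate, n):
    """Return True if appending `candidate` would recreate an n-gram
    already present in `tokens` -- the condition no_repeat_ngram_size
    blocks during decoding."""
    if n <= 0 or len(tokens) < n - 1:
        return False
    new_ngram = tuple(tokens[len(tokens) - (n - 1):] + [candidate])
    existing = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in existing

def apply_repetition_penalty(score, penalty):
    """Standard repetition-penalty formulation: divide positive scores,
    multiply negative ones, so penalty > 1 always lowers the score of
    a previously generated token."""
    return score / penalty if score > 0 else score * penalty
```

With penalty = 1.0 both transforms are no-ops, which matches the default behavior of leaving repetition unpenalized.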

Some values that were previously hardcoded in the transcription method:

  • prompt_reset_on_temperature to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)
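The behavior now exposed by this option can be sketched as a tiny guard: once a fallback decode runs above the threshold temperature, the previously generated text is no longer trusted as conditioning. Names here are illustrative:

```python
def build_prompt(previous_text, temperature,
                 prompt_reset_on_temperature=0.5):
    """Drop the previous-text prompt once the fallback temperature
    exceeds the threshold, since text sampled at high temperature is
    unreliable as conditioning for the next window."""
    if temperature > prompt_reset_on_temperature:
        return ""
    return previous_text
```

Lowering the threshold makes transcription more conservative about propagating possibly hallucinated text; raising it keeps more context at the risk of compounding errors.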

Other changes

  • Fix a possible memory leak when decoding audio with PyAV by forcing the garbage collector to run
  • Add property duration_after_vad in the returned TranscriptionInfo object
  • Add "large" alias for the "large-v2" model
  • Log a warning when the model is English-only but the language parameter is set to something else

faster-whisper 0.7.1

24 Jul 09:20
  • Fix a bug related to no_speech_threshold: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered non-speech
  • Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability