Releases: SYSTRAN/faster-whisper
faster-whisper 1.1.0
New Features
- New batched inference that is 4x faster while staying accurate; refer to the README for usage instructions.
- Support for the new `large-v3-turbo` model.
- VAD filter is now 3x faster on CPU.
- Feature extraction is now 3x faster.
- Added `log_progress` to `WhisperModel.transcribe` to print transcription progress.
- Added `multilingual` option to transcription to allow transcribing multilingual audio. Note that large models already have code-switching capabilities, so this is mostly beneficial to `medium` models and smaller.
- `WhisperModel.detect_language` now has the option to use the VAD filter, and offers improved language detection via `language_detection_segments` and `language_detection_threshold`.
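Assuming the package is installed, the new options above combine roughly as follows. This is a sketch, not the README verbatim: `BatchedInferencePipeline` and the exact parameter spellings are taken from this release's API surface but should be checked against the README before use.

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3-turbo", device="cuda")

# Batched inference (new in 1.1.0): wraps a loaded model and decodes
# several chunks in parallel for the reported ~4x speedup.
batched = BatchedInferencePipeline(model=model)
segments, info = batched.transcribe(
    "audio.wav",
    batch_size=16,
    log_progress=True,   # new: print transcription progress
    multilingual=True,   # new: allow code-switching within one file
)

# Improved language detection (new in 1.1.0): probe several VAD-filtered
# segments and require a minimum confidence before accepting a language.
language, probability, all_probs = model.detect_language(
    "audio.wav",
    vad_filter=True,
    language_detection_segments=4,
    language_detection_threshold=0.5,
)
```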
Bug Fixes
- Use correct features padding for encoder input when `chunk_length` < 30s
- Use correct `seek` value in output
Other Changes
- Replace `NamedTuple` with `dataclass` in `Word`, `Segment`, `TranscriptionOptions`, `TranscriptionInfo`, and `VadOptions`; this allows conversion to JSON without nesting. Note that the `_asdict()` method is still available in the `Word` and `Segment` classes for backward compatibility but will be removed in the next release; use `dataclasses.asdict()` instead.
- Added new tests for development
- Updated benchmarks in the README
- Use `jiwer` instead of `evaluate` in benchmarks
- Filter out `non_speech_tokens` in suppressed tokens by @jordimas in #898
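The practical effect of the `NamedTuple`-to-`dataclass` change can be seen with the standard library alone. `WordLike` below is a hypothetical stand-in for the library's `Word` class, not its actual definition:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical stand-in for the library's Word dataclass after this change.
@dataclass
class WordLike:
    start: float
    end: float
    word: str
    probability: float

w = WordLike(start=0.0, end=0.42, word="hello", probability=0.98)

# dataclasses.asdict() replaces the old NamedTuple._asdict() and yields a
# plain dict, so the object serializes to JSON without extra nesting.
d = asdict(w)
print(json.dumps(d))
```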
New Contributors
- @Jiltseb made their first contribution in #856
- @heimoshuiyu made their first contribution in #1092
Full Changelog: v1.0.3...v1.1.0
faster-whisper 1.0.3
Upgrade Silero-Vad model to latest V5 version (#884)
Silero-vad V5 release: https://github.com/snakers4/silero-vad/releases/tag/v5.0
- The `window_size_samples` parameter is fixed at 512.
- Uses a single `state` variable instead of the previous `h` and `c` variables.
- Slightly changed internal logic: some context (part of the previous chunk) is now passed along with the current chunk.
- The dimensions of the `state` variable changed from 64 to 128.
- Replaced the ONNX file with the V5 version
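As a rough illustration of the new chunking behavior, the sketch below iterates fixed 512-sample windows and carries part of the previous chunk as context. The `CONTEXT` size is hypothetical and chosen for illustration only; it is not the value the V5 model uses internally.

```python
# Sketch of V5-style chunking: fixed 512-sample windows, each prefixed
# with a small context taken from the end of the previous chunk.
WINDOW = 512
CONTEXT = 64  # hypothetical context size, for illustration only

def iter_windows(samples):
    """Yield (context + chunk) lists over a 1-D sample sequence."""
    context = [0.0] * CONTEXT  # silence before the first chunk
    for start in range(0, len(samples), WINDOW):
        chunk = samples[start:start + WINDOW]
        if len(chunk) < WINDOW:                # zero-pad the last chunk
            chunk = chunk + [0.0] * (WINDOW - len(chunk))
        yield context + chunk
        context = chunk[-CONTEXT:]             # carry context forward

audio = [0.1] * 1000                           # dummy audio, ~2 windows
windows = list(iter_windows(audio))
print(len(windows), len(windows[0]))           # 2 576
```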
faster-whisper 1.0.2
- Add support for distil-large-v3 (#755)
  The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.
- Benchmarks (#773)
  Introduces functionality to benchmark memory usage, word error rate (WER), and speed in faster-whisper.
- Support initializing more Whisper model args (#807)
- Small bug fix
- New feature from the original OpenAI Whisper project
faster-whisper 1.0.1
faster-whisper 1.0.0
- Support distil-whisper model (#557)
  Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling. For more detail: https://github.com/huggingface/distil-whisper
- Upgrade ctranslate2 version to 4.0 to support CUDA 12 (#694)
- Upgrade PyAV version to 11.* to support Python 3.12.x (#679)
- Small bug fixes
- New improvements from the original OpenAI Whisper project
faster-whisper 0.10.1
Fix the broken tag v0.10.0
faster-whisper 0.10.0
- Support "large-v3" model with:
  - The ability to load `feature_size`/`num_mels` and other parameters from `preprocessor_config.json`
  - A new language token for Cantonese (`yue`)
- Update `CTranslate2` requirement to include the latest version 3.22.0
- Update `tokenizers` requirement to include the latest version 0.15
- Change the hub to fetch models from the Systran organization
faster-whisper 0.9.0
- Add function `faster_whisper.available_models()` to list the available model sizes
- Add model property `supported_languages` to list the languages accepted by the model
- Improve error message for invalid `task` and `language` parameters
- Update `tokenizers` requirement to include the latest version 0.14
faster-whisper 0.8.0
Expose new transcription options
Some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:
- `repetition_penalty` to penalize the score of previously generated tokens (set > 1 to penalize)
- `no_repeat_ngram_size` to prevent repetition of ngrams of this size
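Both options follow standard decoding conventions. Below is a minimal pure-Python sketch of how such penalties are typically applied to next-token scores; it illustrates the idea only and is not the CTranslate2 implementation.

```python
def apply_repetition_penalty(scores, generated, penalty):
    """Divide positive scores (and multiply negative ones) of already
    generated tokens by `penalty`, as in the usual formulation."""
    out = dict(scores)
    for tok in set(generated):
        s = out[tok]
        out[tok] = s / penalty if s > 0 else s * penalty
    return out

def banned_next_tokens(generated, ngram_size):
    """Tokens that would complete an n-gram already seen in `generated`."""
    prefix = tuple(generated[-(ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        if tuple(generated[i:i + ngram_size - 1]) == prefix:
            banned.add(generated[i + ngram_size - 1])
    return banned

scores = {"a": 2.0, "b": 1.0, "c": -0.5}
# "a" drops from 2.0 to ~1.67, "c" drops from -0.5 to -0.6
print(apply_repetition_penalty(scores, ["a", "c"], 1.2))

# Bigram ("y", "x") was already seen and the last token is "y",
# so "x" is banned as the next token.
print(banned_next_tokens(["x", "y", "x", "y"], 2))
```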
Some values that were previously hardcoded in the transcription method:
- `prompt_reset_on_temperature` to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)
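The rule this option encodes can be sketched as follows; `build_prompt` is a hypothetical helper, not the library's internal code.

```python
def build_prompt(previous_text, last_temperature,
                 prompt_reset_on_temperature=0.5):
    """Drop the previous-text prompt once decoding had to fall back to a
    temperature at or above the threshold (default 0.5), since the prior
    text is then likely unreliable."""
    if last_temperature >= prompt_reset_on_temperature:
        return ""          # reset the prompt
    return previous_text

print(build_prompt("so far so good", 0.2))  # prompt kept
print(build_prompt("so far so good", 0.8))  # prompt reset to ""
```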
Other changes
- Fix a possible memory leak when decoding audio with PyAV by forcing the garbage collector to run
- Add property `duration_after_vad` to the returned `TranscriptionInfo` object
- Add "large" alias for the "large-v2" model
- Log a warning when the model is English-only but the `language` parameter is set to something else
faster-whisper 0.7.1
- Fix a bug related to `no_speech_threshold`: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered non-speech
- Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability
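The `no_speech_threshold` fix amounts to re-encoding each window after seeking past non-speech, instead of reusing the stale encoder output. The stub-level sketch below illustrates the corrected control flow; all names are hypothetical and the real logic lives inside `WhisperModel.transcribe`.

```python
WINDOW = 30  # seconds per transcription window

def transcribe_windows(audio_len, is_speech, encode):
    """Corrected loop: every window gets its own encoder pass, so a
    non-speech verdict on one window cannot leak into the next."""
    seek, results = 0, []
    while seek < audio_len:
        features = encode(seek)        # fresh encoder output per window
        if not is_speech(seek):
            seek += WINDOW             # skip, do NOT reuse `features`
            continue
        results.append((seek, features))
        seek += WINDOW
    return results

calls = []
out = transcribe_windows(
    90,
    is_speech=lambda s: s != 0,        # first window is non-speech
    encode=lambda s: calls.append(s) or f"enc@{s}",
)
print(out)    # [(30, 'enc@30'), (60, 'enc@60')]
print(calls)  # [0, 30, 60] -> each window was encoded independently
```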