
feat:finetuned models #12

Merged · 2 commits · Sep 13, 2024
12 changes: 0 additions & 12 deletions Dockerfile

This file was deleted.

12 changes: 8 additions & 4 deletions README.md
@@ -9,13 +9,17 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp

`pip install ovos-stt-plugin-fasterwhisper`

## Configuration
## Models

available models are `'tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large', 'distil-large-v2', 'distil-medium.en', 'distil-small.en', 'distil-large-v3'`

available models are `"tiny.en", "tiny", "base.en", "base", "small.en", "small", "medium.en", "medium", "large-v2", "large-v3"`
you can also pass a full path to a local model or a huggingface repo_id, eg. `"projecte-aina/faster-whisper-large-v3-ca-3catparla"`
Fix the abbreviation "e.g." to include two periods.

The static analysis tool correctly points out that the abbreviation "e.g." should have two periods.

Please update the line as follows:

-you can also pass a full path to a local model or a huggingface repo_id, eg. `"projecte-aina/faster-whisper-large-v3-ca-3catparla"`
+you can also pass a full path to a local model or a huggingface repo_id, e.g., `"projecte-aina/faster-whisper-large-v3-ca-3catparla"`

e.g., to use the Large model with GPU
You can [convert](https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#model-conversion) any whisper model, or use any [compatible model from huggingface](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=modified&search=faster-whisper)

## Configuration

To use Whisper as STT
to use Large model with GPU

```json
"stt": {
Expand Down
34 changes: 17 additions & 17 deletions ovos_stt_plugin_fasterwhisper/__init__.py
@@ -2,19 +2,19 @@
from faster_whisper import WhisperModel, decode_audio, available_models
from ovos_plugin_manager.templates.stt import STT
from ovos_plugin_manager.templates.transformers import AudioLanguageDetector
from speech_recognition import AudioData
from ovos_utils.log import LOG
from speech_recognition import AudioData


class FasterWhisperLangClassifier(AudioLanguageDetector):
def __init__(self, config=None):
config = config or {}
super().__init__("ovos-audio-transformer-plugin-fasterwhisper", 10, config)
model = self.config.get("model")
model = self.config.get("model") or "small"
valid_model = model in FasterWhisperSTT.MODELS
if not model or not valid_model:
LOG.warning(f"{model} is not a valid model ({FasterWhisperSTT.MODELS}), using 'small' instead")
model = "small"
self.config["model"] = "small"
if not valid_model:
LOG.info(f"{model} is not default model_id ({FasterWhisperSTT.MODELS}), "
f"assuming huggingface repo_id or path to local model")

self.compute_type = self.config.get("compute_type", "int8")
self.use_cuda = self.config.get("use_cuda", False)
@@ -34,7 +34,7 @@ def audiochunk2array(audio_data: bytes):
audio_as_np_float32 = audio_as_np_int16.astype(np.float32)

# Normalise float32 array so that values are between -1.0 and +1.0
max_int16 = 2**15
max_int16 = 2 ** 15
data = audio_as_np_float32 / max_int16
return data
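The normalisation above can be sanity-checked in isolation — a minimal, self-contained reconstruction of the helper, assuming the input is raw signed 16-bit PCM bytes, as the surrounding code implies:

```python
import numpy as np

def audiochunk2array(audio_data: bytes) -> np.ndarray:
    # Reinterpret the raw bytes as signed 16-bit PCM samples
    audio_as_np_int16 = np.frombuffer(audio_data, dtype=np.int16)
    audio_as_np_float32 = audio_as_np_int16.astype(np.float32)
    # Normalise so that values lie in [-1.0, +1.0)
    max_int16 = 2 ** 15
    return audio_as_np_float32 / max_int16

# Silence, half scale, and the two int16 extremes:
# 0 -> 0.0, 16384 -> 0.5, -32768 -> -1.0, 32767 -> just under +1.0
chunk = np.array([0, 16384, -32768, 32767], dtype=np.int16).tobytes()
print(audiochunk2array(chunk))
```

Dividing by 2**15 (not the asymmetric int16 range) keeps the mapping linear at the cost of the positive peak landing at 32767/32768 rather than exactly 1.0.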

@@ -173,12 +173,11 @@ class FasterWhisperSTT(STT):

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
model = self.config.get("model")
model = self.config.get("model") or "small"
valid_model = model in FasterWhisperSTT.MODELS
if not model or not valid_model:
LOG.warning(f"{model} is not a valid model ({FasterWhisperSTT.MODELS}), using 'small' instead")
model = "small"
self.config["model"] = "small"
if not valid_model:
LOG.info(f"{model} is not default model_id ({FasterWhisperSTT.MODELS}), "
f"assuming huggingface repo_id or path to local model")

self.beam_size = self.config.get("beam_size", 5)
self.compute_type = self.config.get("compute_type", "int8")
@@ -252,19 +251,20 @@ def available_languages(self) -> set:
}

if __name__ == "__main__":
b = FasterWhisperSTT()
b = FasterWhisperSTT(config={"model": "projecte-aina/faster-whisper-large-v3-ca-3catparla"})

from speech_recognition import Recognizer, AudioFile

jfk = "/home/miro/PycharmProjects/ovos-stt-plugin-fasterwhisper/jfk.wav"
jfk = "/home/miro/PycharmProjects/ovos-stt-plugin-vosk/example.wav"
with AudioFile(jfk) as source:
audio = Recognizer().record(source)

a = b.execute(audio, language="en")
# 2023-04-29 17:42:30.769 - OVOS - __main__:execute:145 - INFO - Detected speech language 'en' with probability 1
a = b.execute(audio, language="ca")
print(a)
# And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country.

l = FasterWhisperLangClassifier()
lang, prob = l.detect(audio.get_wav_data())
lang, prob = l.detect(audio.get_wav_data(),
valid_langs=["pt", "es", "ca", "gl"])
print(lang, prob)
# es 0.7143379217828251