Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: STT lang detector #289

Merged
merged 4 commits into from
Nov 24, 2024
Merged

feat: STT lang detector #289

merged 4 commits into from
Nov 24, 2024

Conversation

JarbasAl
Copy link
Member

@JarbasAl JarbasAl commented Nov 24, 2024

allows STT plugins to detect the language from audio

most plugins don't support this, but some like whisper do and would otherwise require loading 2 models into memory

related: OpenVoiceOS/ovos-stt-http-server#19

Summary by CodeRabbit

  • New Features
    • Introduced automatic language detection for audio transcription.
    • Added a method to bind a language detector to the speech-to-text functionality.
  • Bug Fixes
    • Updated deprecation warnings for several STT classes to encourage subclassing from the new STT class.
  • Improvements
    • Enhanced the transcribe method to utilize language detection when the language parameter is set to "auto."
    • Updated the available_languages property to reflect improved type hinting.

allows STT plugins to detect the language from audio

most plugins don't support this, but some like whisper do and would otherwise require loading 2 models into memory
Copy link
Contributor

coderabbitai bot commented Nov 24, 2024

Walkthrough

The changes introduce significant modifications to the STT class in the ovos_plugin_manager/templates/stt.py file. A new private attribute for language detection is added, along with methods for binding a language detector and detecting language from audio data. The transcribe method is updated to automatically detect the language if specified. Additionally, several classes are updated to indicate they should subclass directly from STT, and new dependencies are imported to support these functionalities.

Changes

File Path Change Summary
ovos_plugin_manager/templates/stt.py - Added private attribute _detector to STT class.
- Introduced bind method for assigning a language detector.
- Added detect_language method for language detection from audio data.
- Updated transcribe method to detect language automatically if lang is "auto".
- Updated available_languages property to return Set[str] instead of set.
- Updated TokenSTT, GoogleJsonSTT, BasicSTT, and KeySTT classes to reflect deprecation warnings.
- Imported LOG and AudioLanguageDetector to support new functionalities.

Suggested reviewers

  • goldyfruit
  • j1nx

Poem

In the land of code where rabbits play,
A new detector hops in, brightening the day.
With languages detected, our tasks now flow,
Transcribing with ease, watch our skills grow!
So here’s to the changes, let’s give a cheer,
For every new feature brings us more near! 🐇✨

Warning

Rate limit exceeded

@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 39 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between e280939 and b526f89.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added feature and removed feature labels Nov 24, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
ovos_plugin_manager/templates/stt.py (2)

42-42: Consider moving the import statement to the module level for clarity

Importing modules inside functions is generally discouraged unless necessary to prevent cyclic dependencies or reduce startup time. Moving the import of AudioData to the top of the file improves code readability and follows best practices.

Apply this diff to move the import statement:

+from speech_recognition import AudioData

 def detect_language(self, audio) -> str:
-    from speech_recognition import AudioData
     if not isinstance(audio, AudioData):
         raise TypeError("audio must be an instance of AudioData")

37-39: Ensure type safety in the bind method

While type hints are helpful, they are not enforced at runtime. Consider adding an explicit type check in the bind method to ensure that the detector parameter is an instance of AudioLanguageDetector. This will prevent potential runtime errors if an incorrect type is passed.

Apply this diff to add type checking:

 def bind(self, detector: AudioLanguageDetector):
+    if not isinstance(detector, AudioLanguageDetector):
+        raise TypeError("detector must be an instance of AudioLanguageDetector")
     self._detector = detector
     LOG.debug(f"{self.__class__.__name__} - Assigned lang detector: {detector}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ea81695 and 4290e19.

📒 Files selected for processing (1)
  • ovos_plugin_manager/templates/stt.py (3 hunks)
🔇 Additional comments (1)
ovos_plugin_manager/templates/stt.py (1)

140-141: Handle possible exceptions during language detection

When lang is set to "auto", the detect_language method is called, which may raise a NotImplementedError if _detector is not set. Ensure that this exception is properly handled to prevent unexpected crashes. Additionally, consider handling cases where language detection might fail or return None.

Run the following script to verify that all uses of transcribe handle exceptions properly:

This script searches for all instances where transcribe is called and displays the following 5 lines for manual inspection. Ensure that appropriate try-except blocks are in place where necessary.

ovos_plugin_manager/templates/stt.py Outdated Show resolved Hide resolved
allows STT plugins to detect the language from audio

most plugins don't support this, but some like whisper do and would otherwise require loading 2 models into memory
@github-actions github-actions bot added feature and removed feature labels Nov 24, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
ovos_plugin_manager/templates/stt.py (2)

37-39: Add parameter validation in bind method

While the type hint is good, consider adding runtime validation to ensure the detector is properly initialized.

 def bind(self, detector: AudioLanguageDetector):
+    if not isinstance(detector, AudioLanguageDetector):
+        raise TypeError("detector must be an instance of AudioLanguageDetector")
     self._detector = detector
     LOG.debug(f"{self.__class__.__name__} - Assigned lang detector: {detector}")

Line range hint 22-24: Add documentation for language detection feature

Consider adding class-level documentation to explain the language detection capability and how plugin developers can implement it.

 class STT(metaclass=ABCMeta):
-    """ STT Base class, all  STT backends derives from this one. """
+    """ STT Base class, all STT backends derives from this one.
+    
+    Features:
+    - Basic STT functionality through the execute() method
+    - Optional language detection through bind(detector) and detect_language(audio)
+    - Automatic language detection when transcribe() is called with lang="auto"
+    
+    Plugin developers can implement language detection by:
+    1. Binding a language detector using the bind() method
+    2. Ensuring available_languages property returns supported languages
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4290e19 and 62eb083.

📒 Files selected for processing (1)
  • ovos_plugin_manager/templates/stt.py (3 hunks)
🔇 Additional comments (1)
ovos_plugin_manager/templates/stt.py (1)

15-15: LGTM: New imports and attribute for language detection

The new imports and private attribute _detector are well-structured and follow Python conventions.

Also applies to: 19-19, 35-35

ovos_plugin_manager/templates/stt.py Outdated Show resolved Hide resolved
ovos_plugin_manager/templates/stt.py Outdated Show resolved Hide resolved
allows STT plugins to detect the language from audio

most plugins don't support this, but some like whisper do and would otherwise require loading 2 models into memory
@github-actions github-actions bot added feature and removed feature labels Nov 24, 2024
JarbasAl added a commit to OpenVoiceOS/ovos-stt-plugin-fasterwhisper that referenced this pull request Nov 24, 2024
companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
ovos_plugin_manager/templates/stt.py (3)

35-40: Add docstring to the bind method

The implementation is good, but adding a docstring would improve maintainability.

 def bind(self, detector: AudioLanguageDetector):
+    """Bind a language detector to this STT instance.
+    
+    Args:
+        detector (AudioLanguageDetector): The language detector to use
+    """
     self._detector = detector
     LOG.debug(f"{self.__class__.__name__} - Assigned lang detector: {detector}")

41-44: Add return value documentation

The method would benefit from documenting the return tuple format.

 def detect_language(self, audio, valid_langs: Optional[Union[Set[str], List[str]]] = None) -> Tuple[str, float]:
+    """Detect the language of the provided audio.
+    
+    Args:
+        audio: Audio data to analyze
+        valid_langs: Optional set/list of valid languages to consider
+    
+    Returns:
+        Tuple[str, float]: A tuple containing the detected language code and confidence score
+    
+    Raises:
+        NotImplementedError: If no language detector is bound
+    """
     if self._detector is None:
         raise NotImplementedError(f"{self.__class__.__name__} does not support audio language detection")
     return self._detector.detect(audio, valid_langs=valid_langs or self.available_languages)

138-139: Consider caching language detection results

For streaming use cases, consider caching the detected language to avoid redundant detection on subsequent transcriptions of the same audio stream.

+    _last_detected_lang: Optional[Tuple[str, float]] = None
+
     def transcribe(self, audio, lang: Optional[str] = None) -> List[Tuple[str, float]]:
         if lang is not None and lang == "auto":
-            lang, prob = self.detect_language(audio, self.available_languages)
+            # Cache detection result for potential reuse
+            if not self._last_detected_lang:
+                self._last_detected_lang = self.detect_language(audio, self.available_languages)
+            lang, prob = self._last_detected_lang
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 62eb083 and e280939.

📒 Files selected for processing (1)
  • ovos_plugin_manager/templates/stt.py (3 hunks)
🔇 Additional comments (3)
ovos_plugin_manager/templates/stt.py (3)

11-11: LGTM: Imports are well organized and necessary

The new imports support type hints and language detection functionality while maintaining clean dependency management.

Also applies to: 15-15, 19-19


143-143: LGTM: Improved type safety

The more specific return type hint Set[str] improves type safety and documentation.


Line range hint 241-244: Verify StreamingSTT implementations

The updated signature needs to be properly handled by all implementations.

✅ Verification successful

StreamingSTT signature change appears safe

The updated transcribe method signature in StreamingSTT is an abstract base class (ABC) implementation. From the verification:

  • No concrete implementations were found in the codebase
  • The base class is marked with metaclass=ABCMeta, making it an interface
  • The signature change maintains compatibility by using Optional types
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all StreamingSTT implementations to verify signature compatibility

# Search for classes that inherit from StreamingSTT
ast-grep --pattern 'class $NAME($$$StreamingSTT$$$) {
  $$$
}'

# Search for transcribe method implementations
ast-grep --pattern 'def transcribe(self, audio$$$) {
  $$$
}'

Length of output: 128


Script:

#!/bin/bash
# Let's try a different approach using ripgrep to find StreamingSTT implementations

# Search for class definitions inheriting from StreamingSTT
rg "class.*\(.*StreamingSTT.*\)" -l

# Search for files containing StreamingSTT to analyze implementations
rg "StreamingSTT" -l

# Search for transcribe method definitions
rg "def transcribe.*\(" -B1 -A5

Length of output: 1404


Script:

#!/bin/bash
# Let's examine the StreamingSTT class definition and its base class

# Check StreamingSTT base class definition
rg "class StreamingSTT" -A 10 ovos_plugin_manager/templates/stt.py

# Check if there are any concrete implementations in test files
rg "class.*StreamingSTT" -A 10 test/unittests/test_stt.py

# Look for any other potential implementations
rg "class.*StreamingSTT" -A 10 ovos_plugin_manager/stt.py

Length of output: 480

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
ovos_plugin_manager/templates/stt.py (1)

Line range hint 251-254: Consider adding language detection support to StreamingSTT

The StreamingSTT class's transcribe method should be updated to support language detection similar to the base STT class. This would provide consistent behavior across all STT implementations.

 def transcribe(self, audio: Optional = None,
                lang: Optional[str] = None) -> List[Tuple[str, float]]:
     """transcribe audio data to a list of
     possible transcriptions and respective confidences"""
+    if lang is not None and lang == "auto":
+        try:
+            lang, prob = self.detect_language(audio, self.available_languages)
+        except Exception as e:
+            LOG.error(f"Language detection failed: {e}. Falling back to default language.")
+            lang = self.lang  # Fall back to default language
     return [(self.execute(audio, lang), 1.0)]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e280939 and b526f89.

📒 Files selected for processing (1)
  • ovos_plugin_manager/templates/stt.py (4 hunks)
🔇 Additional comments (3)
ovos_plugin_manager/templates/stt.py (3)

11-11: LGTM: Import changes are appropriate

The new imports support type hints and language detection functionality.

Also applies to: 16-19


139-144: LGTM: Robust language detection implementation

The implementation includes proper error handling with fallback to default language, and the error is appropriately logged.


148-148: Consider backwards compatibility impact

The return type change from set to Set[str] is more precise but could affect existing plugins that don't use type checking. Consider adding a note in the changelog about this breaking change.

Let's check for implementations that might be affected:

ovos_plugin_manager/templates/stt.py Show resolved Hide resolved
JarbasAl added a commit to OpenVoiceOS/ovos-stt-http-server that referenced this pull request Nov 24, 2024
companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
@JarbasAl JarbasAl merged commit 5b84b21 into dev Nov 24, 2024
8 checks passed
@JarbasAl JarbasAl deleted the feat/stt_lang_detect branch November 24, 2024 16:30
JarbasAl added a commit to OpenVoiceOS/ovos-stt-plugin-fasterwhisper that referenced this pull request Nov 24, 2024
companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
JarbasAl added a commit to OpenVoiceOS/ovos-stt-http-server that referenced this pull request Nov 24, 2024
companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
JarbasAl added a commit to OpenVoiceOS/ovos-stt-plugin-fasterwhisper that referenced this pull request Nov 24, 2024
companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
JarbasAl added a commit to OpenVoiceOS/ovos-stt-http-server that referenced this pull request Nov 24, 2024
companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
JarbasAl added a commit to OpenVoiceOS/ovos-stt-plugin-fasterwhisper that referenced this pull request Nov 24, 2024
companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
JarbasAl added a commit to OpenVoiceOS/ovos-stt-http-server that referenced this pull request Nov 24, 2024
* feat: lang detector

Update requirements.txt

feat/lang_detection_plugin

* feat: STT lang detector

companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models

* feat: STT lang detector

companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models

* feat: STT lang detector

companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
JarbasAl added a commit to OpenVoiceOS/ovos-stt-plugin-fasterwhisper that referenced this pull request Nov 24, 2024
* feat: STT lang detector

companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models

* feat: STT lang detector

companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models

* feat: STT lang detector

companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models

* feat: STT lang detector

companion to OpenVoiceOS/ovos-plugin-manager#289

allow lang detection without initing 2 models
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant