-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add headers for multiple language identification #99
base: develop
Are you sure you want to change the base?
Conversation
Update: just saw #89 - looks like plain language ID is also not supported. Will update the PR to include that too |
When will this PR get merged? I really need multiple language identification feature. |
amazon_transcribe/client.py
Outdated
identify_language: Optional[bool] = False, | ||
preferred_language: Optional[str] = None, | ||
identify_multiple_languages=False, | ||
language_options=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with AWS coding standards, but is there a reason why there isn't type hiting on identify_multiple_languages
and language_options
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
language_options=None, | |
language_options: Optional[List[str]] = None, |
Definitely a feature I would use in the short term if it was reviewed and merged by AWS team, good work @mbatchkarov ! |
Header names are separated by hyphens, not underscores https://docs.aws.amazon.com/transcribe/latest/dg/lang-id-stream.html Co-authored-by: Gunwoo Kim <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this up! Unfortunately, it looks like AWS has abandoned this client 😒
amazon_transcribe/client.py
Outdated
identify_language: Optional[bool] = False, | ||
preferred_language: Optional[str] = None, | ||
identify_multiple_languages=False, | ||
language_options=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
language_options=None, | |
language_options: Optional[List[str]] = None, |
amazon_transcribe/client.py
Outdated
@@ -84,6 +84,10 @@ async def start_stream_transcription( | |||
enable_partial_results_stabilization: Optional[bool] = None, | |||
partial_results_stability: Optional[str] = None, | |||
language_model_name: Optional[str] = None, | |||
identify_language: Optional[bool] = False, | |||
preferred_language: Optional[str] = None, | |||
identify_multiple_languages=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
identify_multiple_languages=False, | |
identify_multiple_languages: Optional[bool] = None, |
amazon_transcribe/client.py
Outdated
@@ -84,6 +84,10 @@ async def start_stream_transcription( | |||
enable_partial_results_stabilization: Optional[bool] = None, | |||
partial_results_stability: Optional[str] = None, | |||
language_model_name: Optional[str] = None, | |||
identify_language: Optional[bool] = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
identify_language: Optional[bool] = False, | |
identify_language: Optional[bool] = None, |
why isn't this already merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO- add support for CVs
it reached end of life in mind 2023 and github runnners no longer seem to support it
the run is failing https://github.com/awslabs/amazon-transcribe-streaming-sdk/actions/runs/13289281664/job/37105650123 - need to remove the yml attribute, and update to the latest action version while we're at it
Codecov ReportAttention: Patch coverage is
❌ Your patch status has failed because the patch coverage (88.09%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #99 +/- ##
===========================================
- Coverage 88.86% 88.68% -0.19%
===========================================
Files 34 33 -1
Lines 1967 2050 +83
===========================================
+ Hits 1748 1818 +70
- Misses 219 232 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
amazon_transcribe/client.py
Outdated
You must also provide at least two language_options and set | ||
language_code to None | ||
:param language_options: | ||
A list of possible language to use when identify_multiple_languages is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mention identify_language as well
amazon_transcribe/client.py
Outdated
@@ -100,14 +105,18 @@ async def start_stream_transcription( | |||
than 5 minutes. | |||
|
|||
:param language_code: | |||
Indicates the source language used in the input audio stream. | |||
Indicates the source language used in the input audio stream. Set to | |||
None if identify_multiple_languages is set to True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mention identify_language as well
You must also provide at least two language_options and set | ||
language_code to None | ||
: param language_options: | ||
A list of possible language to use when identify_multiple_languages is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mention identify_language as well
NIT: Space after :
@@ -75,6 +75,7 @@ async def start_stream_transcription( | |||
media_sample_rate_hz: int, | |||
media_encoding: str, | |||
vocabulary_name: Optional[str] = None, | |||
vocabulary_names: Optional[List[str]] = None, | |||
session_id: Optional[str] = None, | |||
vocab_filter_method: Optional[str] = None, | |||
vocab_filter_name: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add VocabularyFilterNames
Issue #, if available: N/A
Description of changes:
Hi, Transcribe SDE here. We recently launched a new feature called multiple language identification. We've been asked to contribute to this package to enable the feature so customers can use it.
Notes
language_code
is currently a required positional parameter. When language ID is added, language code should become optional. That means we'd have to make it the third param and give it a default value, but this would break existing clients that use positional-only arguments. Therefore I'm leaving it as a positional arg and requiring that it's set to None when language ID is enabled. I'm not too happy with this option either- happy to discuss. Maybe something like this would be more ergonomic:Testing
I extended and ran the integration tests locally from my own AWS account.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.