Add headers for multiple language identification #99

mbatchkarov · 2024-01-30T09:18:58Z

Issue #, if available: N/A

Description of changes:
Hi, Transcribe SDE here. We recently launched a new feature called multiple language identification. We've been asked to contribute to this package to enable the feature so customers can use it.

Notes
language_code is currently a required positional parameter. Once language ID is added, language_code should become optional. That would mean making it the third param and giving it a default value, which would break existing clients that pass arguments positionally. I'm therefore leaving it as a positional arg and requiring that it be set to None when language ID is enabled. I'm not too happy with this option either, so I'm happy to discuss. Maybe something like this would be more ergonomic:

        if identify_language or identify_multiple_languages:
            warnings.warn("Setting language_code to None because language ID is enabled")
            language_code = None
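
For discussion, the warn-and-override idea above could be wrapped in a small helper. This is only a sketch: `resolve_language_code` is a hypothetical name, not part of the SDK, and it assumes the two flags are named as in this PR.

```python
import warnings
from typing import Optional


def resolve_language_code(
    language_code: Optional[str],
    identify_language: bool = False,
    identify_multiple_languages: bool = False,
) -> Optional[str]:
    """Return the language code to send, dropping it when language ID is on.

    Hypothetical helper for illustration; not part of amazon_transcribe.
    """
    language_id_enabled = identify_language or identify_multiple_languages
    if language_id_enabled and language_code is not None:
        # Warn rather than raise so existing positional callers keep working.
        warnings.warn(
            "Setting language_code to None because language ID is enabled",
            stacklevel=2,
        )
        return None
    return language_code
```

Warning instead of raising keeps old call sites running, at the cost of silently ignoring a value the caller passed on purpose.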

Testing
I extended and ran the integration tests locally from my own AWS account.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mbatchkarov · 2024-01-30T09:23:02Z

Update: I just saw #89. It looks like plain language ID is not supported either. I will update the PR to include that too.

praveenXira · 2024-03-13T13:12:42Z

When will this PR get merged? I really need the multiple language identification feature.

GameSetAndMatch · 2024-03-28T10:09:05Z

amazon_transcribe/client.py

+        identify_language: Optional[bool] = False,
+        preferred_language: Optional[str] = None,
+        identify_multiple_languages=False,
+        language_options=None,


I am not familiar with AWS coding standards, but is there a reason why identify_multiple_languages and language_options have no type hints?

Suggested change

-        language_options=None,
+        language_options: Optional[List[str]] = None,

GameSetAndMatch · 2024-03-28T10:13:51Z

Definitely a feature I would use in the short term if it were reviewed and merged by the AWS team. Good work, @mbatchkarov!

pencil

Thanks for putting this up! Unfortunately, it looks like AWS has abandoned this client 😒

pencil · 2025-01-15T17:34:58Z

amazon_transcribe/client.py

+        identify_language: Optional[bool] = False,
+        preferred_language: Optional[str] = None,
+        identify_multiple_languages=False,
+        language_options=None,


Suggested change

-        language_options=None,
+        language_options: Optional[List[str]] = None,

pencil · 2025-01-15T17:35:15Z

amazon_transcribe/client.py

@@ -84,6 +84,10 @@ async def start_stream_transcription(
        enable_partial_results_stabilization: Optional[bool] = None,
        partial_results_stability: Optional[str] = None,
        language_model_name: Optional[str] = None,
+        identify_language: Optional[bool] = False,
+        preferred_language: Optional[str] = None,
+        identify_multiple_languages=False,


Suggested change

-        identify_multiple_languages=False,
+        identify_multiple_languages: Optional[bool] = None,

pencil · 2025-01-15T17:35:27Z

amazon_transcribe/client.py

@@ -84,6 +84,10 @@ async def start_stream_transcription(
        enable_partial_results_stabilization: Optional[bool] = None,
        partial_results_stability: Optional[str] = None,
        language_model_name: Optional[str] = None,
+        identify_language: Optional[bool] = False,


Suggested change

-        identify_language: Optional[bool] = False,
+        identify_language: Optional[bool] = None,
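
One possible reason to prefer `None` over `False` as the default: with `None`, a serializer can tell "not set" apart from an explicit `False` and leave the header out entirely. A rough sketch of that idea follows. `build_flag_headers` is a hypothetical helper, and the `true`/`false` string encoding is an assumption, not something this PR confirms.

```python
from typing import Dict, Optional


def build_flag_headers(flags: Dict[str, Optional[bool]]) -> Dict[str, str]:
    """Serialize optional boolean flags, skipping any that are None.

    Hypothetical helper; header names are supplied by the caller.
    """
    headers: Dict[str, str] = {}
    for name, value in flags.items():
        if value is None:
            continue  # unset: send nothing and let the service default apply
        headers[name] = "true" if value else "false"
    return headers
```

With a `False` default, the same loop would emit a header for every flag on every request, even for callers who never touched the new parameters.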

kandakji · 2025-01-29T15:29:15Z

why isn't this already merged?

mirobat

TODO: add support for CVs (custom vocabularies)

codecov-commenter · 2025-02-12T21:51:36Z

Codecov Report

Attention: Patch coverage is 88.09524% with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.68%. Comparing base (7eae6ca) to head (f4fd86d).

Files with missing lines	Patch %	Lines
tests/integration/test_client.py	0.00%	5 Missing ⚠️

❌ Your patch status has failed because the patch coverage (88.09%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop      #99      +/-   ##
===========================================
- Coverage    88.86%   88.68%   -0.19%     
===========================================
  Files           34       33       -1     
  Lines         1967     2050      +83     
===========================================
+ Hits          1748     1818      +70     
- Misses         219      232      +13

Flag	Coverage Δ
unittests	`88.68% <88.09%> (-0.19%)`	⬇️
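
The percentages in this report can be rechecked from the hit and line counts in the diff table. The sketch below does that arithmetic. The patch size of 42 lines is an inference from "5 lines missing" at 88.09524%, since the report does not state it, and Codecov's displayed values may differ from a naive recomputation in the last digit depending on how it rounds.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


base = coverage_pct(1748, 1967)   # develop, from the Hits and Lines rows
head = coverage_pct(1818, 2050)   # this PR, from the Hits and Lines rows
patch = coverage_pct(42 - 5, 42)  # assumed 42 changed lines, 5 uncovered

print(f"base {base:.4f}%  head {head:.4f}%  patch {patch:.5f}%")
```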


xuejiey · 2025-02-13T23:22:08Z

amazon_transcribe/client.py

+            You must also provide at least two language_options and set
+            language_code to None
+        :param language_options:
+            A list of possible language to use when identify_multiple_languages is


Should mention identify_language as well
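
The constraints described in these docstrings could also be checked up front. The following is a sketch only: `validate_language_id_args` is a hypothetical helper, not the PR's code, and applying the "at least two language_options" rule to `identify_language` as well as `identify_multiple_languages` is an assumption based on this review comment.

```python
from typing import List, Optional


def validate_language_id_args(
    language_code: Optional[str],
    identify_language: Optional[bool] = None,
    identify_multiple_languages: Optional[bool] = None,
    language_options: Optional[List[str]] = None,
) -> None:
    """Raise ValueError when the language ID arguments are inconsistent.

    Hypothetical helper for illustration; not part of amazon_transcribe.
    """
    language_id_enabled = bool(identify_language or identify_multiple_languages)
    if not language_id_enabled:
        # Without language ID, the source language must be given explicitly.
        if language_code is None:
            raise ValueError("language_code is required when language ID is off")
        return
    if language_code is not None:
        raise ValueError("language_code must be None when language ID is enabled")
    if language_options is None or len(language_options) < 2:
        raise ValueError("provide at least two language_options")
```

For example, `validate_language_id_args(None, identify_language=True, language_options=["en-US"])` would raise, because only one language option is given.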

xuejiey · 2025-02-13T23:22:24Z

amazon_transcribe/client.py

@@ -100,14 +105,18 @@ async def start_stream_transcription(
        than 5 minutes.

        :param language_code:
-            Indicates the source language used in the input audio stream.
+            Indicates the source language used in the input audio stream. Set to
+            None if identify_multiple_languages is set to True


Should mention identify_language as well

xuejiey · 2025-02-13T23:23:37Z

amazon_transcribe/model.py

+            You must also provide at least two language_options and set
+            language_code to None
+    : param language_options:
+        A list of possible language to use when identify_multiple_languages is


Should mention identify_language as well
NIT: Space after :

xuejiey · 2025-02-13T23:28:02Z

amazon_transcribe/client.py

@@ -75,6 +75,7 @@ async def start_stream_transcription(
        media_sample_rate_hz: int,
        media_encoding: str,
        vocabulary_name: Optional[str] = None,
+        vocabulary_names: Optional[List[str]] = None,
        session_id: Optional[str] = None,
        vocab_filter_method: Optional[str] = None,
        vocab_filter_name: Optional[str] = None,


Can you add VocabularyFilterNames as well?

add headers for multiple language identification

ce50006

add headers for language identification

63ffee5

GameSetAndMatch reviewed Mar 28, 2024

gunwooterry reviewed Apr 27, 2024

Fix typo in header name

7736dc4

Header names are separated by hyphens, not underscores: https://docs.aws.amazon.com/transcribe/latest/dg/lang-id-stream.html

Co-authored-by: Gunwoo Kim

pencil reviewed Jan 15, 2025

address CR comments and add language_code in response

107357e

mirobat previously approved these changes Feb 11, 2025

add support for CV when using Multi LID

ee532c6

mbatchkarov dismissed mirobat’s stale review via ee532c6 February 12, 2025 15:41

mbatchkarov added 3 commits February 12, 2025 16:43

remove python 3.7 from test matrix

cde728b

It reached end of life in mid-2023 and GitHub runners no longer seem to support it.

remove all uses of py 3.7 and replace with 3.11

6d41e7e

update codecov action

43c434a

The run is failing: https://github.com/awslabs/amazon-transcribe-streaming-sdk/actions/runs/13289281664/job/37105650123. We need to remove the yml attribute, and update to the latest action version while we're at it.

mbatchkarov added 4 commits February 13, 2025 13:03

fix black warning

890b91d

update deprecated ubuntu used in linter

f6220b6

exclude tests from coverage

2506bd4

fix flake8 warning

7d5ab0b

xuejiey reviewed Feb 13, 2025

mbatchkarov added 2 commits February 14, 2025 15:55

add support for vocabulary filters

b40d3b8

fix bad merge :(

f4fd86d

xuejiey approved these changes Feb 14, 2025
