
Add language selection through config with whisper + Improve tests #48

Merged: 21 commits merged into next on Dec 2, 2024

@AudranBert (Member) commented Nov 22, 2024

Adds language selection for streaming with Whisper. By default, the language set in the environment settings is used, but a language can be passed in the config when starting a stream.
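A minimal sketch of the precedence described above (config first, environment as fallback). It assumes the default comes from an environment variable named `LANGUAGE` and that the streaming config is a JSON object with an optional `language` key; the helper `resolve_language` is hypothetical and is not the actual linto-stt code.

```python
import os
from typing import Optional


def resolve_language(config: dict, env_var: str = "LANGUAGE") -> Optional[str]:
    """Pick the language for a streaming session (hypothetical helper).

    A language passed in the session config takes precedence; otherwise
    fall back to the environment setting. Returns None when neither is
    set, e.g. to let the model auto-detect the language.
    """
    language = config.get("language")
    if language:
        return language
    # Treat an empty environment value the same as an unset one.
    return os.environ.get(env_var) or None


# Assumed config shape for the first streaming message, e.g.
# {"config": {"sample_rate": 16000, "language": "en"}}
os.environ["LANGUAGE"] = "fr"
print(resolve_language({"sample_rate": 16000}))                    # fr
print(resolve_language({"sample_rate": 16000, "language": "en"}))  # en
```

Keeping the environment value as a default means existing deployments behave as before, while a client can still override the language per session.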

It also makes it possible to pass a language in the config for offline decoding, as requested in #53. This lets a single model instance serve multiple languages instead of launching another Docker container.
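To illustrate why a per-request language avoids launching another container: the model is loaded once, and each offline request may carry its own language, which Whisper-style decoders accept as a per-call option. `SharedWhisperService`, `fake_model` and the `language` config key are hypothetical stand-ins; the real service loads an actual Whisper model and its request format may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SharedWhisperService:
    """Hypothetical wrapper: one loaded model serves every language."""
    transcribe_fn: Callable[[bytes, Optional[str]], str]  # stands in for the real model call
    default_language: Optional[str] = None
    calls: List[Optional[str]] = field(default_factory=list)

    def transcribe(self, audio: bytes, config: Optional[dict] = None) -> str:
        # Per-request language from the config, else the service default.
        language = (config or {}).get("language") or self.default_language
        self.calls.append(language)
        return self.transcribe_fn(audio, language)


def fake_model(audio: bytes, language: Optional[str]) -> str:
    # Placeholder for a Whisper call; real decoding is out of scope here.
    return f"<{len(audio)} bytes decoded as {language}>"


service = SharedWhisperService(fake_model, default_language="fr")
print(service.transcribe(b"\x00" * 10))                      # <10 bytes decoded as fr>
print(service.transcribe(b"\x00" * 10, {"language": "en"}))  # <10 bytes decoded as en>
```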

The PR also improves the tests: it adds language-related tests and removes some unnecessary ones to reduce test duration.

@damienlaine (Member) commented:

Could you clarify the list of supported languages? For example, does it include "en", "fr", etc.? On the LinTO side, we consistently use BCP-47 codes for language representation.
Parsers (env, API directives...) shall at least support BCP-47 codes as inputs.

@Jeronymous (Member) commented:

> Could you clarify the list of supported languages? For example, does it include "en", "fr", etc.? On the LinTO side, we consistently use BCP-47 codes for language representation. Parsers (env, API directives...) shall at least support BCP-47 codes as inputs.

That did not change in this PR.
Several formats are supported: "fr" and "fr-FR". This holds for the whole LinTO speech toolkit.

Supported languages are listed here: https://github.com/linto-ai/linto-stt/blob/master/whisper/README.md#language

Also, if the user gives an invalid language, an explicit error message lists the possible ones (in the "fr" format).

Why this question? Do you think something is missing in the code or the documentation?
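To illustrate the explicit error this reply describes for an unsupported language, here is a minimal sketch. `SUPPORTED_LANGUAGES` is a hypothetical subset (the real list is in the whisper README linked above), and `check_language` and its message wording are assumptions, not the service's actual code.

```python
# Hypothetical subset; the real list lives in the whisper README.
SUPPORTED_LANGUAGES = {"ar", "de", "en", "es", "fr", "it", "pt"}


def check_language(code: str) -> str:
    """Return the code if supported, else raise with the list of valid codes."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Language '{code}' is not supported. "
            f"Possible values are: {', '.join(sorted(SUPPORTED_LANGUAGES))}"
        )
    return code


check_language("fr")
try:
    check_language("xx")
except ValueError as err:
    print(err)  # Language 'xx' is not supported. Possible values are: ar, de, en, es, fr, it, pt
```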

@damienlaine (Member) commented:

I haven’t reviewed the code; I relied on the docs:

  • The docs mention "two or three-letter codes" for languages but not BCP-47 tags—should this be clarified?
  • The PR focuses on streaming (?), but what about Celery (task) and HTTP service modes? Are specification updates planned for these?
  • For Celery, should we open an issue in https://github.com/linto-ai/linto-transcription to handle the target language correctly?

@AudranBert changed the title from "[WIP] Add language selection for streaming with whisper + Improve tests" to "[WIP] Add language selection with whisper + Improve tests" on Nov 29, 2024
@AudranBert (Member, Author) commented Nov 29, 2024

> The PR focuses on streaming (?), but what about Celery (task) and HTTP service modes? Are specification updates planned for these?

The PR was created to fix language selection in streaming, but I added the possibility to send the language through the config for both streaming and offline (HTTP and task) modes. That's why I linked this PR to issue #53.

@AudranBert changed the title from "[WIP] Add language selection with whisper + Improve tests" to "[WIP] Add language selection through config with whisper + Improve tests" on Nov 29, 2024
@AudranBert (Member, Author) commented:

> The docs mention "two or three-letter codes" for languages but not BCP-47 tags. Should this be clarified?

It should work with tags like "fr-FR": the code splits on "-" and uses the first part (here "fr") as the language.
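A minimal sketch of the splitting rule described in this reply, so that "fr" and "fr-FR" resolve to the same language and the region subtag is ignored. The function name `primary_language` is hypothetical, and this sketch only handles the "-" separator (an underscore form such as "fr_FR" would pass through unchanged).

```python
def primary_language(tag: str) -> str:
    """Reduce a BCP-47 tag such as "fr-FR" to its primary subtag "fr".

    Split on the first "-" and keep the part before it; region or
    script subtags are dropped.
    """
    return tag.split("-", 1)[0]


for tag in ["fr", "fr-FR", "en-US", "pt-BR"]:
    print(tag, "->", primary_language(tag))
# fr -> fr
# fr-FR -> fr
# en-US -> en
# pt-BR -> pt
```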

@Jeronymous (Member) commented Nov 29, 2024

> • The docs mention "two or three-letter codes" for languages but not BCP-47 tags. Should this be clarified?

Yes, we should mention that they are supported, but that the second part ("FR" in "fr-FR") is ignored (the model's results do not depend on it).

> • The PR focuses on streaming (?), but what about Celery (task) and HTTP service modes? Are specification updates planned for these?

Yes. The PR is not finished yet ("WIP" in the title).

Yes, for Celery there will be an issue for that feature request.
In the worst case, I will open it when I commit related changes, mentioning the issue in the commit message (we agreed to do this as much as possible).
Our plan is to split the work: Audran on the core STT here, and me on the evolution of the transcription service API.

@AudranBert changed the title from "[WIP] Add language selection through config with whisper + Improve tests" to "Add language selection through config with whisper + Improve tests" on Dec 2, 2024
@Jeronymous changed the base branch from master to next on December 2, 2024, 14:31
@AudranBert (Member, Author) commented:

Tests are running; I don't know how long they will take to finish.

@Jeronymous merged commit ca1a839 into next on Dec 2, 2024
1 check passed

Successfully merging this pull request may close these issues.

Add language selection for offline transcription with whisper models
3 participants