-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add language selection through config with whisper + Improve tests #48
Conversation
Merge next -> master
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Could you clarify the list of supported languages? For example, does it include "en," "fr," etc.? On the LinTO side, we consistently use BCP-47 codes for language representation. |
That did not changes in this PR. Supported languages are listed here : https://github.com/linto-ai/linto-stt/blob/master/whisper/README.md#language Also if the user gives a wrong one, it will give an explicit message with the list of possible ones (in the format "fr"). Why this question ? Do you think something is missing in the code or the documentation ? |
I haven’t reviewed the code and relied on the doc:
|
The PR was created to fix the selection language in streaming, but I added the possibility to send the language through the config for streaming and offline (http and task). That's why I linked this PR to the issue #53 |
Signed-off-by: AudranBert <[email protected]>
It should work with tags like "fr-FR" because it will split on the "-" and keep the first part (here "fr") and use that as language. |
Yes we should mention that they are supported, but that the second part ("FR" in "fr-FR") is ignored (results of the model are invariant to this)
Yes. The PR is not finished yet ("WIP" in the title)
Yes. There will be an issue with that feature request. |
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Signed-off-by: AudranBert <[email protected]>
Tests are running, I don't know how much time it will take to finish |
d20dc2f
to
647d119
Compare
647d119
to
b43430f
Compare
Signed-off-by: AudranBert <[email protected]>
Add language selection for streaming with whisper, by default it will take the language found in the env settings. But you can pass a language in the config when starting streaming.
It also adds the possibility to pass a language in the config in case of offline decoding as requested in #53 . It will enable having a same model instance used for multiple languages instead of launching another Docker.
The PR is also improving tests to add tests about languages. Also removing some useless ones in order to reduce testing duration.