Commit e0b6fea

Merge remote-tracking branch 'origin/next'

Jeronymous committed Apr 22, 2024
2 parents 961e58f + 0b49d10 commit e0b6fea

Showing 5 changed files with 21 additions and 19 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -4,6 +4,8 @@ LinTO-STT is an API for Automatic Speech Recognition (ASR).

LinTO-STT can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.

It can be used for offline or real-time transcription.

The following families of STT models are currently supported (please refer to respective documentation for more details):
* [Kaldi models](kaldi/README.md)
* [Whisper models](whisper/README.md)
12 changes: 6 additions & 6 deletions kaldi/README.md
@@ -4,6 +4,8 @@ LinTO-STT-Kaldi is an API for Automatic Speech Recognition (ASR) based on models

LinTO-STT-Kaldi can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.

It can be used for offline or real-time transcription.

## Pre-requisites

### Hardware
@@ -46,11 +48,9 @@ docker pull lintoai/linto-stt-kaldi

Have the acoustic and language model ready at AM_PATH and LM_PATH if you are using LinTO models. If you are using a Vosk model, have it ready at MODEL.
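
For instance, the model locations could be prepared as in the sketch below; the paths are placeholders for wherever your models actually live:

```bash
# Placeholder paths -- substitute your own model folders.
AM_PATH=~/models/linto_acoustic_model   # mounted at /opt/AM in the examples below
LM_PATH=~/models/linto_language_model   # mounted at /opt/LM in the examples below
MODEL=~/models/vosk-model               # only needed when using a Vosk model
```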

**3- Fill the .env**
**3- Fill the .env file**

```bash
cp kaldi/.envdefault kaldi/.env
```
An example of .env file is provided in [kaldi/.envdefault](https://github.com/linto-ai/linto-stt/blob/master/kaldi/.envdefault).
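
As a minimal sketch, a kaldi/.env might start as follows; the parameter name is an assumption carried over from the whisper service, and kaldi/.envdefault plus the table below are the authoritative reference:

```bash
# Hypothetical minimal kaldi/.env -- see kaldi/.envdefault for the real
# parameter list and defaults.
SERVICE_MODE=http
```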

| PARAMETER | DESCRIPTION | EXAMPLE |
|---|---|---|
@@ -85,7 +85,7 @@ docker run --rm \
    -p HOST_SERVING_PORT:80 \
    -v AM_PATH:/opt/AM \
    -v LM_PATH:/opt/LM \
    --env-file kaldi/.env \
    --env-file .env \
    linto-stt-kaldi:latest
```

@@ -111,7 +111,7 @@ docker run --rm \
    -v AM_PATH:/opt/AM \
    -v LM_PATH:/opt/LM \
    -v SHARED_AUDIO_FOLDER:/opt/audio \
    --env-file kaldi/.env \
    --env-file .env \
    linto-stt-kaldi:latest
```

2 changes: 1 addition & 1 deletion whisper/.envdefault
@@ -55,7 +55,7 @@ PROMPT=
# CUDA_VISIBLE_DEVICES=0

# Number of threads per worker when running on CPU
NUM_THREADS=4
# NUM_THREADS=4

# Number of workers minus one (all except the main one)
CONCURRENCY=2
19 changes: 10 additions & 9 deletions whisper/README.md
@@ -2,7 +2,9 @@

LinTO-STT-Whisper is an API for Automatic Speech Recognition (ASR) based on [Whisper models](https://openai.com/research/whisper).

LinTO-STT-Whisper can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.

It can be used for offline or real-time transcription.

## Pre-requisites

@@ -106,11 +108,9 @@ or
docker pull lintoai/linto-stt-whisper
```

### 2- Fill the .env
### 2- Fill the .env file

```bash
cp whisper/.envdefault whisper/.env
```
An example of .env file is provided in [whisper/.envdefault](https://github.com/linto-ai/linto-stt/blob/master/whisper/.envdefault).
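
As an illustration, a minimal whisper/.env could combine only parameters that appear elsewhere in this commit; whisper/.envdefault remains the authoritative reference:

```bash
# Minimal sketch of a whisper/.env built from parameters visible in this
# diff; defaults and the full list live in whisper/.envdefault.
SERVICE_MODE=http          # or "task" for the message-broker mode
# CUDA_VISIBLE_DEVICES=0   # pin a GPU, if any
# NUM_THREADS=4            # CPU threads per worker; now optional (see below)
CONCURRENCY=2              # number of workers minus one
```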

| PARAMETER | DESCRIPTION | EXAMPLE |
|---|---|---|
@@ -184,7 +184,7 @@ yo(yoruba), zh(chinese)
```
and also `yue(cantonese)` since large-v3.

### Serving mode
#### SERVING_MODE
![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png)

STT can be used in two ways:
@@ -195,6 +195,7 @@ Mode is specified using the .env value or environment variable ```SERVING_MODE```
```bash
SERVICE_MODE=http
```

### HTTP Server
The HTTP serving mode deploys an HTTP server and a swagger-ui to allow transcription requests on a dedicated route.

@@ -203,7 +204,7 @@ The SERVICE_MODE value in the .env should be set to ```http```.
```bash
docker run --rm \
    -p HOST_SERVING_PORT:80 \
    --env-file whisper/.env \
    --env-file .env \
    linto-stt-whisper:latest
```
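
Once the container is up, a transcription request might look like the following sketch; the route and form-field name are assumptions, so check the swagger-ui deployed by the server for the actual API:

```bash
# Hypothetical request -- /transcribe and the "file" field are assumptions;
# the swagger-ui documents the real routes.
curl -X POST http://localhost:HOST_SERVING_PORT/transcribe \
    -H "accept: application/json" \
    -F "file=@audio.wav"
```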

@@ -236,7 +237,7 @@ You need a message broker up and running at MY_SERVICE_BROKER.
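
For a quick local test, the broker could for example be a Redis container; Redis is an assumption here, as this diff does not mandate a particular broker:

```bash
# One possible broker for local testing; Redis is an assumption, not a
# requirement stated in this diff.
docker run -d -p 6379:6379 redis
```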
```bash
docker run --rm \
    -v SHARED_AUDIO_FOLDER:/opt/audio \
    --env-file whisper/.env \
    --env-file .env \
    linto-stt-whisper:latest
```

@@ -371,4 +372,4 @@ This project is developed under the AGPLv3 License (see LICENSE).
* [HuggingFace Transformers](https://github.com/huggingface/transformers)
* [SpeechBrain](https://github.com/speechbrain/speechbrain)
* [TorchAudio](https://github.com/pytorch/audio)
* [Whisper_Streaming](https://github.com/ufal/whisper_streaming)
5 changes: 2 additions & 3 deletions whisper/stt/__init__.py
@@ -23,9 +23,6 @@
VAD_MIN_SPEECH_DURATION = float(os.environ.get("VAD_MIN_SPEECH_DURATION", 0.1))
VAD_MIN_SILENCE_DURATION = float(os.environ.get("VAD_MAX_SILENCE_DURATION", 0.1))

NUM_THREADS = os.environ.get("NUM_THREADS", os.environ.get("OMP_NUM_THREADS"))
NUM_THREADS = int(NUM_THREADS)

try:
import faster_whisper

@@ -55,13 +52,15 @@
    def set_num_threads(n):
        # os.environ["OMP_NUM_THREADS"] = str(n)
        pass
    DEFAULT_NUM_THREADS = None
else:
    import torch
    DEFAULT_NUM_THREADS = torch.get_num_threads()
    def set_num_threads(n):
        torch.set_num_threads(n)

# Number of CPU threads
NUM_THREADS = os.environ.get("NUM_THREADS", os.environ.get("OMP_NUM_THREADS"))
if NUM_THREADS is None:
    NUM_THREADS = DEFAULT_NUM_THREADS
if NUM_THREADS is not None:
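
In effect, the refactored code resolves the thread count as NUM_THREADS, then OMP_NUM_THREADS, then the backend default (torch.get_num_threads() on the torch path, none for faster_whisper). A sketch of the resulting behavior, with a hypothetical entry point:

```bash
# Hypothetical invocations (stt_server.py is a placeholder name, not a
# file in this diff):
NUM_THREADS=8 python stt_server.py       # explicit value wins
OMP_NUM_THREADS=4 python stt_server.py   # falls back to OMP_NUM_THREADS
python stt_server.py                     # backend default, or left unset
```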
Expand Down
