Commit e0b6fea

Merge remote-tracking branch 'origin/next'

Jeronymous committed Apr 22, 2024
2 parents 961e58f + 0b49d10 commit e0b6fea

Showing 5 changed files with 21 additions and 19 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -4,6 +4,8 @@ LinTO-STT is an API for Automatic Speech Recognition (ASR).

LinTO-STT can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.

It can be used for offline or real-time transcription.

The following families of STT models are currently supported (please refer to respective documentation for more details):
* [Kaldi models](kaldi/README.md)
* [Whisper models](whisper/README.md)
12 changes: 6 additions & 6 deletions kaldi/README.md
@@ -4,6 +4,8 @@ LinTO-STT-Kaldi is an API for Automatic Speech Recognition (ASR) based on models

LinTO-STT-Kaldi can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.

It can be used for offline or real-time transcription.

## Pre-requisites

### Hardware
@@ -46,11 +48,9 @@ docker pull lintoai/linto-stt-kaldi

Have the acoustic and language model ready at AM_PATH and LM_PATH if you are using LinTO models. If you are using a Vosk model, have it ready at MODEL.
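
For instance, the model locations could be prepared as in the sketch below; the paths are placeholders for wherever your models actually live:

```bash
# Placeholder paths -- substitute your own model folders.
AM_PATH=~/models/linto_acoustic_model   # mounted at /opt/AM in the examples below
LM_PATH=~/models/linto_language_model   # mounted at /opt/LM in the examples below
MODEL=~/models/vosk-model               # only needed when using a Vosk model
```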

**3- Fill the .env**
**3- Fill the .env file**

```bash
cp kaldi/.envdefault kaldi/.env
```
An example of .env file is provided in [kaldi/.envdefault](https://github.com/linto-ai/linto-stt/blob/master/kaldi/.envdefault).
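
As a minimal sketch, a kaldi/.env might start as follows; the parameter name is an assumption carried over from the whisper service, and kaldi/.envdefault plus the table below are the authoritative reference:

```bash
# Hypothetical minimal kaldi/.env -- see kaldi/.envdefault for the real
# parameter list and defaults.
SERVICE_MODE=http
```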

| PARAMETER | DESCRIPTION | EXAMPLE |
|---|---|---|
@@ -85,7 +85,7 @@ docker run --rm \
    -p HOST_SERVING_PORT:80 \
    -v AM_PATH:/opt/AM \
    -v LM_PATH:/opt/LM \
    --env-file kaldi/.env \
    --env-file .env \
    linto-stt-kaldi:latest
```

@@ -111,7 +111,7 @@ docker run --rm \
    -v AM_PATH:/opt/AM \
    -v LM_PATH:/opt/LM \
    -v SHARED_AUDIO_FOLDER:/opt/audio \
    --env-file kaldi/.env \
    --env-file .env \
    linto-stt-kaldi:latest
```

2 changes: 1 addition & 1 deletion whisper/.envdefault
@@ -55,7 +55,7 @@ PROMPT=
# CUDA_VISIBLE_DEVICES=0

# Number of threads per worker when running on CPU
NUM_THREADS=4
# NUM_THREADS=4

# Number of workers minus one (all except the main one)
CONCURRENCY=2
19 changes: 10 additions & 9 deletions whisper/README.md
@@ -2,7 +2,9 @@

LinTO-STT-Whisper is an API for Automatic Speech Recognition (ASR) based on [Whisper models](https://openai.com/research/whisper).

LinTO-STT-Whisper can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.

It can be used for offline or real-time transcription.

## Pre-requisites

@@ -106,11 +108,9 @@ or
docker pull lintoai/linto-stt-whisper
```

### 2- Fill the .env
### 2- Fill the .env file

```bash
cp whisper/.envdefault whisper/.env
```
An example of .env file is provided in [whisper/.envdefault](https://github.com/linto-ai/linto-stt/blob/master/whisper/.envdefault).
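
As an illustration, a minimal whisper/.env could combine only parameters that appear elsewhere in this commit; whisper/.envdefault remains the authoritative reference:

```bash
# Minimal sketch of a whisper/.env built from parameters visible in this
# diff; defaults and the full list live in whisper/.envdefault.
SERVICE_MODE=http          # or "task" for the message-broker mode
# CUDA_VISIBLE_DEVICES=0   # pin a GPU, if any
# NUM_THREADS=4            # CPU threads per worker; now optional (see below)
CONCURRENCY=2              # number of workers minus one
```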

| PARAMETER | DESCRIPTION | EXAMPLE |
|---|---|---|
@@ -184,7 +184,7 @@ yo(yoruba), zh(chinese)
```
and also `yue(cantonese)` since large-v3.

### Serving mode
#### SERVING_MODE
![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png)

STT can be used in two ways:
@@ -195,6 +195,7 @@ Mode is specified using the .env value or environment variable ```SERVING_MODE```
```bash
SERVICE_MODE=http
```

### HTTP Server
The HTTP serving mode deploys an HTTP server and a swagger-ui to allow transcription requests on a dedicated route.

@@ -203,7 +204,7 @@ The SERVICE_MODE value in the .env should be set to ```http```.
```bash
docker run --rm \
    -p HOST_SERVING_PORT:80 \
    --env-file whisper/.env \
    --env-file .env \
    linto-stt-whisper:latest
```
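
Once the container is up, a transcription request might look like the following sketch; the route and form-field name are assumptions, so check the swagger-ui deployed by the server for the actual API:

```bash
# Hypothetical request -- /transcribe and the "file" field are assumptions;
# the swagger-ui documents the real routes.
curl -X POST http://localhost:HOST_SERVING_PORT/transcribe \
    -H "accept: application/json" \
    -F "file=@audio.wav"
```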

@@ -236,7 +237,7 @@ You need a message broker up and running at MY_SERVICE_BROKER.
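
For a quick local test, the broker could for example be a Redis container; Redis is an assumption here, as this diff does not mandate a particular broker:

```bash
# One possible broker for local testing; Redis is an assumption, not a
# requirement stated in this diff.
docker run -d -p 6379:6379 redis
```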
```bash
docker run --rm \
    -v SHARED_AUDIO_FOLDER:/opt/audio \
    --env-file whisper/.env \
    --env-file .env \
    linto-stt-whisper:latest
```

@@ -371,4 +372,4 @@ This project is developed under the AGPLv3 License (see LICENSE).
* [HuggingFace Transformers](https://github.com/huggingface/transformers)
* [SpeechBrain](https://github.com/speechbrain/speechbrain)
* [TorchAudio](https://github.com/pytorch/audio)
* [Whisper_Streaming](https://github.com/ufal/whisper_streaming)
5 changes: 2 additions & 3 deletions whisper/stt/__init__.py
@@ -23,9 +23,6 @@
VAD_MIN_SPEECH_DURATION = float(os.environ.get("VAD_MIN_SPEECH_DURATION", 0.1))
VAD_MIN_SILENCE_DURATION = float(os.environ.get("VAD_MAX_SILENCE_DURATION", 0.1))

NUM_THREADS = os.environ.get("NUM_THREADS", os.environ.get("OMP_NUM_THREADS"))
NUM_THREADS = int(NUM_THREADS)

try:
import faster_whisper

@@ -55,13 +52,15 @@
    def set_num_threads(n):
        # os.environ["OMP_NUM_THREADS"] = str(n)
        pass
    DEFAULT_NUM_THREADS = None
else:
    import torch
    DEFAULT_NUM_THREADS = torch.get_num_threads()
    def set_num_threads(n):
        torch.set_num_threads(n)

# Number of CPU threads
NUM_THREADS = os.environ.get("NUM_THREADS", os.environ.get("OMP_NUM_THREADS"))
if NUM_THREADS is None:
    NUM_THREADS = DEFAULT_NUM_THREADS
if NUM_THREADS is not None:
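
In effect, the refactored code resolves the thread count as NUM_THREADS, then OMP_NUM_THREADS, then the backend default (torch.get_num_threads() on the torch path, none for faster_whisper). A sketch of the resulting behavior, with a hypothetical entry point:

```bash
# Hypothetical invocations (stt_server.py is a placeholder name, not a
# file in this diff):
NUM_THREADS=8 python stt_server.py       # explicit value wins
OMP_NUM_THREADS=4 python stt_server.py   # falls back to OMP_NUM_THREADS
python stt_server.py                     # backend default, or left unset
```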
Expand Down
