Authors: [email protected]
Automatic Speech Recognition
Code to implement Hugging Face (🤗) pipeline
on Azure machines, transcribing .wav
(converted from client's .amr
)
TODO items appear throughout the code & documentation:
- South African accents are very thick → need for fine-tuning
- Code-switching between English & other African languages
- Some audio is completely inaudible
- Some audio is entirely in a different language; need for language classification here
Very good Transformer-based ASR solutions are currently being open-sourced, for example Wav2Vec2 and Whisper, with open-source competitions being held, resulting in open access to high-quality models.

TODO: Haven't tried the current state of the art: openai/whisper-large
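As a minimal sketch of what the 🤗 pipeline usage looks like here (the model ID and chunk length are assumptions, not this repo's exact settings):

```python
# Minimal ASR sketch with the Hugging Face pipeline. The model ID and
# chunk length are assumptions; any ASR checkpoint on the Hub should work.
MODEL_ID = "openai/whisper-large"

def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcription of one audio file."""
    from transformers import pipeline  # deferred: heavy import
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long audio into 30 s windows
    )
    return asr(path)["text"]

if __name__ == "__main__":
    print(transcribe("sample.wav"))  # path is hypothetical
```

`chunk_length_s` is the same chunking mechanism described in the Wav2Vec2 large-files blog post linked at the bottom of this README.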
Each folder has a markdown .md file explaining each file in the folder.

```
asr
│
├── dev       : development
│
├── mount     : mounted input files
│
├── prod      : production process
│
└── README.md : >> you are here <<
                Describes the relevant folder's files
```
To transcribe a new batch of client data...

1️⃣ Connect to the client SFTP, download the necessary data locally, and convert .amr to .wav:
- Cyberduck is a stand-alone app for SFTP connections; download the files locally over SFTP
- TODO: automate the client SFTP → blob process, triggering conversion & inference when new data appears
- prod/inference/env/setup_amr2wav.sh to set up the environment
- prod/inference/amr2wav.py to convert .amr → .wav
```
> git clone https://github.com/elucidate-ai/asr
> cd asr/prod/inference
> bash env/setup_inference2csv.sh
> python inference2csv.py
```
Then upload to Azure blob storage:
- Storage Explorer is a stand-alone app for interacting with Azure blob storage
- mount/connection.cfg is the connection config for the storage blob currently in use
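For reference, amr2wav.py presumably shells out to ffmpeg; a stdlib-only sketch of the conversion (the directory layout and the 16 kHz mono target are assumptions):

```python
# Sketch: batch-convert client .amr files to 16 kHz mono .wav with ffmpeg.
# Assumes ffmpeg is on PATH; directory names are hypothetical.
import pathlib
import subprocess

def ffmpeg_cmd(src: pathlib.Path, dst: pathlib.Path) -> list:
    """Build the ffmpeg command: 16 kHz mono PCM, as most ASR models expect."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]

def convert_all(in_dir: str, out_dir: str) -> None:
    """Convert every .amr in in_dir to a .wav of the same stem in out_dir."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for amr in sorted(pathlib.Path(in_dir).glob("*.amr")):
        subprocess.run(ffmpeg_cmd(amr, out / (amr.stem + ".wav")), check=True)
```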
2️⃣ Create an appropriate Azure GPU machine for inference
- The Azure ML portal is used to create machines (Compute > + New)

Note on GPU needed: only the following Azure ML machines will work for such large models:
- 1 x NVIDIA Tesla P100
- 1 x NVIDIA Tesla V100
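A back-of-envelope check of why these cards are needed (parameter counts are approximate public figures; treat them as assumptions):

```python
# Rough GPU-memory estimate for large ASR checkpoints.
# Parameter counts below are approximate public figures, not measured values.
PARAMS = {
    "facebook/wav2vec2-large-960h": 317_000_000,
    "openai/whisper-large": 1_550_000_000,
}

def fp32_gib(n_params: int) -> float:
    """Weights-only footprint in GiB at 4 bytes per float32 parameter."""
    return n_params * 4 / 2**30

# whisper-large weights alone are roughly 5.8 GiB in fp32, before activations
# and buffers, which is why a 16 GiB P100/V100-class card is the practical
# minimum here.
```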
3️⃣ Connect to the terminal of the machine you just created & clone the repo
- The VS Code Azure extensions make this easy

```
> git clone https://github.com/elucidate-ai/asr
> cd asr
```
4️⃣ Mount the Azure storage blob using mount/mount_blob.py
- See mount/mount_README.md for more information
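mount/mount_blob.py likely wraps a blobfuse call; a hedged sketch (the mount point, tmp path, and option values are assumptions, not this repo's exact settings):

```python
# Sketch: mount an Azure blob container with blobfuse, using the repo's
# connection config. All paths and option values here are assumptions.
import subprocess

def blobfuse_cmd(mount_dir: str,
                 cfg: str = "mount/connection.cfg",
                 tmp: str = "/mnt/blobfusetmp") -> list:
    """Build the blobfuse mount command (blobfuse v1 style flags)."""
    return ["blobfuse", mount_dir,
            f"--tmp-path={tmp}",
            f"--config-file={cfg}",
            "-o", "attr_timeout=240",
            "-o", "entry_timeout=240"]

def mount(mount_dir: str) -> None:
    """Run the mount; raises CalledProcessError if blobfuse fails."""
    subprocess.run(blobfuse_cmd(mount_dir), check=True)
```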
5️⃣ Run production
- See prod/prod_README.md for more details
- First parse the .wav files into an index .csv using:
  - prod/inference/env/setup_parse2csv.sh to set up the environment
  - prod/inference/parse2csv.py to parse the input .wav files into a .csv
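parse2csv.py's job can be sketched as indexing every .wav under the mount into a .csv (the column name and layout are assumptions):

```python
# Sketch: index every .wav file under a directory into a one-column .csv
# that the inference step can iterate over. Column name is an assumption.
import csv
import pathlib

def build_index(wav_dir: str, index_csv: str) -> int:
    """Write one row per .wav file found; return how many were indexed."""
    paths = sorted(pathlib.Path(wav_dir).rglob("*.wav"))
    with open(index_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path"])
        for p in paths:
            writer.writerow([str(p)])
    return len(paths)
```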
- Run inference to create an output .csv of transcriptions:
  - prod/inference/env/setup_inference2csv.sh to set up the environment
  - prod/inference/inference2csv.py to transcribe the .wav files into a bulk transcription .csv
  - Run it with the following command so the process continues in the background and writes a log to logs/inference2csv.out:

```
> nohup python inference2csv.py > logs/inference2csv.out &
```

- TODO: Currently cuts out after 2058 iterations; it seems the blob unmounts at that point. Following blob-mounting best practices, I also tried mounting at home via mount/mount_blob_home.py; this does not help and it still cuts off at 2058 iterations 😓
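The inference step can be sketched as follows (the column names and model ID are assumptions; writing row-by-row means a crash at iteration 2058 loses little finished work):

```python
# Sketch of inference2csv.py's likely shape: read the index .csv, transcribe
# each file, and append rows to an output .csv as we go. The "path"/"text"
# column names and the model ID are assumptions.
import csv

def read_index(index_csv: str) -> list:
    """Return the .wav paths listed in the index file (one per row)."""
    with open(index_csv, newline="") as f:
        return [row["path"] for row in csv.DictReader(f)]

def transcribe_all(index_csv: str, out_csv: str,
                   model_id: str = "openai/whisper-large") -> None:
    """Transcribe every indexed file, flushing after each row."""
    from transformers import pipeline  # deferred: heavy import
    asr = pipeline("automatic-speech-recognition", model=model_id,
                   chunk_length_s=30)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "text"])
        for path in read_index(index_csv):
            writer.writerow([path, asr(path)["text"]])
            f.flush()  # keep partial output if the blob unmounts mid-run
```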
References:
- Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
- Introducing Whisper, the current state of the art ⚡️