Authors: [email protected]
Automatic Speech Recognition
Code to implement Hugging Face (🤗) pipeline
on Azure machines, transcribing .wav
(converted from client's .amr
)
TODO items appear throughout the code & documentation:
- South African accents are very thick → need for fine-tuning
- Code-switching between English & other African languages
- Some audio is completely inaudible
- Some audio is entirely in a different language; need for language classification here
Very good Transformer-based ASR solutions are currently being open-sourced, for example Wav2Vec2 and Whisper, with open-source competitions being held, resulting in open access to high-quality models.

TODO: Haven't tried the current state of the art: openai/whisper-large
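As a minimal sketch of what the 🤗 pipeline usage looks like here (the model ID and chunk length are assumptions, not this repo's exact settings):

```python
# Minimal ASR sketch with the Hugging Face pipeline. The model ID and
# chunk length are assumptions; any ASR checkpoint on the Hub should work.
MODEL_ID = "openai/whisper-large"

def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcription of one audio file."""
    from transformers import pipeline  # deferred: heavy import
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long audio into 30 s windows
    )
    return asr(path)["text"]

if __name__ == "__main__":
    print(transcribe("sample.wav"))  # path is hypothetical
```

`chunk_length_s` is the same chunking mechanism described in the Wav2Vec2 large-files blog post linked at the bottom of this README.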
Each folder has a markdown .md file explaining each file in the folder.

```
asr
│
├── dev       : development
│
├── mount     : mounted input files
│
├── prod      : production process
│
└── README.md : >> you are here <<
                Describes the relevant folder's files
```
To transcribe a new batch of client data...

1️⃣ Connect to the client SFTP, download the necessary data locally, and convert .amr to .wav:
- Cyberduck is a stand-alone app for SFTP connections; download the files locally over SFTP
- TODO: automate the client SFTP → blob process, triggering conversion & inference when new data appears
- prod/inference/env/setup_amr2wav.sh to set up the environment
- prod/inference/amr2wav.py to convert .amr → .wav
```
> git clone https://github.com/elucidate-ai/asr
> cd asr/prod/inference
> bash env/setup_inference2csv.sh
> python inference2csv.py
```
Then upload to Azure blob storage:
- Storage Explorer is a stand-alone app for interacting with Azure blob storage
- mount/connection.cfg is the connection config for the storage blob currently in use
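For reference, amr2wav.py presumably shells out to ffmpeg; a stdlib-only sketch of the conversion (the directory layout and the 16 kHz mono target are assumptions):

```python
# Sketch: batch-convert client .amr files to 16 kHz mono .wav with ffmpeg.
# Assumes ffmpeg is on PATH; directory names are hypothetical.
import pathlib
import subprocess

def ffmpeg_cmd(src: pathlib.Path, dst: pathlib.Path) -> list:
    """Build the ffmpeg command: 16 kHz mono PCM, as most ASR models expect."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]

def convert_all(in_dir: str, out_dir: str) -> None:
    """Convert every .amr in in_dir to a .wav of the same stem in out_dir."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for amr in sorted(pathlib.Path(in_dir).glob("*.amr")):
        subprocess.run(ffmpeg_cmd(amr, out / (amr.stem + ".wav")), check=True)
```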
2️⃣ Create an appropriate Azure GPU machine for inference
- The Azure ML portal is used to create machines (Compute > + New)

Note on GPU needed: only the following Azure ML machines will work for such large models:
- 1 x NVIDIA Tesla P100
- 1 x NVIDIA Tesla V100
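A back-of-envelope check of why these cards are needed (parameter counts are approximate public figures; treat them as assumptions):

```python
# Rough GPU-memory estimate for large ASR checkpoints.
# Parameter counts below are approximate public figures, not measured values.
PARAMS = {
    "facebook/wav2vec2-large-960h": 317_000_000,
    "openai/whisper-large": 1_550_000_000,
}

def fp32_gib(n_params: int) -> float:
    """Weights-only footprint in GiB at 4 bytes per float32 parameter."""
    return n_params * 4 / 2**30

# whisper-large weights alone are roughly 5.8 GiB in fp32, before activations
# and buffers, which is why a 16 GiB P100/V100-class card is the practical
# minimum here.
```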
3️⃣ Connect to the terminal of the machine you just created & clone the repo
- The VS Code Azure extensions make this easy

```
> git clone https://github.com/elucidate-ai/asr
> cd asr
```
4️⃣ Mount the Azure storage blob using mount/mount_blob.py
- See mount/mount_README.md for more information
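mount/mount_blob.py likely wraps a blobfuse call; a hedged sketch (the mount point, tmp path, and option values are assumptions, not this repo's exact settings):

```python
# Sketch: mount an Azure blob container with blobfuse, using the repo's
# connection config. All paths and option values here are assumptions.
import subprocess

def blobfuse_cmd(mount_dir: str,
                 cfg: str = "mount/connection.cfg",
                 tmp: str = "/mnt/blobfusetmp") -> list:
    """Build the blobfuse mount command (blobfuse v1 style flags)."""
    return ["blobfuse", mount_dir,
            f"--tmp-path={tmp}",
            f"--config-file={cfg}",
            "-o", "attr_timeout=240",
            "-o", "entry_timeout=240"]

def mount(mount_dir: str) -> None:
    """Run the mount; raises CalledProcessError if blobfuse fails."""
    subprocess.run(blobfuse_cmd(mount_dir), check=True)
```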
5️⃣ Run production
- See prod/prod_README.md for more details
- First parse the .wav files into an index .csv using:
  - prod/inference/env/setup_parse2csv.sh to set up the environment
  - prod/inference/parse2csv.py to parse the input .wav files into a .csv
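parse2csv.py's job can be sketched as indexing every .wav under the mount into a .csv (the column name and layout are assumptions):

```python
# Sketch: index every .wav file under a directory into a one-column .csv
# that the inference step can iterate over. Column name is an assumption.
import csv
import pathlib

def build_index(wav_dir: str, index_csv: str) -> int:
    """Write one row per .wav file found; return how many were indexed."""
    paths = sorted(pathlib.Path(wav_dir).rglob("*.wav"))
    with open(index_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path"])
        for p in paths:
            writer.writerow([str(p)])
    return len(paths)
```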
- Run inference to create an output .csv of transcriptions:
  - prod/inference/env/setup_inference2csv.sh to set up the environment
  - prod/inference/inference2csv.py to transcribe the .wav files into a bulk transcription .csv
  - Run it with the following command so the process continues in the background and writes a log to logs/inference2csv.out:

```
> nohup python inference2csv.py > logs/inference2csv.out &
```

- TODO: Currently cuts out after 2058 iterations; it seems the blob unmounts at that point. Following blob-mounting best practices, I also tried mounting at home via mount/mount_blob_home.py; this does not help and it still cuts off at 2058 iterations 😓
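The inference step can be sketched as follows (the column names and model ID are assumptions; writing row-by-row means a crash at iteration 2058 loses little finished work):

```python
# Sketch of inference2csv.py's likely shape: read the index .csv, transcribe
# each file, and append rows to an output .csv as we go. The "path"/"text"
# column names and the model ID are assumptions.
import csv

def read_index(index_csv: str) -> list:
    """Return the .wav paths listed in the index file (one per row)."""
    with open(index_csv, newline="") as f:
        return [row["path"] for row in csv.DictReader(f)]

def transcribe_all(index_csv: str, out_csv: str,
                   model_id: str = "openai/whisper-large") -> None:
    """Transcribe every indexed file, flushing after each row."""
    from transformers import pipeline  # deferred: heavy import
    asr = pipeline("automatic-speech-recognition", model=model_id,
                   chunk_length_s=30)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "text"])
        for path in read_index(index_csv):
            writer.writerow([path, asr(path)["text"]])
            f.flush()  # keep partial output if the blob unmounts mid-run
```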
References:
- Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
- Introducing Whisper, the current state of the art ⚡️