arabic-speech-to-text

This repository contains the code for training the QuartzNet ASR model (NeMo) on the QCRI-AL Jazeera Corpus.

Data preprocessing

Download the QCRI-AL Jazeera Corpus. The script a_preprocess_xml.py extracts the text segments from the XML files. The script b_filter_ds.py removes segments that contain Latin script or numerals. The script c_split_ds.py splits the segments into a training set and a test set.
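The filtering and splitting steps can be sketched roughly as follows. This is a minimal illustration, not the code in b_filter_ds.py or c_split_ds.py: the regular expression, the function names `keep_segment` and `split_segments`, the 50/50 split, and the sample segments are all assumptions made for the example.

```python
import random
import re

# Assumption: "Latin script or numerals" is read here as ASCII letters,
# ASCII digits, and Arabic-Indic digits. The actual script may use another rule.
LATIN_OR_DIGIT = re.compile(r"[A-Za-z0-9\u0660-\u0669]")


def keep_segment(text: str) -> bool:
    """Return True if the segment contains no Latin letters or numerals."""
    return LATIN_OR_DIGIT.search(text) is None


def split_segments(segments, test_fraction=0.1, seed=0):
    """Shuffle a copy of the segments and split it into (train, test)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample segments, for illustration only.
segments = ["مرحبا بكم", "قناة BBC", "في عام 2011", "صباح الخير"]
filtered = [s for s in segments if keep_segment(s)]
train, test = split_segments(filtered, test_fraction=0.5)
```

Seeding the shuffle means the same segments land in the test set on every run, which keeps evaluation results comparable between training runs.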

TODO

Upload pretrained model
...
