This repository contains the code for training the QuartzNet ASR model (NeMo) on the QCRI-AL Jazeera Corpus.
Download the QCRI-AL Jazeera Corpus. The script a_preprocess_xml.py
extracts the text segments from the XML files.
The script b_filter_ds.py
removes segments that contain Latin script or numerals.
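
The filtering rule can be sketched roughly as follows. This is not the code from b_filter_ds.py, only an illustration of the idea. The function names are hypothetical, and it assumes that "Latin script" means any character whose Unicode name contains LATIN and that "numerals" means any Unicode decimal digit (so Arabic-Indic digits are rejected too).

```python
import unicodedata


def has_latin_or_numeral(text):
    """Return True if text contains a Latin letter or a decimal digit."""
    for ch in text:
        # Assumption: any Unicode decimal digit counts as a numeral,
        # including ASCII 0-9 and Arabic-Indic digits.
        if ch.isdecimal():
            return True
        # Assumption: a letter is "Latin" if its Unicode name says so
        # (covers accented and fullwidth Latin letters as well).
        if ch.isalpha() and "LATIN" in unicodedata.name(ch, ""):
            return True
    return False


def filter_segments(segments):
    """Keep only the segments without Latin letters or numerals."""
    return [s for s in segments if not has_latin_or_numeral(s)]
```

For example, filter_segments(["مرحبا بالعالم", "hello مرحبا", "عام 2015"]) keeps only the first segment.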
The script c_split_ds.py
creates a training set and a test set from the segments.
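
A minimal sketch of such a split is shown below. It is not taken from c_split_ds.py. The function name, the 10% test fraction, and the fixed seed are assumptions made for illustration only.

```python
import random


def split_segments(segments, test_fraction=0.1, seed=0):
    """Shuffle segments reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(segments)  # copy so the caller's list is not modified
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed means the same segments land in the test set on every run, which keeps evaluation results comparable across training runs.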
- Upload pretrained model
- ...