In the ASR task, participants were asked to automatically transcribe Vietnamese audio files into the spoken word sequences. The committee provided only the test set; each team developed its own training data for the acoustic and language models.
The test set consisted of 796 continuous WAV files of news speech, two hours in total, with no sentence segmentation information. The speech was recorded in a quiet environment and covered three dialects: Northern, Southern, and Central, in proportions of 50%, 40%, and 10%, respectively.
Model | WER | SER | Paper/Source | Code |
---|---|---|---|---|
VAIS | 6.29 | 75.50 | Do et al. VLSP'18 | |
Viettel-CSC | 7.40 | 75.38 | Nguyen et al. VLSP'18 | |
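
The WER and SER scores above can be illustrated with their standard definitions. The following is a minimal sketch, assuming WER is the word-level edit distance divided by the number of reference words and SER is the share of utterances containing at least one error, both expressed as percentages. The source does not state the exact scoring or text normalization used by the committee, and the function names and sample sentences here are illustrative only.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_and_ser(references, hypotheses):
    """Return (WER, SER) in percent over paired reference/hypothesis strings."""
    total_errors = 0
    total_words = 0
    wrong_utterances = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors = edit_distance(ref_words, hyp.split())
        total_errors += errors
        total_words += len(ref_words)
        if errors > 0:
            wrong_utterances += 1
    wer = 100.0 * total_errors / total_words
    ser = 100.0 * wrong_utterances / len(references)
    return wer, ser


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the VLSP test set.
    refs = ["tôi đi học", "xin chào việt nam"]
    hyps = ["tôi đi học", "xin chao việt"]
    wer, ser = wer_and_ser(refs, hyps)
    print(f"WER = {wer:.2f}%, SER = {ser:.2f}%")
```

Under these definitions a single wrong word marks the whole utterance as an error, so SER can be far higher than WER when utterances are long.
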
📜 Papers
- Nguyen et al. SoICT'18. Development of a Vietnamese Large Vocabulary Continuous Speech Recognition System under Noisy Conditions
- Nguyen et al. O-COCOSDA'17. Development of a Vietnamese speech recognition system for Viettel call center
💫 Libraries
- 2021, vietai/ASR - Vietnamese end-to-end speech recognition using wav2vec 2.0
💫 Services
- 2018, vtcc.ai ASR
- 2017, OpenFPT: Speech Recognition
📁 Dataset
- Vietnamese Speech Recognition Corpus (Mobile) - 144 speakers - 76.6 hours by Speechocean (2017)
  data $
- Vietnamese Speech Recognition Corpus (In-Car) - 300 speakers - 305 hours by Speechocean (2017)
  data $
- GlobalPhone Vietnamese - 22.5 hours of read speech from 15 Vietnamese online newspapers by ELRA (2012)
  data $
- VIVOS - a free Vietnamese speech corpus consisting of 15 hours of recorded speech by AILab (2017)
  data
- FPT-30 -
  data