- 📜 Papers: Ho et al. 2005
- pclouds/words-vi - Vietnamese vocabulary list by D Nguyen (2017) (resource)
- UniversalDependencies/UD_Vietnamese - Vietnamese Universal Dependency Treebank by NT Luong et al. (2017) (python)
- Vietnamese Treebank by VLSP (2010) (data)
Corpus | Description | Domain | Year | References |
---|---|---|---|---|
GENLEX-VIET<br>SUBTLEX-VIET | 82.263.474 words, 164.526 HTML files<br>79.757.504 words, 13.349 subtitle documents | News<br>Movie subtitles | 2019 | paper |
Vietlex Corpus | 106.464.835 words, 129.781.089 tokens | | | website |
VietnameseWAC | 80.000.000 tokens, 4.000.000 sentences | | 2012 | website |
- binhvq/news-corpus. 3.7 GB compressed, ~18.6 GB uncompressed. 111.274.300 sentences
- VNESEcopus.txt. 650 thousand sentences.
- VNTQcorpus(small).txt. 300 thousand sentences.
- VNTQcorpus(big).txt. 1.75 million sentences.
- Wikipedia Dump. 524 MB
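
Several of the corpora above are far too large to load into memory at once (the news corpus is roughly 18.6 GB uncompressed), so a streaming reader is usually the practical way to inspect them. The following is a minimal sketch, assuming the file is UTF-8 plain text with one sentence per line; that layout is an assumption and is not stated in this list, and the helper names `iter_sentences` and `count_sentences` are hypothetical.

```python
import sys
from typing import Iterator


def iter_sentences(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield non-empty lines one at a time without loading the whole file.

    Assumes one sentence per line (an assumption about the corpus layout).
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:
            sentence = line.strip()
            if sentence:  # skip blank lines
                yield sentence


def count_sentences(path: str) -> int:
    """Count sentences by streaming through the file once."""
    return sum(1 for _ in iter_sentences(path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_sentences.py <path-to-corpus.txt>
    print(count_sentences(sys.argv[1]))
```

Because the reader is a generator, memory use stays flat regardless of file size; comparing the resulting count against the figure listed for a corpus is a quick sanity check on a download.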
Bilingual Corpus
- English-Vietnamese bilingual corpus by VLSP (2010) (data)