Russian article datasets:
(Format: one .csv file per source per month)
File | Description | Dates | Articles |
---|---|---|---|
news-articles-2014.tar.bz2 | 40 top news sites and 20 fashion news sites | 2014-08 to 2014-12 | 500K |
news-articles-2015-part1.tar.bz2, news-articles-2015-part2.tar.bz2 | 40 top news sites and 20 fashion news sites | 2015-01 to 2015-11 | 1.5M |
lenta.tar.bz2 | Lenta.ru archives v0.2 (warning: URLs are mixed up) | 1999-09 to 2018-07 | 700K |
webhose-2016.tar.bz2 | Webhose.io sample data: 300 sources for one month | 2016-10 | 290K |
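Because each archive bundles one .csv file per source per month, the articles can be read straight from the archive without unpacking it to disk. The following is a minimal sketch, not a tool shipped with these datasets. It assumes the CSV files are UTF-8 encoded and start with a header row; the actual column names are not documented here. The function name `iter_csv_rows` is hypothetical.

```python
import csv
import io
import tarfile
from typing import Dict, Iterator, Tuple


def iter_csv_rows(archive_path: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (member name, row) for every row of every .csv file in a .tar.bz2 archive.

    Hypothetical helper. Assumes UTF-8 CSV files with a header row; the real
    column names in these datasets are not specified in this README.
    """
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        for member in tar:
            # Skip directories and anything that is not a CSV file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = tar.extractfile(member)
            if raw is None:
                continue
            # Decode the binary member stream as text so csv can parse it in place.
            with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                for row in csv.DictReader(text):
                    yield member.name, row
```

For example, `collections.Counter(name for name, _ in iter_csv_rows("webhose-2016.tar.bz2"))` would count rows (roughly, articles) per source file. Streaming row by row keeps memory use low, which matters for the larger archives such as the 1.5M-article 2015 set.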
Other files:
File | Description |
---|---|
NER-Collection5.tar.gz, NER-Persons-1000.tar.gz | Russian NER datasets |
spelling.tar.gz | |
stress.tar.gz | Russian word stress dictionary |
word-frequencies.txt | Russian word frequencies |