Releases: buriy/russian-nlp-datasets

r4

19 Jul 15:07
7368d6e

Russian article datasets:
(Format: one .csv file per source per month)

| File | Description | Dates | Articles |
| --- | --- | --- | --- |
| news-articles-2014.tar.bz2 | 40 top news sites and 20 fashion news sites | 2014-08 to 2014-12 | 500K |
| news-articles-2015-part1.tar.bz2, news-articles-2015-part2.tar.bz2 | 40 top news sites and 20 fashion news sites | 2015-01 to 2015-11 | 1.5M |
| lenta.tar.bz2 | Lenta.ru archives v0.2 (warning: URLs are mixed up) | 1999-09 to 2018-07 | 700K |
| webhose-2016.tar.bz2 | Webhose.io sample data: 300 sources for one month | 2016-10 | 290K |
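
Since each archive holds one .csv file per source per month, the files can be read directly from the archive without unpacking it to disk. The following is a minimal sketch, not part of the release itself. It assumes the .csv files are stored as regular members with a `.csv` extension and are UTF-8 encoded; the release notes do not state the encoding or the column layout, so rows are returned as plain lists. The helper name `iter_csv_rows` is hypothetical.

```python
import csv
import io
import tarfile

# Article bodies can be long, so raise the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def iter_csv_rows(archive_path):
    """Yield (member_name, row) for every row of every .csv file in a .tar.bz2 archive."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive:
            # Skip directories and anything that is not a .csv file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            # Assumption: the files are UTF-8 encoded (not stated in the release notes).
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                yield member.name, row
```

For example, `for name, row in iter_csv_rows("news-articles-2014.tar.bz2"): ...` would walk every row of every per-source, per-month file in that archive. Inspect the first row of a file to see whether it is a header before relying on column positions.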

Other files:

| File | Description |
| --- | --- |
| NER-Collection5.tar.gz, NER-Persons-1000.tar.gz | Russian NER datasets |
| spelling.tar.gz | |
| stress.tar.gz | Russian word stress dictionary |
| word-frequencies.txt | Russian word frequencies |