Add dataset: [bnl_newspapers1841-1879] #92

Open
1 task done
ymaurer opened this issue Nov 14, 2022 · 1 comment
Labels
candidate-dataset Proposed dataset to be added

Comments

ymaurer commented Nov 14, 2022

A URL for this dataset

https://data.bnl.lu/data/historical-newspapers/

Dataset description

630,709 articles from historical newspapers (1841-1879), along with metadata and the full text.

  • 21 newspaper titles
  • 24,415 newspaper issues
  • 99,957 scanned pages
  • Transcribed using a variety of OCR engines and corrected using https://github.com/natliblux/nautilusocr (95% threshold)
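
To give a rough sense of scale, the counts above can be turned into per-issue and per-page averages. This is only a back-of-the-envelope sketch: the numbers are copied from the description, and the helper name `per_unit` is made up for illustration.

```python
# Counts copied from the dataset description above.
ARTICLES = 630_709
ISSUES = 24_415
PAGES = 99_957


def per_unit(total: int, units: int) -> float:
    """Return the average of `total` spread over `units`, rounded to 2 places."""
    return round(total / units, 2)


articles_per_issue = per_unit(ARTICLES, ISSUES)
pages_per_issue = per_unit(PAGES, ISSUES)
articles_per_page = per_unit(ARTICLES, PAGES)

print(f"articles per issue: {articles_per_issue}")
print(f"pages per issue:    {pages_per_issue}")
print(f"articles per page:  {articles_per_page}")
```

These are averages over the whole collection; individual titles and years may differ considerably.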

The newspapers used are:

  • Der Arbeiter (1878)
  • L'Arlequin (1848)
  • L'Avenir (1868-1871)
  • Courrier du Grand-Duché de Luxembourg (1844-1868)
  • Cäcilia (1863-1871)
  • Diekircher Wochenblatt (1841-1848)
  • Le Gratis luxembourgeois (1857-1858)
  • L'Indépendance luxembourgeoise (1871-1879)
  • Kirchlicher Anzeiger für die Diözese Luxemburg (1871-1879)
  • La Gazette du Grand-Duché de Luxembourg (1878)
  • Luxemburger Anzeiger (1856)
  • Luxemburger Bauernzeitung (1857)
  • Luxemburger Volks-Freund (1869-1876)
  • Luxemburger Wort (1848-1879)
  • Luxemburger Zeitung (1844-1845)
  • Luxemburger Zeitung = Journal de Luxembourg (1858-1859)
  • L'Union (1860-1871)
  • Das Vaterland (1869-1870)
  • Der Volksfreund (1848-1849)
  • Der Wächter an der Sauer (1849-1869)
  • D'Wäschfra (1868-1879)
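
For anyone who wants to work with the title list programmatically, the entries above follow a regular "Title (year)" or "Title (start-end)" pattern. The following is a minimal sketch of parsing that pattern; the names `Title`, `ENTRY`, and `parse_entry` are hypothetical helpers, not part of the dataset or any BnL tooling, and the sample lines are copied from the list.

```python
import re
from typing import NamedTuple


class Title(NamedTuple):
    name: str
    first_year: int
    last_year: int


# Matches "Name (YYYY)" or "Name (YYYY-YYYY)" at the end of a line.
ENTRY = re.compile(r"^(?P<name>.+?)\s*\((?P<start>\d{4})(?:-(?P<end>\d{4}))?\)$")


def parse_entry(line: str) -> Title:
    """Parse one list entry, tolerating a leading bullet and whitespace."""
    text = line.strip().lstrip("•").strip()
    match = ENTRY.match(text)
    if match is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    start = int(match["start"])
    # A single-year entry has no end year, so reuse the start year.
    end = int(match["end"]) if match["end"] else start
    return Title(match["name"], start, end)


sample = [
    "  • Der Arbeiter (1878)",
    "  • Luxemburger Wort (1848-1879)",
    "  • Luxemburger Zeitung = Journal de Luxembourg (1858-1859)",
]
titles = [parse_entry(line) for line in sample]

for title in titles:
    print(title.name, title.first_year, title.last_year)
```

Parsing into year ranges makes it easy to check, for example, that every title falls within the stated 1841-1879 span.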

Dataset modality

Text

Dataset licence

Creative Commons Public Domain Dedication and Certification

Other licence

No response

How can you access this data

As a download from a repository/website

Size of dataset

500MB-2GB

Confirm the dataset has an open licence

  • To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

[email protected]

ymaurer added the candidate-dataset label Nov 14, 2022
ymaurer commented Nov 14, 2022