Add dataset: swiss_federal_council_handwritten_text_recognition #64

davanstrien · 2022-07-18T12:32:45Z

A URL for this dataset

Dataset description

This dataset is a test set for evaluating Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) engines.
It consists of extracts from the minutes of the Swiss Federal Council. The individual lines were chosen at random from about 150,000 pages of handwritten minutes.
For each line, the Swiss Federal Archives (Schweizerisches Bundesarchiv) provide an image file [images.tar.gz]. Please cite the images as follows: Excerpts of BAR E1004.1#1000/9#1-215. The images are in the public domain.
A PageXML file [page.zip] accompanies every image file and contains the transcription and coordinates of the line.
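
To illustrate how the transcription and coordinates might be read from one of the PageXML files, here is a minimal sketch using only the Python standard library. It assumes the files follow the usual PAGE layout, where each `TextLine` element holds a `Coords` element with a `points` attribute and a `TextEquiv/Unicode` element with the text. The sample document, its file name, and the helper names `local` and `read_lines` are made up for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the common PAGE layout; not an actual file from page.zip.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="line_0001.jpg" imageWidth="1200" imageHeight="90">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,12 1180,12 1180,80 10,80"/>
        <TextEquiv><Unicode>made-up sample line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local(tag):
    """Drop the '{namespace}' prefix so the code does not depend on one schema version."""
    return tag.rsplit("}", 1)[-1]


def read_lines(page_xml):
    """Return id, transcription and polygon for every TextLine in a PAGE XML string."""
    root = ET.fromstring(page_xml)
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "TextLine":
            continue
        text, points = "", []
        for child in elem:
            name = local(child.tag)
            if name == "Coords":
                # Assumption: 'points' is a space-separated list of "x,y" pairs.
                raw = child.get("points", "")
                points = [tuple(int(v) for v in pair.split(",")) for pair in raw.split()]
            elif name == "TextEquiv":
                for sub in child:
                    if local(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": elem.get("id"), "text": text, "points": points})
    return lines


if __name__ == "__main__":
    for line in read_lines(SAMPLE):
        print(line["id"], line["text"], line["points"])
```

Matching on local tag names rather than a fixed namespace is a deliberate choice, since the schema version used in the actual files is not stated here. Files that store coordinates as child `Point` elements instead of a `points` attribute would need a small change to the `Coords` branch.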

Dataset modality

Mixed

Dataset licence

Creative Commons Attribution 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website

Confirm the dataset has an open licence

To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

loleg · 2022-11-03T19:06:11Z

#self-assign

davanstrien · 2022-11-11T10:51:36Z

@loleg let me know if you need any help with this :)

loleg · 2022-11-15T22:48:55Z

Thanks for the prod, @davanstrien. I wanted to dig into this but got caught up in other things at a recent GLAM hackathon. Is the expectation just to add a well-documented dataset, like this one for example? Are there more specific instructions somewhere?

davanstrien added the candidate-dataset label (Proposed dataset to be added) on Jul 18, 2022

davanstrien added the dataset label (Dataset to be added) and removed the candidate-dataset label (Proposed dataset to be added) on Sep 16, 2022

bigscience-workshop-projects bot moved this to Todo in BigLAM: BigScience Libraries, Archives and Museums Sep 16, 2022

bigscience-workshop-projects bot added this to BigLAM: BigScience Libraries, Archives and Museums Sep 16, 2022

github-actions bot assigned loleg Nov 3, 2022
