Add dataset: swiss_federal_council_handwritten_text_recognition #64

Open
1 task done
davanstrien opened this issue Jul 18, 2022 · 3 comments
Labels
dataset Dataset to be added

Comments

@davanstrien
Collaborator

A URL for this dataset

https://doi.org/10.5281/zenodo.4746342

Dataset description

This dataset is a test set for evaluating Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) engines.
It consists of extracts from the minutes of the Swiss Federal Council. The individual lines were randomly chosen from about 150,000 pages of handwritten minutes.
For each line, the Swiss Federal Archives/Schweizerisches Bundesarchiv provides an image file [images.tar.gz]. Please cite the images as follows: Excerpts of BAR E1004.1#1000/9#1-215. The images are in the public domain.
A PageXML file [page.zip] accompanies every image file and gives the transcription and coordinates of the line.
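
For anyone picking this up, here is a rough sketch of how the transcription and coordinates could be read out of one PageXML file. It has not been run against the actual files in page.zip: the sample XML, the file name `example_line.jpg`, the namespace version, and the helper name `parse_page_xml` are all assumptions made for illustration. It also assumes the newer PAGE layout where `Coords` carries a `points` attribute; older PAGE versions use child `Point` elements instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, NOT taken from the dataset; the real files may use
# a different PAGE schema version or nest elements differently.
SAMPLE_PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example_line.jpg" imageWidth="1200" imageHeight="80">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,12 1190,12 1190,70 10,70"/>
        <TextEquiv><Unicode>Beispielzeile</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def parse_page_xml(xml_text):
    """Return one dict per TextLine with its id, polygon and transcription."""
    root = ET.fromstring(xml_text)
    lines = []
    # "{*}" matches any namespace (Python 3.8+), so the schema version
    # does not need to be hard-coded.
    for line in root.findall(".//{*}TextLine"):
        coords_el = line.find("{*}Coords")
        text_el = line.find("{*}TextEquiv/{*}Unicode")
        points = []
        if coords_el is not None and coords_el.get("points"):
            # "x1,y1 x2,y2 ..." -> [(x1, y1), (x2, y2), ...]
            for pair in coords_el.get("points").split():
                x, y = pair.split(",")
                points.append((int(x), int(y)))
        text = ""
        if text_el is not None and text_el.text:
            text = text_el.text
        lines.append({"id": line.get("id"), "coords": points, "text": text})
    return lines


result = parse_page_xml(SAMPLE_PAGE_XML)
```

Each record could then be paired with the matching image from images.tar.gz, assuming the image and XML files share a base name (also not verified here).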

Dataset modality

Mixed

Dataset licence

Creative Commons Attribution 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website

Confirm the dataset has an open licence

  • To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

@loleg

loleg commented Nov 3, 2022

#self-assign

@davanstrien
Collaborator Author

@loleg let me know if you need any help with this :)

@loleg

loleg commented Nov 15, 2022

Thanks for the prod, @davanstrien. I wanted to dig into this but got involved in other things at a recent GLAM hackathon. Is the expectation just to add a well-documented dataset, like this one for example? Are there more specific instructions somewhere?


2 participants