This data set is a test set created to evaluate engines for Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR).
The data set consists of extracts from the minutes of the Swiss Federal Council. The individual lines were randomly selected from about 150,000 pages of handwritten minutes.
For each line, an image file is provided by the Swiss Federal Archives/Schweizerisches Bundesarchiv [images.tar.gz]. Please cite the images as follows: Excerpts of BAR E1004.1#1000/9#1-215. The images are in the public domain.
A PageXML file [page.zip] accompanies every image file and indicates the transcription and coordinates of the line.
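To illustrate how the line transcription and coordinates might be read from one of these PageXML files, here is a minimal sketch using only Python's standard library. It assumes the files follow the PAGE content schema: each `TextLine` has a `Coords` child with a `points` attribute (space-separated integer `x,y` pairs) and a line-level `TextEquiv/Unicode` child. Older PAGE versions that store coordinates as `Point` children are not handled. The helper names (`read_text_lines`, `parse_points`) are hypothetical and are not part of the dataset.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix so the code does not depend on one PAGE schema version."""
    return tag.rsplit("}", 1)[-1]


def _child(elem, name):
    """Return the first *direct* child whose local tag name is `name`, or None."""
    for child in elem:
        if _local(child.tag) == name:
            return child
    return None


def parse_points(points):
    """Turn a PAGE 'points' string such as '10,20 30,20 30,40' into (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_text_lines(xml_data):
    """Return (line_id, polygon, transcription) for every TextLine in a PageXML document.

    `xml_data` may be a str or bytes (e.g. Path("line.xml").read_bytes()).
    """
    root = ET.fromstring(xml_data)
    lines = []
    for elem in root.iter():
        if _local(elem.tag) != "TextLine":
            continue
        # Only look at direct children, so Word-level Coords/TextEquiv are ignored.
        coords = _child(elem, "Coords")
        polygon = parse_points(coords.get("points", "")) if coords is not None else []
        text = ""
        equiv = _child(elem, "TextEquiv")
        if equiv is not None:
            unicode_el = _child(equiv, "Unicode")
            if unicode_el is not None and unicode_el.text:
                text = unicode_el.text
        lines.append((elem.get("id"), polygon, text))
    return lines
```

Matching on local tag names rather than a fixed namespace URI keeps the sketch usable if the files declare a different PAGE schema version than expected. Restricting the search to direct children of `TextLine` avoids accidentally picking up word-level transcriptions, if any are present.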
Dataset modality
Mixed
Dataset licence
Creative Commons Attribution 4.0 International
Other licence
No response
How can you access this data
As a download from a repository/website
Confirm the dataset has an open licence
To the best of my knowledge, this dataset is accessible via an open licence
Contact details for data custodian
No response
Thanks for the prod, @davanstrien. I wanted to dig into this but got caught up in other things at a recent GLAM hackathon. Is the expectation just to add a well-documented dataset, like this one, for example? Are there more specific instructions somewhere?
A URL for this dataset
https://doi.org/10.5281/zenodo.4746342