Add dataset: early_printed_books_font_detection #45

Open
1 task done
davanstrien opened this issue Jul 11, 2022 · 2 comments
Labels: dataset (Dataset to be added), good first issue (Good for newcomers)

Comments

@davanstrien
Collaborator

A URL for this dataset

https://zenodo.org/record/3366686

Dataset description

This dataset is composed of photos, at various resolutions, of 35,623 pages of printed books dating from the 15th to the 18th century. Experts assigned each page between one and five labels corresponding to the font groups used in the text, with two extra classes for non-textual content and for fonts not in the following list: Antiqua, Bastarda, Fraktur, Gotico Antiqua, Greek, Hebrew, Italic, Rotunda, Schwabacher, and Textura.

This is an image classification dataset with potential implications for downstream tasks such as OCR.
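
Because each page can carry between one and five labels, this is a multi-label problem rather than a single-label one. A minimal sketch of how the labels might be encoded as multi-hot vectors is given here. The two extra class names (`not_a_font`, `other_font`), the class ordering, and the helper `encode_labels` are hypothetical and are not taken from the dataset files; the actual release may name and order the classes differently.

```python
# The ten font groups listed in the dataset description.
FONT_GROUPS = [
    "Antiqua", "Bastarda", "Fraktur", "Gotico Antiqua", "Greek",
    "Hebrew", "Italic", "Rotunda", "Schwabacher", "Textura",
]

# Assumption: placeholder names for the two extra classes
# (non-textual content, and fonts not in the list above).
EXTRA_CLASSES = ["not_a_font", "other_font"]

CLASSES = FONT_GROUPS + EXTRA_CLASSES
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASSES)}


def encode_labels(page_labels):
    """Turn the labels of one page into a multi-hot vector of length 12."""
    unique_labels = set(page_labels)
    # The description says each page has from one to five labels.
    if not 1 <= len(unique_labels) <= 5:
        raise ValueError("expected between one and five labels per page")
    vector = [0] * len(CLASSES)
    for name in unique_labels:
        vector[CLASS_TO_INDEX[name]] = 1
    return vector


# Example: a page that mixes Fraktur and Schwabacher.
example = encode_labels(["Fraktur", "Schwabacher"])
```

A vector like this can be paired with a per-class sigmoid output, which is a common choice for multi-label image classification.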

A related paper: "Dataset of Pages from Early Printed Books with Multiple Font Groups"

Dataset modality

Image

Dataset licence

Creative Commons Attribution Non Commercial Share Alike 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website

Confirm the dataset has an open licence

  • To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

@davanstrien
Collaborator Author

Whilst this dataset should be fairly easy to add to the datasets hub, it is quite large, so bear that in mind before starting.

@davanstrien
Collaborator Author

#self-assign
