Add dataset: TexBiG #84

davanstrien · 2022-09-27T08:37:47Z

A URL for this dataset

Dataset description

TexBiG (from the German Text-Bild-Gefüge, meaning Text-Image-Structure) is a document layout analysis dataset for historical documents from the late 19th and early 20th centuries. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes, with more than 52,000 instances. Annotations were created manually by experts and evaluated with Krippendorff's Alpha; each document image was labeled by at least two different annotators. Further details can be found in the paper.
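
For anyone unfamiliar with the agreement measure mentioned above, here is a minimal sketch of Krippendorff's Alpha for nominal labels. It is an illustration only, not the TexBiG authors' evaluation procedure: it assumes each item (for example, one region) already has a list of class labels from its annotators, and the function name and the example labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's Alpha.

    units: list of lists; each inner list holds the labels that
    different annotators assigned to the same item (assumed layout).
    """
    # Coincidence matrix: every ordered pair of labels within an item,
    # weighted by 1 / (number of labels in that item - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n == 0:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


# Hypothetical labels from two annotators for three regions.
example = [["paragraph", "paragraph"], ["image", "image"], ["image", "paragraph"]]
print(krippendorff_alpha_nominal(example))
```

A value of 1 means perfect agreement, 0 means agreement at chance level, and negative values mean systematic disagreement. Applying the measure to instance segmentation also requires matching regions between annotators first, which this sketch leaves out.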

Dataset modality

Mixed

Dataset licence

Creative Commons Attribution 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website

size of dataset

10GB

Confirm the dataset has an open licence

To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

davanstrien added the candidate-dataset and dataset labels and removed the candidate-dataset label Sep 27, 2022

bigscience-workshop-projects bot added this to BigLAM: BigScience Libraries, Archives and Museums Sep 27, 2022

bigscience-workshop-projects bot moved this to Todo in BigLAM: BigScience Libraries, Archives and Museums Sep 27, 2022
