Add dataset: odeuropa_smell_objects #71

davanstrien · 2022-07-27T16:44:08Z

A URL for this dataset

Dataset description

From the Zenodo page:

This dataset is released as part of the Odeuropa project. The annotations are identical to the training set of the ICPR2022-ODOR Challenge.
It contains bounding box annotations for smell-active objects in historical artworks gathered from various digital connections.
The smell-active objects annotated in the dataset either carry smells themselves or hint at the presence of smells.
The dataset provides 15484 bounding boxes on 2116 artworks in 87 object categories.
An additional csv file contains further image-level metadata such as artist, collection, or year of creation.

Object detection datasets are time-consuming to collect, and relatively few of them use LAM data. Those that do exist often rely on the output of one of the various YOLO models, which may be of some interest but often includes categories that are unlikely to be particularly useful for research on, or curation of, LAM collections. This dataset, in contrast, includes categories related to smell, a topic of interest to both art historians and social historians. As a result, it offers a much richer exploration of the possibilities of using object detection with historical paintings.

Dataset modality

Image

Dataset licence

Creative Commons Attribution 4.0 International

Other licence

No response

How can you access this data

Other

Confirm the dataset has an open licence

To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

davanstrien · 2022-07-27T16:44:56Z

Happy to help anyone who wants to work on this. I have a WIP loading script for another COCO formatted dataset: https://huggingface.co/datasets/biglam/nls_chapbook_illustrations
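For anyone picking this up, here is a minimal sketch of the regrouping step that a loading script for COCO-style annotations usually needs. It assumes the standard COCO keys (`images`, `annotations`, `categories`, `image_id`, `category_id`, `bbox`); the function name and the sample data are made up for illustration, and this is not the loading script linked above.

```python
from collections import defaultdict


def group_coco_annotations(coco):
    """Return one record per image with its bounding boxes attached."""
    # Map numeric category ids to readable names.
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}

    # Collect every annotation under the image it belongs to.
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[ann["image_id"]].append(
            {"bbox": ann["bbox"], "category": names[ann["category_id"]]}
        )

    # Images without annotations get an empty list.
    return [
        {
            "id": img["id"],
            "file_name": img["file_name"],
            "objects": by_image.get(img["id"], []),
        }
        for img in coco["images"]
    ]


# Made-up sample in COCO layout (bbox is [x, y, width, height]).
sample = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
    ],
    "categories": [{"id": 7, "name": "pipe"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 7, "bbox": [5, 5, 20, 10]},
    ],
}
records = group_coco_annotations(sample)
```

Grouping by image first makes it easy to emit one example per artwork, which is the shape an object detection dataset usually wants.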

davanstrien · 2022-07-27T16:45:23Z

Also, I really want to call this dataset smelly_objects...

shamikbose · 2022-07-27T17:57:21Z

I'd love to work on this! Will be a good change from the text datasets so far.

shamikbose · 2022-07-27T17:57:53Z

#self-assign

davanstrien · 2022-07-27T18:10:37Z

Awesome, and don't worry if you can't finish this before you go away. It can wait until you're back too 🙂

shamikbose · 2022-07-27T20:45:36Z

Hopefully, I should be able to get it done. From the Zenodo page:

Due to licensing issues, we cannot provide the images directly, but instead provide a collection of links and a download script.

Should the dataset just contain the links to the images then?

davanstrien · 2022-07-27T21:20:56Z

> Hopefully, I should be able to get it done. From the Zenodo page:
>
> Due to licensing issues, we cannot provide the images directly, but instead provide a collection of links and a download script.
>
> Should the dataset just contain the links to the images then?

Yes I think that would be best for this one. We can provide example code for downloading the images in the datacard.
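As a starting point for that example code, here is a rough sketch of a per-image downloader using only the Python standard library. The function names, the timeout, and the retry settings are assumptions chosen for illustration; it is not the download script shipped with the Zenodo record.

```python
import time
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Use the last path segment of the URL, percent-decoded, as the file name."""
    return unquote(PurePosixPath(urlsplit(url).path).name)


def download_image(url, dest_dir, timeout=30.0, retries=3, backoff=2.0):
    """Download one image; return the local path, or None if every attempt fails."""
    dest = Path(dest_dir) / filename_from_url(url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                dest.write_bytes(response.read())
            return dest
        except OSError:
            # URLError and socket timeouts are both OSError subclasses.
            # Wait a little longer after each failed attempt.
            time.sleep(backoff * attempt)
    return None
```

Returning `None` instead of raising lets a caller loop over all the links and collect the ones that failed, which matters when some hosts are slow or unreachable.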

shamikbose · 2022-07-31T21:05:59Z

@davanstrien This dataset has a lot of associated metadata

       ['File Name', 'Artist', 'Title', 'Query', 'Part', 'Earliest Date',
       'Latest Date', 'Margin Years', 'Genre', 'Material', 'Medium',
       'Height of Image Field', 'Width of Image Field', 'Type of Object',
       'Height of Object', 'Width of Object', 'Diameter of Object',
       'Position of Depiction on Object', 'Current Location',
       'Repository Number', 'Original Location', 'Original Place',
       'Original Position', 'Context', 'Place of Discovery',
       'Place of Manufacture', 'Associated Scenes', 'Object Categories',
       'Related Works of Art', 'Type of Similarity', 'Inscription',
       'Text Source', 'Bibliography', 'Photo Archive', 'Image URL',
       'Details URL', 'Additional Information']

Should they all be included in the dataset? Most of them are missing, from a cursory glance at the data. Current Location, Earliest Date, Latest Date, Genre, Material and Medium are populated for most of the images. I was thinking some of the fields like Material and Medium could be used for classification, maybe

davanstrien · 2022-08-01T10:02:05Z

> @davanstrien This dataset has a lot of associated metadata
>
>        ['File Name', 'Artist', 'Title', 'Query', 'Part', 'Earliest Date',
>        'Latest Date', 'Margin Years', 'Genre', 'Material', 'Medium',
>        'Height of Image Field', 'Width of Image Field', 'Type of Object',
>        'Height of Object', 'Width of Object', 'Diameter of Object',
>        'Position of Depiction on Object', 'Current Location',
>        'Repository Number', 'Original Location', 'Original Place',
>        'Original Position', 'Context', 'Place of Discovery',
>        'Place of Manufacture', 'Associated Scenes', 'Object Categories',
>        'Related Works of Art', 'Type of Similarity', 'Inscription',
>        'Text Source', 'Bibliography', 'Photo Archive', 'Image URL',
>        'Details URL', 'Additional Information']
>
> Should they all be included in the dataset? Most of them are missing, from a cursory glance at the data. Current Location, Earliest Date, Latest Date, Genre, Material and Medium are populated for most of the images. I was thinking some of the fields like Material and Medium could be used for classification, maybe

My own feeling would be to include as much as possible. One option if things are often missing would be to put some of this metadata in an additional metadata column as a dictionary? This way it doesn't get lost but also is slightly less distracting than having a lot of columns with mostly missing data?
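A minimal sketch of that idea, assuming each metadata row has been read into a plain dict keyed by the CSV column names. The choice of top-level fields follows the columns reported as mostly populated; the function name and the sample row are hypothetical.

```python
# Columns reported as populated for most images stay top-level.
CORE_FIELDS = (
    "Current Location",
    "Earliest Date",
    "Latest Date",
    "Genre",
    "Material",
    "Medium",
)


def split_row(row):
    """Keep core fields as columns and fold the remaining non-empty ones into 'metadata'."""
    # Empty strings are normalised to None so missing values look the same everywhere.
    core = {name: row.get(name) or None for name in CORE_FIELDS}
    extra = {
        key: value
        for key, value in row.items()
        if key not in CORE_FIELDS and value not in (None, "")
    }
    return {**core, "metadata": extra}


# Made-up row for illustration.
row = {
    "File Name": "a.jpg",
    "Artist": "Unknown",
    "Genre": "still life",
    "Material": "",
    "Inscription": "",
}
example = split_row(row)
```

One caveat: if the loader requires a fixed schema, the sparse keys may need to be kept with `None` values, or the dictionary serialised as a JSON string, rather than dropped as they are here.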

shamikbose · 2022-08-02T02:01:39Z

Yeah, I was building out the features as follows:

import datasets

# Less central fields are grouped in a nested "metadata" dictionary.
features = datasets.Features(
    {
        "id": datasets.Value("string"),
        "url": datasets.Value("string"),
        "annotations": datasets.Value("string"),
        "date": datasets.Value("string"),
        "genre": datasets.Value("string"),
        "material": datasets.Value("string"),
        "metadata": {
            "artist": datasets.Value("string"),
            "query": datasets.Value("string"),
            "title": datasets.Value("string"),
            "height": datasets.Value("string"),
            "width": datasets.Value("string"),
        },
    }
)

I'll probably get back to this in about two weeks, after I come back from vacation

davanstrien · 2022-08-02T08:36:03Z

> I'll probably get back to this in about two weeks, after I come back from vacation

Have a great vacation!

shamikbose · 2022-09-10T19:37:12Z

@davanstrien I'm back to working on this dataset, but it seems like the URLs aren't accessible. Even the download script provided in the dataset gives the following error:
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Example from the first image in the metadata document:
URL: http://www.sigecweb.beniculturali.it/images/fullsize/ICCD50007114/ICCD4644613_SBAS%20RM%20223305.jpg

davanstrien · 2022-09-12T10:31:12Z

@shamikbose hey, hope you had a good break!

I'll try and take a look at this too but also tagging @kiymetakdemir who works on this project and might be able to help with this.

shamikbose · 2022-09-12T15:30:51Z

@davanstrien I did! It was a much needed break
Thanks for adding @kiymetakdemir. Hoping this data can still be accessed

kiymetakdemir · 2022-09-13T13:54:08Z

Hi @shamikbose, can you check it again? I just tried downloading the images with the given script and didn't encounter any errors; everything downloaded successfully.

shamikbose · 2022-09-13T17:00:05Z

@kiymetakdemir I was able to download them today. Thanks!

shamikbose · 2022-09-15T16:09:14Z

@kiymetakdemir I get an error for this URL (http://134.76.24.240/download/07876601/flc0596164z_p?Expires=1610722060&Signature=SX15SE0B~KbZ7yvkTJtis1rsKysZddvhsxJzZSZ7oZoxqd~NNsKp22iYZGBQViGXMy7zwTDCYxu-Qan2O0aq2QxizENey~CF4WIV5-~bHwEZZjrmCoBdWDEeS0Y6XNajZ6DYzWQolxkiGWoqLs~Bw0j4GSrQef7QvgQciIWDlTE_&Key-Pair-Id=APKAJGHHKKX2FHRP63AQ). It's not accessible.
Update: The links from www.sigecweb.beniculturali.it are timing out again
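One thing that may be worth checking for the first link: its query string carries an `Expires` parameter alongside `Signature` and `Key-Pair-Id`, which looks like a time-limited signed URL. The sketch below assumes `Expires` is a Unix timestamp in seconds, which is a guess about this server rather than something confirmed; the function name is hypothetical, and the URL is abbreviated to the `Expires` parameter only.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url):
    """Return the 'Expires' query parameter as a UTC datetime, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("Expires")
    if not values:
        return None
    # Assumption: the value is a Unix timestamp in seconds.
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


# Abbreviated form of the link above (signature parameters omitted).
url = "http://134.76.24.240/download/07876601/flc0596164z_p?Expires=1610722060"
expiry = signed_url_expiry(url)
is_expired = expiry is not None and expiry < datetime.now(timezone.utc)
print(expiry.isoformat(), "expired" if is_expired else "still valid")
```

If that reading is right, the link expired in January 2021, so it would fail regardless of network conditions and a fresh URL would be needed from the source collection.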

davanstrien added the candidate-dataset Proposed dataset to be added label Jul 27, 2022

davanstrien added dataset Dataset to be added and removed candidate-dataset Proposed dataset to be added labels Jul 27, 2022

bigscience-workshop-projects bot added this to BigLAM: BigScience Libraries, Archives and Museums Jul 27, 2022

bigscience-workshop-projects bot moved this to Todo in BigLAM: BigScience Libraries, Archives and Museums Jul 27, 2022

github-actions bot assigned shamikbose Jul 27, 2022
