Add dataset: odeuropa_benchmarks_and_corpora #54

Open
1 task done
davanstrien opened this issue Jul 14, 2022 · 1 comment
Labels
candidate-dataset Proposed dataset to be added

Comments

@davanstrien
Collaborator

A URL for this dataset

https://github.com/Odeuropa/benchmarks_and_corpora

Dataset description

This dataset contains the annotations related to olfactory information from the benchmark created for the ODEUROPA project.
For 7 languages we selected a pool of documents covering different time periods (from 1620 to 1925) and topics (e.g. medicine, law, literature).

This is an exciting dataset of annotations related to olfactory (smell) information in historical documents. It is interesting because it covers a range of periods and also offers the possibility of using ML for a task other than standard entity recognition.

Dataset modality

Text

Dataset licence

Other license

Other licence

No response

How can you access this data

As a download from a repository/website

Confirm the dataset has an open licence

  • To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

@davanstrien davanstrien added the candidate-dataset Proposed dataset to be added label Jul 14, 2022
@davanstrien
Collaborator Author

I am clarifying the licence for this (see Odeuropa/benchmarks_and_corpora#3), so I would hold off working on this until we've got that info back.
