Add dataset: clmet_3-1 #58

Open
1 task done
clancyoftheoverflow opened this issue Jul 14, 2022 · 5 comments
Labels: dataset (Dataset to be added); ready for review (Issue ready to be reviewed by maintainers)

Comments

@clancyoftheoverflow
Member

clancyoftheoverflow commented Jul 14, 2022

A URL for this dataset

http://fedora.clarin-d.uni-saarland.de/clmet/clmet.html

Dataset description

The Corpus of Late Modern English Texts, version 3.1 (CLMET3.1) is a principled collection of public domain texts drawn from various online archiving projects. In total, the corpus contains some 34 million words of running text. It incorporates CLMET, CLMETEV, and CLMET3.0, and has been compiled following roughly the same principles, that is:

  • The corpus covers the period 1710–1920, divided into three 70-year sub-periods.
  • The texts making up the corpus have all been written by British and Irish authors who are native speakers of English.
  • The corpus never contains more than three texts by the same author.
  • The texts within each sub-period have been written by authors born within a correspondingly restricted sub-period.
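
For anyone writing a loader, the period split described above can be sketched in a few lines. This is only an illustration of the arithmetic (1710–1920 in three 70-year spans), not code from the corpus or from any loading script. The function name `sub_period` and the handling of boundary years (1780 and 1850 are assigned to the later span) are assumptions; the corpus documentation should be checked for how it treats those years.

```python
from typing import Tuple

CORPUS_START = 1710  # first year covered, per the description above
CORPUS_END = 1920    # last year covered, per the description above
SPAN = 70            # length of each sub-period in years


def sub_period(year: int) -> Tuple[int, int]:
    """Return the (start, end) sub-period a year falls into.

    Hypothetical helper: boundary years go to the later span,
    which is an assumption, not something the corpus states.
    """
    if not CORPUS_START <= year <= CORPUS_END:
        raise ValueError(f"{year} is outside {CORPUS_START}-{CORPUS_END}")
    # Integer division picks the span; clamp so 1920 stays in the last one.
    index = min((year - CORPUS_START) // SPAN, 2)
    start = CORPUS_START + index * SPAN
    return (start, start + SPAN)
```

Calling `sub_period(1800)` gives the middle span, and a year outside the corpus range raises `ValueError`.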

Size: 34 million words

Annotation: PoS-tagged; genre.

Dataset modality

Text

Dataset licence

Creative Commons Attribution Non Commercial Share Alike 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website

Confirm the dataset has an open licence

  • To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

clancyoftheoverflow added the candidate-dataset (Proposed dataset to be added) label on Jul 14, 2022
@davanstrien
Collaborator

This looks amazing!

@shamikbose

#self-assign

@shamikbose

#ready-for-review

github-actions bot added the ready for review (Issue ready to be reviewed by maintainers) label on Jul 17, 2022
@shamikbose

@davanstrien
Collaborator

@shamikbose thanks, I'll aim to review this today or tomorrow. @clancyoftheoverflow you probably know this dataset better than me, so feel free to also review it.

davanstrien self-assigned this on Jul 18, 2022
3 participants