Add dataset: royal_society_corpus #41
Comments
Some additional information on the annotation format is available here: https://fedora.clarin-d.uni-saarland.de/rsc/annotation.html
#self-assign
@davanstrien, there seems to be a problem with the data.
Then I changed it to look like this:
With that change, it parses that part correctly.
I'll take a quick look at this today. One option might be to use a slightly cruder approach to parsing. I'll play around a bit and let you know how I get on with that.
I was considering using a stack, but with malformed data the sentences would come out wrong.
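A stack-based parser does not have to produce wrong sentences on malformed data; it can refuse them instead. What follows is a minimal sketch, not the project's parser. It assumes a hypothetical vertical layout: one token per line with tab-separated columns, and XML-like structural tags such as `<s>`/`</s>` on their own lines. The real RSC download format is described on the annotation page linked in the first comment and may differ. The names `parse_sentences` and `MalformedInput` are illustrative only. Every closing tag must match the most recently opened one. When it does not, the parser raises an error that carries the line number, so bad spots can be reported or patched by hand rather than silently merged into the wrong sentence.

```python
import re

# ASSUMPTION: hypothetical vertical layout, not confirmed as the RSC format:
# one token per line with tab-separated columns, and XML-like structural tags
# (e.g. <text id="...">, <s>, </s>) each on their own line.
CLOSE_TAG = re.compile(r"^</([A-Za-z_][\w.-]*)>$")
SELF_CLOSING = re.compile(r"^<[A-Za-z_][^>]*/>$")
OPEN_TAG = re.compile(r"^<([A-Za-z_][\w.-]*)(?:\s[^>]*)?>$")


class MalformedInput(ValueError):
    """Raised when structural tags are not properly nested."""


def parse_sentences(lines):
    """Yield each sentence as a list of token rows (lists of column values).

    A stack of open tag names enforces proper nesting; any mismatch raises
    MalformedInput instead of emitting a sentence with wrong boundaries.
    """
    stack = []   # names of currently open structural tags
    tokens = []  # token rows of the sentence being built
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if m := CLOSE_TAG.match(line):
            name = m.group(1)
            if not stack or stack[-1] != name:
                expected = stack[-1] if stack else None
                raise MalformedInput(
                    f"line {lineno}: </{name}> but innermost open tag is {expected!r}"
                )
            stack.pop()
            if name == "s":
                yield tokens
                tokens = []
        elif SELF_CLOSING.match(line):
            continue  # e.g. <g/>: carries no sentence structure, skip it
        elif m := OPEN_TAG.match(line):
            name = m.group(1)
            if name == "s" and "s" in stack:
                raise MalformedInput(f"line {lineno}: <s> opened inside an unclosed <s>")
            stack.append(name)
        else:
            if "s" not in stack:
                raise MalformedInput(f"line {lineno}: token outside any <s>")
            tokens.append(line.split("\t"))
    if stack:
        raise MalformedInput(f"end of input: unclosed tags {stack}")
```

A cruder alternative would be to start a new sentence at every `<s>` and ignore unmatched closing tags. That keeps going through malformed files, but it can silently produce exactly the wrong boundaries described above. The strict version trades that for having to fix or skip the offending records.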
A URL for this dataset
https://fedora.clarin-d.uni-saarland.de/rsc/
Dataset description
This offers an interesting dataset of text from the scientific domain spanning a long time period (1665-1869). Additionally, the dataset contains a range of annotations:
Dataset modality
Text
Dataset licence
Creative Commons Attribution Non Commercial Share Alike 4.0 International
Other licence
No response
How can you access this data
As a download from a repository/website
Confirm the dataset has an open licence
Contact details for data custodian
[email protected]