Skip to content

Commit

Permalink
Add README files
Browse files Browse the repository at this point in the history
  • Loading branch information
paulmillar committed Oct 11, 2021
1 parent fb8658e commit 4e4d3c9
Show file tree
Hide file tree
Showing 2 changed files with 124 additions and 0 deletions.
13 changes: 13 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
== ROR utils ==

The [Research Organization Registry (ROR)](https://ror.org/) is a
community-led project to maintain a database of research institutes.
This repository contains a collection of utilities for working with
the information maintained and provided by ROR.

The `bin` directory contains miscellaneous scripts.

The `map-to-rdf` directory contains configuration to map the
ROR-supplied JSON data to an equivalent set of RDF-triple assertions.
Further rdetails are available in that directory's README file.

111 changes: 111 additions & 0 deletions map-to-rdf/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
= map-to-rdf

[Global Research Identifier Database (GRID)](https://www.grid.ac/)
maintained a database of research organisations and provided that
information in various formats, including JSON and RDF.

The [Research Organization Registry (ROR)](https://ror.org/) was
created as a community-led project to maintain this information,
independent of any commercial company.

Unfortunately, ROR (currently) provides the information only in a
[JSON format](https://ror.readme.io/docs/ror-data-structure). They do
not provide an RDF representation of that information, although there
there is an [open issue](ror-community/ror-api#113) to address this.

This directory contains material to map the ROR-provided JSON data
into a corresponding set of RDF triples.

== Information organisation

The information is expressed using the [GRID
ontology](https://grid.ac/ontology/grid-ontology-v1.rdf). This is
because, to a large extent, ROR inherited its database from GRID and
there may be existing users for whom this ontology is already
familiar.

The file `grid-ontology-v1.ttl` contains a description of the GRID
ontology, serialised using the turtle format. This is (hopefully)
easier to read than the RDF/XML serialisation that the is used in the
[original file](https://grid.ac/ontology/grid-ontology-v1.rdf).

=== Deviation from GRID RDF

The RDF triples use the ROR identifiers as the predicate's subject,
not the GRID identifier. The GRID identifiers are asserted using the
`grid:id` predicate, just as GRID did. This may provide sufficient
interoperability for existing user.

The address information is currently not converted. This may be added
later.

== How to map the JSON data.

The conversion from JSON to RDF uses [RDF Mapping Language
(RML)](https://rml.io/specs/rml/) to describe how the different parts
of the JSON data should be understood. RML is intended to become a
standard and is reasonably well documented.

=== Problems and work-arounds

The ROR data-dump contains many entries with a `wikipedia_url` entry
that contains the empty string (`""`). This does not work well with
RML and such values must first be converted to `null` values.

One way to achieve this work-around with the data-dump
`2021-09-23-ror-data.json` is the following command:

```console
paul@sprocket:~/ROR$ sed -ie 's/"wikipedia_url": "",/"wikipedia_url": null,/' 2021-09-23-ror-data.json
```

This command updates the file `2021-09-23-ror-data.json` and creates a
backup of the original contents as `2021-09-23-ror-data.jsone`.

=== Running the conversion

The following command illustrates how to convert a ROR data-dump
`2021-09-23-ror-data.json` into a corresponding set of RDF triples.
The conversion shown here is done using the [RMLMapper
processor](https://github.com/RMLio/rmlmapper-java), an open-source
RML processor.

```console
paul@sprocket:~/ROR$ java -jar rmlmapper-4.12.0-r360-all.jar -o output.ttl -s turtle -m mapping.ttl
paul@sprocket:~/ROR$
```

Just for comparison, it takes about 40 seconds to process a full
data-dump on my laptop.

Here is an example entry from the resulting output:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix grid: <http://www.grid.ac/ontology/> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix vivo: <http://vivoweb.org/ontology/core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://ror.org/01js2sh04> a grid:Facility, foaf:Organization;
grid:cityName "Hamburg";
grid:countryCode "DE";
grid:countryName "Germany";
grid:crossrefFunderId "501100001647";
grid:establishedYear "1959"^^xsd:gYear;
grid:hasChild <https://ror.org/02zmk8084>, <https://ror.org/04fme8709>;
grid:hasGeonamesCity <http://sws.geonames.org/2911298/>;
grid:hasParent <https://ror.org/0281dp749>;
grid:hasWikidataId <http://www.wikidata.org/entity/Q311801>, <http://www.wikidata.org/entity/Q39901428>;
grid:id "grid.7683.a";
grid:isni "0000 0004 0492 0453";
grid:wikipediaPage <http://en.wikipedia.org/wiki/DESY>;
rdfs:label "Deutsches Elektronen-Synchrotron DESY";
skos:prefLabel "Deutsches Elektronen-Synchrotron DESY";
foaf:homepage <http://www.desy.de/index_eng.html> .
```

0 comments on commit 4e4d3c9

Please sign in to comment.