# Comparing OAI-PMH and Backbone Importer for DAAN RDF pipeline
OAI-PMH and the Backbone importer are two potential solutions for the first step in the LOD pipeline: making NISV data available as linked data. This first step requires producing RDF triples from DAAN data.
The OAI-PMH method for producing RDF triples from DAAN is implemented in a single project: the beng-lod-server (daan-oai-to-rdf branch). The Backbone method is implemented in two projects: the x-omgeving-backbone-rdf project (in GitLab), which creates RDF from an initial import and from updates, and the beng-lod-server (daan-storage-api-to-rdf branch), which provides ad-hoc access to specific resources via the Storage API. (Note: should there be issues with using the Storage API, an alternative way to get ad-hoc access to the data would be via a triple store.) Some actions would require an additional script (e.g. to retrieve triples for a subset of resources); these scripts have not been implemented.
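As a rough illustration of the ad-hoc path, the sketch below parses an OAI-PMH GetRecord response before any conversion to RDF would happen. The sample payload, field names, and identifiers are illustrative assumptions only; the real DAAN/OAI records have a different, richer structure.

```python
# Hedged sketch: pulling basic fields out of an OAI-PMH GetRecord response.
# The record layout below is a simplified assumption, not the actual DAAN payload.
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header><identifier>oai:example:2101</identifier></header>
      <metadata>
        <dc xmlns="http://purl.org/dc/elements/1.1/">
          <title>Polygoon journaal</title>
          <description>Weekly newsreel.</description>
        </dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

def extract_record(xml_text):
    """Return (identifier, title, description) from a GetRecord response."""
    root = ET.fromstring(xml_text)
    identifier = root.find(".//oai:header/oai:identifier", NS)
    title = root.find(".//dc:title", NS)
    description = root.find(".//dc:description", NS)
    return (identifier.text,
            title.text if title is not None else None,
            description.text if description is not None else None)

print(extract_record(SAMPLE_RESPONSE))
# → ('oai:example:2101', 'Polygoon journaal', 'Weekly newsreel.')
```

The extracted fields would then feed into the schema-based (or, later, RML-based) conversion to triples.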
Both methods currently use the RDF schema as a basis for converting the data. In the future we will consider using RML for this conversion instead.
The triples produced by both methods can then be delivered to users, but that is a job for the later steps in the pipeline and is not considered here.
Key concepts to include in the linked data are:
- Hierarchy series/season/programme/scene description
- Title
- Genre
- Catalog
- Distribution channel
- Carrier type? (excluded for the present due to low performance when including all carriers)
- Sort date
- Broadcaster
- Summary/Description
- Network
- Link to item online
- People (Executive/creator/personname…)
- Rights
- Locations (recording, museum, and just location)
- Keywords
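To make concrete what "producing triples" for these fields might look like, here is a minimal, stdlib-only sketch that serializes a couple of them as N-Triples. The resource URI, predicate URIs, and input shape are hypothetical placeholders, not the actual DAAN schema or the beng-lod-server's output.

```python
# Hedged sketch: serializing a few core fields as N-Triples.
# All URIs below are hypothetical, not the real DAAN/NISV vocabulary.

def to_ntriples(subject_uri, fields):
    """fields: dict mapping a predicate URI to a literal value."""
    lines = []
    for predicate, value in fields.items():
        # Escape backslashes and quotes per the N-Triples literal syntax.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)

triples = to_ntriples(
    "http://data.example.org/programme/2101",  # hypothetical resource URI
    {
        "http://purl.org/dc/terms/title": "Polygoon journaal",
        "http://purl.org/dc/terms/abstract": "Weekly newsreel.",
    },
)
print(triples)
```

In practice a library such as rdflib would handle serialization and datatypes; the point here is only that both methods end up emitting the same kind of triples for these fields.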
Factor | OAI-PMH | Backbone | Comment |
---|---|---|---|
Ad-hoc access | Yes | Yes | |
Initial load | Yes, requires additional script to get IDs for retrieval (via OAI) | Yes | |
Retrieval of a specific set of data (e.g. the Polygoon collection) | Yes, requires additional script to get IDs for retrieval | Yes, requires additional script to get IDs for retrieval | IDs can be retrieved from the OAI for a date range; for other subsets of the data, an ES index or a triple store would need to be queried |
Incremental updates | Yes, requires additional script to get IDs for retrieval (via OAI) | Yes | |
Upload to triple store | Yes | Yes | Both produce triples, methods for uploading these triples are then the same for either option (either updating the triple store directly or saving resources to turtle files that are then uploaded) |
Reconciliation of GTAA concepts | Yes | Yes | Reconciliation is done as an integral part of the conversion to RDF and is common to both methods |
Availability of core fields (see above) | All except link to online | All except link to online | Not sure if this field exists in DAAN |
Availability of subtitles | No | Yes | Questionable whether we are interested in subtitles in linked data |
Availability of rights information | Yes e.g. this program | Yes | I wonder if some or all fields may be removed when the OAI-PMH goes into production, as there are e.g. email addresses and telephone numbers in the rights note. I also wonder if we should even show such fields in the Media Suite |
Suitability for Media Suite LD | No, limited to publicly available metadata | Yes, same data as Media Suite | |
Suitability for public LOD | Yes | Yes, but data would have to be filtered | |
Performance | Ad-hoc access is fast enough for easy browsing | Ad-hoc access is fast enough for easy browsing. Processing of backbone messages takes about one second per message (calculated by dividing the total time for a batch of items by the number of items processed). | Loading to the triple store is slow, but this is common to both |
Fit with x-omgeving | Stand-alone | beng-lod-server is stand-alone. Could combine the RDF backbone import with the backbone import for Elasticsearch, but is this desirable? | |
Ease of maintaining | Adds an additional data source to the x-omgeving. Core code is a single project. Requires additional scripts for retrieving subsets, initial loading and updates | Uses a data source that is already in use in the x-omgeving. Core code is split over two projects. Requires an additional script for retrieving subsets | |
Dependencies | Dependent on OAI-PMH | Dependent on Storage API and backbone pusher | Do we have any idea of how well these are going to be supported/maintained over coming years? What sort of capacity do they have? Are there any restrictions on the load we are allowed to place on them? |
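Several rows above mention an "additional script to get IDs for retrieval (via OAI)". Such a script has not been implemented; the sketch below shows one plausible shape for it, harvesting identifiers via OAI-PMH ListIdentifiers and following resumptionTokens. The endpoint URL is a placeholder, and parsing is separated from fetching so it can be exercised without network access.

```python
# Hedged sketch of an ID-harvesting script via OAI-PMH ListIdentifiers.
# The endpoint and sample page are illustrative assumptions.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example:1</identifier></header>
    <header><identifier>oai:example:2</identifier></header>
    <resumptionToken>tok-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

def parse_list_identifiers(xml_text):
    """Return (identifiers, resumption_token) from one ListIdentifiers page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

def harvest_ids(endpoint, metadata_prefix="oai_dc"):
    """Yield all identifiers, paging through resumptionTokens."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        url = endpoint + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            ids, token = parse_list_identifiers(resp.read())
        yield from ids
        if token is None:
            break
        # Per the OAI-PMH spec, continuation requests carry only the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}

print(parse_list_identifiers(SAMPLE_PAGE))
# → (['oai:example:1', 'oai:example:2'], 'tok-1')
```

The same parsing step could be reused for a date-range subset by adding `from`/`until` parameters to the initial request; subsets not expressible in OAI-PMH would, as noted in the table, need an ES index or triple store query instead.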