
# Comparing OAI-PMH and Backbone Importer for the DAAN RDF pipeline


OAI-PMH and the Backbone importer are two potential solutions for the first step in the LOD pipeline, which makes NISV data available as linked data. This first step requires RDF triples to be produced from DAAN data.

The OAI-PMH method for producing RDF triples from DAAN is implemented in a single project, the beng-lod-server (daan-oai-to-rdf branch). The Backbone method is implemented in two projects: the x-omgeving-backbone-rdf project (in GitLab) for creating RDF from an initial import and from updates, and the beng-lod-server project (daan-storage-api-to-rdf branch) for ad-hoc access to specific resources via the Storage API. (Note: should there be issues with using the Storage API, an alternative way to get ad-hoc access to the data would be via a triple store.) For some actions an additional script would be required, e.g. to retrieve triples for a subset of resources. These scripts have not been implemented.
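As an illustration of what such an ID-retrieval script involves, here is a minimal sketch using the standard OAI-PMH ListIdentifiers verb with resumption tokens. The endpoint URL is a placeholder, and the metadata prefix and date format are assumptions; the actual DAAN endpoint may differ.

```python
"""Sketch: harvest item IDs via OAI-PMH ListIdentifiers (with resumption tokens)."""
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
ENDPOINT = "https://example.org/oai"  # placeholder, not the actual DAAN endpoint


def list_identifiers(from_date=None):
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date  # incremental updates: only changed records
    while True:
        root = ET.fromstring(requests.get(ENDPOINT, params=params).content)
        for header in root.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier")
        token = root.findtext(f"{OAI}ListIdentifiers/{OAI}resumptionToken")
        if not token:
            break
        # subsequent requests carry only the resumption token
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


for oai_id in list_identifiers(from_date="2022-01-01"):
    print(oai_id)  # feed these IDs into the RDF conversion step
```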

Both methods currently use the RDF schema as the basis for converting the data. In the future we will consider using RML for this conversion instead.
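To give an idea of what the schema-driven conversion amounts to, here is a sketch using rdflib. The namespaces, class and property names, and DAAN field names below are placeholders, not the actual schema used in beng-lod-server.

```python
"""Sketch: schema-driven conversion of a DAAN record (as a dict) to RDF."""
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://example.org/schema/")  # placeholder namespace
RESOURCE = Namespace("https://example.org/id/")    # placeholder namespace


def daan_record_to_graph(record: dict) -> Graph:
    g = Graph()
    g.bind("sdo", SCHEMA)
    subject = URIRef(RESOURCE["program/" + record["id"]])
    g.add((subject, RDF.type, SCHEMA.Program))
    # one triple per mapped DAAN field; the real mapping is defined by the schema
    if "title" in record:
        g.add((subject, SCHEMA.hasTitle, Literal(record["title"])))
    if "sortdate" in record:
        g.add((subject, SCHEMA.hasSortDate, Literal(record["sortdate"])))
    return g


print(daan_record_to_graph({"id": "123", "title": "Polygoon journaal"})
      .serialize(format="turtle"))
```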

The triples produced by both methods can then be delivered to users, but that is a job for later steps in the pipeline and is not considered here.

Key concepts to include in the linked data (a sketch of what these could look like as triples follows the list):

- Hierarchy: series/season/programme/scene description
- Title
- Genre
- Catalog
- Distribution channel
- Carrier type? (excluded for the present due to poor performance when including all carriers)
- Sort date
- Broadcaster
- Summary/description
- Network
- Link to item online
- People (executive/creator/personname…)
- Rights
- Locations (recording, museum, and plain location)
- Keywords
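For illustration, the triples for a single programme could take roughly the following shape, covering a few of the fields above. All URIs and property names are placeholders; the actual vocabulary comes from the schema.

```python
"""Sketch: the rough shape of the produced triples for one programme."""
from rdflib import Graph

TURTLE = """
@prefix sdo: <https://example.org/schema/> .

<https://example.org/id/program/123> a sdo:Program ;
    sdo:hasTitle "Polygoon journaal" ;
    sdo:hasGenre "nieuws" ;
    sdo:hasBroadcaster "NTS" ;
    sdo:hasSortDate "1960-01-01" ;
    sdo:isPartOfSeason <https://example.org/id/season/45> .
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")  # parsing validates the snippet
print(len(g), "triples")
```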
| Factor | OAI-PMH | Backbone | Comment |
| --- | --- | --- | --- |
| Ad-hoc access | Yes | Yes | |
| Initial load | Yes, requires additional script to get IDs for retrieval (via OAI) | Yes | |
| Retrieval of a specific set of data (e.g. the Polygoon collection) | Yes, requires additional script to get IDs for retrieval | Yes, requires additional script to get IDs for retrieval | IDs can be retrieved from the OAI for a date range; for other subsets of the data an ES index or a triple store would need to be queried |
| Incremental updates | Yes, requires additional script to get IDs for retrieval (via OAI) | Yes | |
| Upload to triple store | Yes | Yes | Both produce triples; the methods for uploading these triples are then the same for either option (either updating the triple store directly or saving resources to Turtle files that are then uploaded) |
| Reconciliation of GTAA concepts | Yes | Yes | Reconciliation is done as an integral part of the conversion to RDF and is common to both methods |
| Availability of core fields (see above) | All except link to item online | All except link to item online | Not sure whether this field exists in DAAN |
| Availability of subtitles | No | Yes | Questionable whether we are interested in subtitles in linked data |
| Availability of rights information | Yes, e.g. this program | Yes | I wonder if some or all fields may be removed when the OAI-PMH goes into production, as there are e.g. email addresses and telephone numbers in the rights note. I also wonder if we should even show such fields in the Media Suite |
| Suitability for Media Suite LD | No, limited to publicly available metadata | Yes, same data as the Media Suite | |
| Suitability for public LOD | Yes | Yes, but the data would have to be filtered | |
| Performance | Ad-hoc access is fast enough for easy browsing | Ad-hoc access is fast enough for easy browsing. Processing of backbone messages takes about a second per message (calculated as the total time for a number of items divided by the number of items processed) | Loading to the triple store is slow, but this is common to both |
| Fit with x-omgeving | Stand-alone | beng-lod-server is stand-alone. The backbone import could be combined with the backbone import for Elasticsearch, but is this desirable? | |
| Ease of maintaining | Adds an additional data source to the x-omgeving. Core code is a single project. Requires additional scripts for retrieving subsets, initial loading and updates | Uses a data source that is already in use in the x-omgeving. Core code is split over two projects. Requires an additional script for retrieving subsets | |
| Dependencies | Dependent on OAI-PMH | Dependent on the Storage API and the backbone pusher | Do we have any idea how well these will be supported/maintained over the coming years? What sort of capacity do they have? Are there any restrictions on the load we are allowed to place on them? |
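On uploading to the triple store: for either option this could, for instance, use the SPARQL 1.1 Graph Store HTTP protocol, which stores such as Fuseki support. A minimal sketch, with placeholder endpoint and graph URIs:

```python
"""Sketch: upload a Turtle file to a triple store via the Graph Store protocol."""
import requests

STORE = "https://example.org/triplestore/data"  # placeholder endpoint
GRAPH = "https://example.org/graph/daan"        # placeholder named graph


def upload_turtle(path: str) -> None:
    with open(path, "rb") as f:
        resp = requests.post(
            STORE,
            params={"graph": GRAPH},
            data=f.read(),
            headers={"Content-Type": "text/turtle"},
        )
    resp.raise_for_status()


upload_turtle("program_123.ttl")
```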