This project provides a lightweight DP.LA aggregation feed and a command-line interface for ingesting bibliographic and metadata vocabularies such as MODS, Dublin Core, and MARC into an [RDF triplestore][BL] as BIBFRAME 2.0 linked data. It is based on KnowledgeLinks.io's Catalog Pull Platform, using the RDF Framework and BIBCAT.
This project started as a pilot for the Colorado/Wyoming DP.LA service hub.
- Clone or fork the project repository:

  ```bash
  git clone https://github.com/KnowledgeLinks/dpla-service-hub.git
  ```

- Initialize and update the submodules:

  ```bash
  cd dpla-service-hub/
  git submodule init
  git submodule update
  ```

- Create an instance directory for configuration and custom RDF rules:

  ```bash
  mkdir instance
  cd instance/
  touch config.py
  ```
To configure dpla-service-hub, you'll need to add at least these variables to your `config.py` file:

- `SECRET_KEY`: a random string of characters used to seed Flask's secret key
- `BASE_URL`: the base URL to use for IRI minting; defaults to `http://bibcat.org/`
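A minimal sketch of `instance/config.py`, assuming the two variables above are all you need to start. Reading the key from an environment variable (the name `DPLA_HUB_SECRET_KEY` is hypothetical) keeps it out of version control; the random fallback from Python's `secrets` module is only suitable for local testing, because a new key is generated each time the file is loaded.

```python
# instance/config.py: minimal configuration sketch
import os
import secrets

# DPLA_HUB_SECRET_KEY is a hypothetical environment variable name.
# The fallback generates a random 64-character hex string on every load,
# which is fine for local experiments but not for a long-running deployment.
SECRET_KEY = os.environ.get("DPLA_HUB_SECRET_KEY") or secrets.token_hex(32)

# Base URL used when minting IRIs for new resources.
BASE_URL = "http://bibcat.org/"
```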
Right now, the way to ingest records into the triplestore is to open an interactive Python 3 session. Here is an example of setting up your Python environment to use these different types of source ingesters with the triplestore:

```python
import sys

# Make the bibcat submodule importable
sys.path.append("/dpla-service-hub/bibcat")

from ingesters.ingester import NS_MGR, new_graph
```
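If the repository is not checked out at the path hard-coded above, a small helper can locate the `bibcat` submodule relative to wherever you cloned the project. `add_bibcat_to_path` is a hypothetical convenience function, not part of the project; it fails early with a hint if the submodule was never initialized.

```python
import sys
from pathlib import Path


def add_bibcat_to_path(repo_root):
    """Append <repo_root>/bibcat to sys.path (hypothetical helper).

    Raises FileNotFoundError if the submodule directory does not exist,
    and avoids adding the same path twice.
    """
    bibcat_dir = Path(repo_root).expanduser().resolve() / "bibcat"
    if not bibcat_dir.is_dir():
        raise FileNotFoundError(
            f"bibcat submodule not found at {bibcat_dir}; "
            "did you run 'git submodule update'?"
        )
    path_str = str(bibcat_dir)
    if path_str not in sys.path:
        sys.path.append(path_str)
    return path_str
```

For example, `add_bibcat_to_path("~/dpla-service-hub")` before importing `ingesters`.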
To customize the field mappings, add common properties, and add other information to the triplestore, put Turtle RDF files in the custom directory. When you then create an ingester, include the name of each Turtle file in the `rules_ttl` parameter so that your custom rules are applied during ingestion.
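Rather than typing every rules file by hand, you could collect the Turtle file names from the custom directory and pass the list as `rules_ttl`. This sketch assumes the directory is literally named `custom` and that `rules_ttl` expects bare file names, as the examples in this README suggest; `find_custom_rules` is a hypothetical helper.

```python
from pathlib import Path


def find_custom_rules(custom_dir="custom"):
    """Return the sorted names of the Turtle (.ttl) files in custom_dir.

    The default directory name "custom" is an assumption; adjust it to
    wherever your rules files live.
    """
    return sorted(p.name for p in Path(custom_dir).glob("*.ttl") if p.is_file())
```

Usage might look like `marc2bf.MARCIngester(rules_ttl=find_custom_rules())`. Sorting keeps the order deterministic, which matters if later rules files are meant to refine earlier ones.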
Create a MARC 21 ingester that uses a custom RDF rules graph for Colorado College, then transform a sample of Colorado College's MARC 21 records:

```python
import pymarc
import ingesters.marc as marc2bf

# Ingester using Colorado College's custom MARC-to-BIBFRAME rules
marc_ingester = marc2bf.MARCIngester(rules_ttl=['cc-marc-bf-.ttl'])

with open("dpla-service-hub/tmp/cc-marc.mrc", "rb") as fo:
    reader = pymarc.MARCReader(fo, to_unicode=True)
    for record in reader:
        marc_ingester.transform(record=record)
```
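When ingesting a large MARC file, one malformed record can stop the whole loop. A minimal sketch of a more forgiving loop, assuming you want to log and skip failures rather than abort: `transform_all` is a hypothetical helper that takes any callable, so it does not depend on the ingester's internals. Some pymarc versions yield `None` for records they cannot decode, so those are skipped too.

```python
import logging

log = logging.getLogger(__name__)


def transform_all(records, transform):
    """Call transform(record) on each record, logging and skipping failures.

    Returns (number_transformed, list_of_failed_indexes).
    """
    ok, failed = 0, []
    for index, record in enumerate(records):
        if record is None:  # undecodable record from the reader
            failed.append(index)
            continue
        try:
            transform(record)
        except Exception:
            log.exception("Record %d failed to transform", index)
            failed.append(index)
        else:
            ok += 1
    return ok, failed
```

With the ingester above, this would be called as `transform_all(reader, lambda r: marc_ingester.transform(record=r))`.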
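Create a MODS ingester for a record in Colorado College's Islandora repository. First request the MODS XML datastream for a single Fedora object, then parse it and pass it to the ingester:

```python
import requests
import xml.etree.ElementTree as etree
import ingesters.mods as mods

# Fetch the MODS datastream for a single Fedora object
mods_result = requests.get("https://digitalcc.coloradocollege.edu/islandora/object/coccc:26262/datastream/MODS/view")
mods_result.raise_for_status()

# Parse the raw bytes so the document's XML encoding declaration is honored
mods_xml = etree.XML(mods_result.content)

# Create the ingester only after mods_xml exists, since it is passed in here
mods_ingester = mods.MODSIngester(xml=mods_xml, rules_ttl=["cc-mods-bf.ttl"])
mods_ingester.transform(source=mods_xml)
```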
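Before calling `transform`, it can help to confirm that the fetched datastream really is MODS (an Islandora error page, for example, would not be). This sketch uses only the standard library and the MODS v3 namespace `http://www.loc.gov/mods/v3`; `mods_titles` is a hypothetical helper, not part of the ingester.

```python
import xml.etree.ElementTree as etree

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def mods_titles(mods_xml):
    """Return the text of every mods:titleInfo/mods:title element.

    Accepts either raw XML (str or bytes) or an already parsed Element.
    An empty list suggests the document is not the MODS record you expected.
    """
    if isinstance(mods_xml, (str, bytes)):
        root = etree.fromstring(mods_xml)
    else:
        root = mods_xml
    return [
        title.text.strip()
        for title in root.iterfind(".//mods:titleInfo/mods:title", MODS_NS)
        if title.text
    ]
```

For example, `mods_titles(mods_xml)` on the parsed record above should return its title(s).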
To test a random collection of Dublin Core RDF/XML records from the Denver Public Library:

```python
import pickle
import xml.etree.ElementTree as etree
import ingesters.dc as dc

dc_ingester = dc.DCIngester(rules_ttl=['dpl-dc.ttl'])

# Load the pickled sample of RDF/XML records
with open("dpla-service-hub/tmp/sample_recs.pickle", "rb") as fo:
    sample_recs = pickle.load(fo)

# Serialize each record back to XML and transform it
for rdf_record in sample_recs:
    dc_ingester.transform(xml=etree.tostring(rdf_record))

dc_ingester.add_to_triplestore()
```
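The loop above serializes each record with `etree.tostring` before handing it to the ingester. To preview a sample before ingesting it, you could pull out each record's subject IRI and `dc:title` values using the standard RDF and Dublin Core element namespaces. This is a sketch assuming each record is an `rdf:RDF` or `rdf:Description` element; `describe_dc_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as etree

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_DESCRIPTION = "{%s}Description" % NS["rdf"]
RDF_ABOUT = "{%s}about" % NS["rdf"]


def describe_dc_record(rdf_record):
    """Return (subject IRI, dc:title values, serialized XML bytes).

    The bytes are what the loop above passes to transform(xml=...).
    """
    if rdf_record.tag == RDF_DESCRIPTION:
        description = rdf_record
    else:
        description = rdf_record.find(".//rdf:Description", NS)
    xml_bytes = etree.tostring(rdf_record)
    if description is None:
        return None, [], xml_bytes
    titles = [t.text.strip() for t in description.findall("dc:title", NS) if t.text]
    return description.get(RDF_ABOUT), titles, xml_bytes
```

Records that come back with no subject or no titles are good candidates to inspect before running the full ingest.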
This project now supports Docker and Docker Compose. To run
the DPLA Service Hub stack, run `docker-compose up`
from the base directory. It will
build a bibcat image using the instance/config.py file you created