Currently the pipeline includes the following steps:

- Create a directory called `data` with a subdirectory for each of the source databases, and download the input RDF data as follows (manual; see the preparation sketch at the end of this list):
  - Schoenberg Database of Manuscripts (sdbm): https://sdbm.library.upenn.edu/downloads (you have to be logged in)
  - Medieval Manuscripts in Oxford Libraries (bodley): https://github.com/mapping-manuscript-migrations/bodleian-RDF
    - To combine the files into `input.ttl`:
      - `for f in *.rdf; do (rapper $f -i rdfxml -o turtle) > $f.ttl; done;`
      - `cat *.ttl | rapper - "https://medieval.bodleian.ox.ac.uk/catalog/" -i turtle -o turtle > input.ttl`
  - Bibale Database (bibale): http://bibale.irht.cnrs.fr/exports/mmm/
  - CSV of Bibale shelfmark city locations to `data/additional/bibale_locations.csv`
  - CSV of manual manuscript links to `data/additional/manuscript_links.csv`
  - CSV of Bibale/Bodley Phillipps numbers to `data/additional/phillipps_numbers.csv`
  - CSVs of actor Recon runs to `data/additional/recon_actors_{LETTER}_{DATE}.csv`, where `{LETTER}` is the starting letter of the actor names and `{DATE}` is a date in the format YYYY-MM-DD (e.g., `recon_actors_A_2019-05-22.csv`)
  - CSVs of work Recon runs to `data/additional/recon_works_{LETTERS}_{DATE}.csv`, named similarly (e.g., `recon_works_J-P_2019_06_10.csv`)
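
  A minimal preparation sketch for this manual step, assuming a Debian/Ubuntu system for installing `rapper` (the Raptor RDF toolkit's command-line tool used above) and the subdirectory names that the automated steps below expect:

  ```bash
  # Install rapper (raptor2-utils is the Debian/Ubuntu package name;
  # adjust for other systems)
  sudo apt-get install raptor2-utils

  # Create the expected directory layout; the names come from the paths
  # used elsewhere in this README
  mkdir -p data/sdbm data/bodley data/bibale data/additional
  ```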

- Set up the input databases (automated; see the loading sketch after this list):
  - Load `data/sdbm/input.ttl` to http://localhost:3051/ds/sparql
  - Load `data/bodley/input.ttl` to http://localhost:3052/ds/sparql
  - Load `data/bibale/input.ttl` to http://localhost:3053/ds/sparql
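
  As an illustration of what the automated load does, a single converted file can be pushed into a local Fuseki dataset over the graph store protocol. This is a sketch only: it assumes the dataset also exposes a `/ds/data` endpoint next to `/ds/sparql`; the pipeline's own scripts take care of the actual loading.

  ```bash
  # Illustrative only: load the converted SDBM Turtle file into the sdbm input
  # Fuseki, assuming its graph store protocol endpoint is exposed at /ds/data
  curl -X POST \
       -H "Content-Type: text/turtle" \
       --data-binary @data/sdbm/input.ttl \
       "http://localhost:3051/ds/data?default"
  ```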

- Convert the input datasets to the unified data model and reconcile some of the contained entities (automated):
  - Transform the data using SPARQL CONSTRUCT queries (see the illustration after this list)
  - Link Bibale places to GeoNames (you'll need GeoNames API key(s) for this)
    - You can add GeoNames API keys to `.env` in the format `GEONAMES_KEY=<key>`, `GEONAMES_KEY2=<key 2>`, `GEONAMES_KEY3=<key 3>`, ...
    - Keys are throttled when the API's temporal query limit is exhausted
  - Reconcile all place references and fetch place information from TGN
  - Link manuscripts and works that have shared identifiers or manual links
  - Link actors that have shared identifiers or manual links
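
  The actual CONSTRUCT queries live in the pipeline's transform step; purely as an illustration of the mechanism, a query can be posted to one of the input endpoints and the result saved as Turtle. The query below is a hypothetical placeholder, not one of the pipeline's transformations.

  ```bash
  # Hypothetical example: run a CONSTRUCT query against the sdbm input endpoint
  # and save the resulting triples as Turtle
  curl "http://localhost:3051/ds/sparql" \
       -H "Accept: text/turtle" \
       --data-urlencode 'query=CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10' \
       > construct_sample.ttl
  ```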

- Deploy the final RDF files into the SPARQL endpoint http://localhost:3050/ds/sparql
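
  Once deployed, the endpoint can be spot-checked with a simple query (illustrative only, not part of the pipeline):

  ```bash
  # Count the triples in the deployed dataset as a quick sanity check
  curl "http://localhost:3050/ds/sparql" \
       -H "Accept: application/sparql-results+json" \
       --data-urlencode 'query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'
  ```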

Run the full pipeline:

`./rebuild.sh`

Deploy the results:

`./deploy.sh`

To avoid rebuilding the input Fuseki, first build the images:

`docker-compose build`

then convert again and run:

`./convert_again.sh`

Validate the output:

`./validate.sh`

Follow the logs:

`docker-compose logs -f`

Run the tests:

`cd transform/src`
`GEONAMES_KEY=<APIKEY> nosetests -v --with-doctest`

Replace `<APIKEY>` with your GeoNames API key.