FAIR in a box is an offshoot of the original CDE-in-a-box created by Rajaram Kaliyaperumal. It differs primarily in the installation process (now fully automated) and adds the ability to do YARRRML-based transformations from CSV into RDF.
- NEW Update Alert Mailing List
- Installation requirements
- Downloading
- Installing
- Testing your installation
- Using your FAIR-in-a-Box
- Customizing your FAIR-in-a-Box
- Connecting your FAIR-in-a-Box to the Virtual Platform
- Implementing Beacon2 and other services
To use the FAIR-in-a-box solution you must meet the following requirements.
User requirements (Person who is deploying this solution)
- Basic knowledge about Docker
- Basic GitHub knowledge
- (optional) Awareness of the EJP RD's CDE/CARE-SM semantic model if you plan to create FAIR data
System requirements (Machine where this solution is being deployed)
- Docker engine
- Docker-compose application
To get the FAIR in a box code, clone this repository to your machine:
git clone https://github.com/ejp-rd-vp/FiaB
There are two versions of FiaB. One of them is compatible with Version 1 of the CDE models, the other is compatible with Version 2 CARE-SM models. Version 1 is deprecated and should no longer be used. The installation folder for the CARE-SM FiaB is CDE Version2 Models FiaB
NOTE THAT THE TWO VERSIONS ARE MUTUALLY INCOMPATIBLE! You cannot run them in parallel. They have different Docker components for the transformation and different YARRRML templates.
If you have already installed FiaB, it is possible to upgrade from CDE V1 to CARE-SM by changing the docker-compose file as follows:
FROM docker-compose CDE VERSION 1: remove the components:
- cde-box-daemon (version 0.3.2)
- yarrrml_transform
- rdfizer
TO UPGRADE to docker-compose CARE-SM: add the components (see sample below)
- cde-box-daemon (version 0.5.2)
- Add clause caresm
- Add clause yarrrml-rdfizer
Note: replace all instances of {PREFIX} with your local installation prefix, e.g. "ACME-default"
Note: replace {RDF_TRIGGER} with the port number that you have selected for your RDF transformation

    cde-box-daemon:
      image: markw/cde-box-daemon:0.5.2 # to use the version 2 CDE models with Hefesto
      container_name: cde-box-daemon
      environment:
        GraphDB_User: ${GraphDB_User}
        GraphDB_Pass: ${GraphDB_Pass}
        baseURI: ${baseURI}
        GRAPHDB_REPONAME: ${GRAPHDB_REPONAME}
      depends_on:
        - caresm
        - yarrrml-rdfizer
      ports:
        - 127.0.0.1:{RDF_TRIGGER}:4567
      volumes:
        - ./data:/data
        - ./config:/config
      networks:
        - {PREFIX}-default

    caresm:
      image: pabloalarconm/care-sm-toolkit:0.0.3
      hostname: caresm
      volumes:
        - ./data:/code/data
      networks:
        - {PREFIX}-default

    yarrrml-rdfizer:
      image: markw/yarrrml-rml-ejp:0.0.3
      container_name: yarrrml-rdfizer
      hostname: yarrrml-rdfizer
      environment:
        - SERIALIZATION=nquads
      volumes:
        - ./data:/mnt/data
      networks:
        - {PREFIX}-default

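The two placeholder substitutions described in the notes above can be scripted. The following is a minimal sketch, assuming the sample clauses have already been pasted into your compose file. The function name `fill_placeholders` and its argument order are hypothetical and are not part of FiaB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FiaB): replace the {PREFIX} and
# {RDF_TRIGGER} placeholders in a compose file with local values.
# Usage: fill_placeholders <compose-file> <prefix> <trigger-port>
# Assumes the prefix contains no "/" or "&" (both are special to sed).
fill_placeholders() {
  local file="$1" prefix="$2" port="$3"
  local tmp
  tmp="$(mktemp)" || return 1
  # Write to a temporary file first so a failed sed cannot truncate the original.
  sed -e "s/{PREFIX}/${prefix}/g" \
      -e "s/{RDF_TRIGGER}/${port}/g" \
      "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example (commented out so that sourcing this file changes nothing):
# fill_placeholders docker-compose-ACME.yml ACME 4567
```

Keep a backup copy of the compose file before running a helper like this, since it edits the file in place.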
You should now be able to restart your docker-compose and be fully functional. THERE IS NO NEED TO GO THROUGH THE "installing" section below! Your FiaB is installed, and upgraded.
If you have never installed FiaB before, you must use the CARE-SM models - Version 1 models are deprecated!!!
Once you have completed the "Downloading" section of this tutorial, you can run `run-me-to-install.sh` in the `./CARE-SM-Fiab/` folder:
bash ./run-me-to-install.sh
You will then be prompted as to whether you are doing a production installation (i.e. you already have a GUID, for example one created using W3ID, and you have already selected ports for your FDP, GraphDB, and, optionally, Beacon). In addition, you must have an available port for the "RDFization trigger" - this port must be available on the server, but SHOULD NOT be exposed through the firewall.
If you say "no", the installer will install your FDP onto localhost using defaults:
- installation prefix 'test'
- port 7070 for the FDP
- port 7200 for the GraphDB
- port 4567 for the RDFization trigger
- port 8000 for Beacon2
If you say "yes", you will need to answer these questions yourself.
The installation prefix is simply a short-name for your database. NO SPACES, and better as lower-case letters. For example:
- crampdb
- euronmd
- dpp
- htad
- crag
- ACME <---- this will be used for the rest of the tutorial
This prefix is used to isolate one installation of FDP from another, if you are hosting multiple FDPs on the same server.
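Before answering the installer's questions, a candidate prefix can be checked against the rules above. This is a minimal sketch; `valid_prefix` and `prefix_is_lowercase` are hypothetical names, and the installer itself may apply different or additional checks.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the installer).

# Succeeds if the prefix is non-empty and contains no whitespace.
valid_prefix() {
  case "$1" in
    ""|*[[:space:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Succeeds if the prefix contains no upper-case letters.
# Lower case is recommended by the text above, not required.
prefix_is_lowercase() {
  case "$1" in
    *[[:upper:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example (commented out so that sourcing this file prints nothing):
# valid_prefix "crampdb" && echo "prefix is usable"
```

Note that the tutorial's own example prefix, ACME, passes `valid_prefix` but not `prefix_is_lowercase`, which matches the "better as lower-case" wording.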
After about a minute, the installer will send a message to the screen asking you to check that the installation was successful. This message will last for 10 minutes, giving you enough time to explore the links in the message. After 10 minutes, the services will all automatically shut down. You can stop the installer with CTRL-C at any time.
If the installation is successful using "test", you may then restart run-me-to-install, this time answering the questions using your production information.
The installer will create a folder containing all of your server configuration files. You can copy this folder anywhere on your system, e.g. to keep your servers all in one folder outside of your GitHub copy of FiaB.
The folder will be called "prefix-ready-to-go" (e.g. "ACME-ready-to-go"). Inside that folder is a customized docker-compose file (docker-compose-prefix.yml) for your deployment. So for example, you would issue the commands:
cp -r ACME-ready-to-go ~/SERVERS/
cd ~/SERVERS/ACME-ready-to-go
docker-compose -f docker-compose-ACME.yml up
Your FDP is now running at whatever port you selected for the FDP (default 7070)
When you are happy with your (production) installation, and you have created the metadata records (following the instructions below for creating a read/write user for the FDP and closing the default root account "[email protected]"), you are then ready to register yourself with the central index of FAIR Data Points.
To do this, you need to edit one file:
~/SERVERS/ACME-ready-to-go/fdp/application.yml
The line you need to edit is:
clientUrl: http://localhost:7070
Replace the http://localhost:7070 URL with your own production URL (note that you should NOT include a trailing slash!). The next time you docker-compose up, the system will register itself using the URL that you put as the value of clientUrl.
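The edit to clientUrl can also be made from the command line. The sketch below assumes the file contains exactly one line beginning with `clientUrl:` (after any indentation); the function name `set_client_url` is hypothetical. It strips one trailing slash from the URL, following the note above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FiaB): set clientUrl in application.yml.
# Usage: set_client_url <application.yml> <production-url>
# Assumes the URL contains no "|" or "&" (both are special to this sed call).
set_client_url() {
  local file="$1" url="${2%/}"   # ${2%/} drops one trailing slash, if present
  local tmp
  tmp="$(mktemp)" || return 1
  # Keep the line's indentation and replace only the value after "clientUrl:".
  # "|" is used as the sed delimiter because URLs contain "/".
  sed "s|^\([[:space:]]*clientUrl:\).*|\1 ${url}|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example (commented out so that sourcing this file changes nothing):
# set_client_url ~/SERVERS/ACME-ready-to-go/fdp/application.yml https://my.fdp.address.here
```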
NOTE: If you already have a reverse proxy on your server, then you should ignore this and use your own. This is ONLY for those who have not set-up a proxy.
NOTE: You can ONLY do this with a production installation! Your FDP URL must match your certificate!
- Uncomment the "hitch" service in the docker-compose file. NOTE: Hitch and Varnish are often used together... I found that Varnish has a frustrating habit of caching things you don't want to be cached... I no longer recommend that you use Varnish for this application. MDW
- If you need to (Hint: YOU PROBABLY DON'T!), you can edit the "frontend" line in the ACME-ready-to-go/proxy.conf file. If you leave it as-is, your FDP will run on https port 8443, which will generally be OK for all installations.
- Edit the ACME-ready-to-go/docker-compose-ACME.yml "hitch" service configuration so that ./combined.pem:/etc/hitch/cert.pem is mapping YOUR certificate+key .pem file to the /etc/hitch/cert.pem inside the docker image (do not edit this filename!!)
- You SHOULD now remove the exposed non-SSL port from the fdp_client service in the docker-compose file, as it is no longer needed
- NOTE: There are situations where Hitch will cache an old copy of your certificate, causing "expired certificate" errors in people's browsers. To fix this, docker-compose down and docker-compose up again. (This is incredibly frustrating... sorry! Not my fault! MDW)
Additional customization options are described below.
- If the GraphDB deployment is successful then you can access GraphDB by visiting the following URL.
Note: If you deploy the FAIR in a box solution on your laptop, then check only the local deployment URL.
Service name | Local deployment | Production deployment |
---|---|---|
GraphDB | http://localhost:7200 | http://SERVER-IP:7200 |
By default the GraphDB service is secured, so you need credentials to log in to GraphDB. The default GraphDB credentials are in the table below.
Username | Password |
---|---|
admin | root |
- If the FAIR Data Point deployment is successful then you can access the FAIR Data Point by visiting the following URL.
Service name | Local deployment | Production deployment |
---|---|---|
FAIR Data Point | http://localhost:7070 | http://SERVER-IP:7070 |
Note: If you deploy the FAIR in a box solution on your laptop, then check only the local deployment URL.
In order to add content to the FAIR Data Point you need credentials with write access. Please find the default FAIR Data Point's credentials in the table below.
Username | Password |
---|---|
[email protected] | password |
NOTE: The folders "metadata" and "bootstrap" are no longer needed. ALL ACTIVITIES FROM NOW ON HAPPEN INSIDE OF THE ACME-ready-to-go FOLDER, and this folder can be moved anywhere on your system.
In the folder ./ACME-ready-to-go there is a docker-compose-ACME.yml file, and two directories.
The folder structure is:

    .
    |-- docker-compose-ACME.yml
    |-- /data
    |   |-- /triples
    |-- /config

- The /data folder is where you will place your preCARE.csv file, prepared according to the instructions on how to generate the (single!) 'preCARE.csv' file.
- The /data folder will contain the YARRRML Template that will be applied to the final CSV.
- NOTA BENE: Please execute `chmod a+w ./data/triples` prior to executing a transformation. The transformation tool in this container runs with very limited permissions, and cannot write to a folder that is mounted with default permissions.
- NOTA BENE: the CARE_yarrrml_template.yaml contains a placeholder for your installation's base URI. This is passed as an environment variable, baseURI, which appears in the cde-box-daemon clause of your docker-compose-ACME.yml file:

    cde-box-daemon:
      image: markw/cde-box-daemon:0.5.2 # to use the version 2 CDE models with Hefesto
      container_name: cde-box-daemon
      environment:
        GraphDB_User: ${GraphDB_User}
        GraphDB_Pass: ${GraphDB_Pass}
        baseURI: ${baseURI}    # <-- this is the variable described above
        GRAPHDB_REPONAME: ${GRAPHDB_REPONAME}

If you set baseURI to be "http://my.domain.org/" then all 'local' URLs in your resulting transformed data will use that as their prefix.
The EJP-RD CARE-SM Transformation process has three steps:
- A simple "preCARE" CSV file is created by the data owner (you must do this!)
- The preCARE.csv is transformed into the final CARE.csv (this is automated) by the caresm toolkit (part of the docker-compose)
- The final CARE.csv is processed by the YARRRML transformer, and RDF is output into the ./data/triples folder
An exemplar preCARE CSV, and the standard YARRRML template, are provided for you to test your installation. Copy/paste these into the appropriate folder (./data).
You are responsible for generating your own preCARE.csv. NOTA BENE: the filenames MUST NOT BE CHANGED! The files are called preCARE.csv and CARE_yarrrml_template.yaml. YOU CANNOT CHANGE THIS!
Step 1: Folder structure
Make sure the following folder structure, relative to where you plan to keep your pre and post-transformed data, is available:
./ACME-ready-to-go/data/
./ACME-ready-to-go/data/preCARE.csv
./ACME-ready-to-go/data/triples (this is where the output data will be written, and loaded from here into Graphdb)
./ACME-ready-to-go/data/CARE_yarrrml_template.yaml
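A quick pre-flight check of this layout can catch a misnamed file before a transformation is triggered. This is a sketch only; `check_data_folder` is a hypothetical name, and it checks just the three paths listed above. It does not verify that the container user can write to the triples folder.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of FiaB).
# Usage: check_data_folder <path-to-data-folder>
# Returns 0 if the expected files and folder exist, 1 otherwise.
check_data_folder() {
  local data="$1" missing=0
  # The file names are fixed by the tutorial and must not be changed.
  [ -f "$data/preCARE.csv" ] \
    || { echo "missing file: $data/preCARE.csv" >&2; missing=1; }
  [ -f "$data/CARE_yarrrml_template.yaml" ] \
    || { echo "missing file: $data/CARE_yarrrml_template.yaml" >&2; missing=1; }
  [ -d "$data/triples" ] \
    || { echo "missing folder: $data/triples" >&2; missing=1; }
  return "$missing"
}

# Example (commented out so that sourcing this file prints nothing):
# check_data_folder ./ACME-ready-to-go/data && echo "data folder looks complete"
```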
Step 2: Edit the .env file
The .env file sets the values of the environment variables used in the docker-compose file. The first of these, baseURI, is the base for all URLs that represent your transformed data. This should be set to something like:
http://my.database.org/my_rd_data/
This will result in a triple that looks like this:
<http://my.database.org/my_rd_data/person_123345_asdssaewe#ID> <sio:has-value> <"123345">
Optimally, these URLs will resolve... but this is your responsibility - we cannot automate this.
Step 3: Input CSV files
Put an appropriately generated preCARE.csv into the ACME-ready-to-go/data folder.
If you are unsure which columns to fill for each data type, see the glossary.
Step 4: Input YARRRML templates
The YARRRML template is standardized.
Step 5: Executing transformations
Call the URL http://localhost:4567 or http://SERVER-IP:4567 (or whatever 'trigger' port number you selected when you answered the installation questions) to trigger the transformation of each CSV file and auto-load the result into GraphDB. (NOTA BENE: this will overwrite what is currently loaded! i.e. the EJP pipeline can only be used to take snapshots, NOT incremental updates!)
Note: If you deploy the FAIR in a box solution on your laptop, then check only the localhost URL.
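The trigger can be called from a terminal as well as from a browser. The sketch below only builds the URL; `trigger_url` is a hypothetical helper whose defaults (port 4567, host localhost) are the "test" installation values listed earlier. The commented-out `curl` line assumes that a plain HTTP GET is enough to start the transformation, which is what opening the URL in a browser would send.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FiaB): build the RDFization trigger URL.
# Usage: trigger_url [port] [host]
# Defaults match the "test" installation: port 4567 on localhost.
trigger_url() {
  local port="${1:-4567}" host="${2:-localhost}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Example (commented out because it starts a transformation, which
# overwrites whatever is currently loaded in GraphDB):
# curl --fail --silent --show-error "$(trigger_url 4567)"
```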
The image below gives an overview of software used in the FAIR in a box solution.
Triple store:
To store the RDF documents generated by the FAIR in a box solution we need a triplestore. In the FAIR in a box solution we use GraphDB as the triplestore. To know more about the GraphDB triplestore please visit this link
FAIR Data Point:
To describe the content of your resource we need a metadata provider component. For the FAIR in a box solution we use the FAIR Data Point software, which provides a description (metadata) of your resource. To learn more about the FAIR Data Point please visit this link
In this section, we list other related solutions.
MOLGENIS CDE in a box
The MOLGENIS EDC provider also offers a complete CDE in a box together with an EDC system. To learn more about the MOLGENIS implementation of the CDE in a box solution please visit this link
- Go to http://localhost:7200 (or wherever you set the GraphDB port) and login with the default username and password ("admin"/"root").
- Enter the "settings" for the admin account, and update the password. Note that this account will have access to both the metadata and the data (!!) so make the password strong!
- At the terminal, shut down the system (docker-compose down)
- Go to the ./ACME-ready-to-go/fdp folder and edit the file "application-ACME.yml"
- In the repository settings, update the username and password to whatever you selected above
- Now go back to the ACME-ready-to-go folder and bring the system back up. Your FDP database is now protected with the new password.
- Go to http://localhost:7200 and login with the current username and password
- Enter the "settings" and "users".
- Create a new user and password, giving them read/write permission ONLY on the FDP database, and read-only permission on the CDE database.
- In the ACME-ready-to-go folder, update the .env file with this new limited-permissions user
- docker-compose down and up to restart the server
- Consider closing the GraphDB external port in the docker-compose, or limiting it to localhost... there's no need for GraphDB to be exposed!
- go to the ACME-ready-to-go/fdp folder
- add your preferred logo file into the ./assets subfolder
- edit the ./variables.scss to point to that new logo file, and select its display size (or keep the default)
- to change the default colors, edit the first two lines to select the primary and secondary colors (the horizontal bar on the default http://localhost:7070 homepage shows the primary color on the left and the secondary color on the right)
- if you have a preferred favicon, replace the one in that folder with your preferred one.
- now go back to the ACME-ready-to-go folder and bring the docker-compose back up. Your FDP client will now be customized with your preferred icons and colors
Full instructions for modifying your default FAIR-in-a-box to match the schema requirements for the Virtual Platform can be found here: https://github.com/ejp-rd-vp/FDP-Configuration
To connect to the VP Index, you need to add the indexer "ping" function to your FAIR Data Point. To do this:
- Login to your FDP via the Web page
- Go to "settings"
- About halfway down the settings there is a "Ping" section. Add the following URL to the "Ping":
Once you have done this, your site will be indexed in the VP Index on the next "ping" cycle (should be weekly, by default). THE INDEX WILL LOOK FOR THE "VPDiscoverable" tag in the vpConnection property of whatever resource(s) you want to be indexed by the platform. e.g. if you have 5 datasets, but you only want 3 of them to be indexed by the VP, then you set the vpConnection property to "VPDiscoverable" for ONLY those three datasets (the others have no value for that property). In the metadata editor of the FDP web page, this is done via a dropdown menu.
If you want to force re-indexing, you can shut down (docker-compose down) and restart your FDP. Alternatively, you can force a re-indexing with the following curl command:
curl -X POST https://index.vp.ejprarediseases.org/ -H "Content-Type: application/json" -d '{"clientUrl": "https://my.fdp.address.here/"}'
The process for advertising content-discovery services (i.e. "Level 2", "Level 3", etc.) has changed since Release 1.0 of the VP. In release 1.5, all access services are individually annotated for their functionality - the "VPContentDiscovery" flag has been deprecated. Note that the "VPDiscoverable" flag is still used, and should be set on any component of your FDP that you want the VP to pay attention to.
The mechanism for publishing a data service is to create a dcat:DataService in your FAIR Data Point, and then annotate it according to its function.
There are two "kinds" of DataService.
- "standalone" - the service does not serve a specific dataset in your FDP. A plotting library or a statistical calculation service would be examples of this.
- "dataset-dependent" - the service serves a dataset
In the FiaB installation, there are separate DataService classes for each of these cases. "standalone" is a child of dcat:Catalog. "dataset-dependent" is a child of dcat:Distribution.
They are annotated as follows:
- "standalone" must have, at least, a dcat:landingPage. This is the URL to the website that describes the service. It may also have a dcat:endpointURL and dcat:endpointDescription, if it is an API.
- "dataset-dependent" must have, at least, a dcat:endpointURL and dcat:endpointDescription, and may have a dcat:landingPage
- In all cases, there SHOULD be a dct:type property, with a value of one or more ontology terms that describe the functionality of that DataService. We recommend "rooting" these ontology terms into EDAM:operation.
In the case of Beacon2 services, we have added two new ontology terms to the EJP VP ontology:
https://w3id.org/ejp-rd/vocabulary#VPBeacon2_individuals
This will be used to annotate a dcat:DataService that implements the Beacon2 "individuals" endpoint.
https://w3id.org/ejp-rd/vocabulary#VPBeacon2_catalog
This will be used to annotate a dcat:DataService that implements the Beacon2 "catalog" endpoint. These terms become the value(s) of dct:type for those DataServices.