Convert RDF data to relational databases
michaelbrunnbauer/rdf2rdb
RDF2RDB Version 0.53 by Michael Brunnbauer, netEstate GmbH

This tool converts RDF data in several formats to a MySQL database.

Project homepage: http://www.netestate.de/De/Loesungen/RDF2RDB

You can contact me at [email protected]

Quick start
-----------

-Make sure that Python 2 is installed: http://www.python.org/
 The tool was tested with Python 2.6. If you have trouble with newer 2.x
 versions, contact us.

-Install the MySQL module for Python:
 http://sourceforge.net/projects/mysql-python

-Install the RDFLib module for Python: https://pypi.python.org/pypi/rdflib

Create a fresh MySQL database with default character set = utf8 and adjust the
database credentials at the beginning of settings.py. Then run rdf2rdb.py with
one or several files or URLs containing RDF data in a format supported by
RDFLib:

./rdf2rdb.py http://www.w3.org/People/Berners-Lee/card.rdf
./rdf2rdb.py http://xmlns.com/foaf/spec/index.rdf http://www.w3.org/People/Berners-Lee/card.rdf

You can add more data to the database by calling rdf2rdb again. It uses the
file rdf2rdb.pickle to save state information between runs. In incremental
runs, tables and columns may be renamed or dropped, both because additional
information is taken into account and to avoid name collisions.

Call ./rdf2rdb.py without options to see the available command line options.

If you have installed a plugin parser for RDFLib and are working with local
files, you may have to add the filename extension in settings.py
(filenameextensions).

Database structure
------------------

For every class a thing belongs to, a table with the class name is created:

<class name>

The table contains at least these fields:

<class name>_id (The primary key, an integer)
uri (The URI of that thing)

Functional properties with a datatype are stored as columns in these tables:

<property name> (The value of the property for the thing, NULL if not known)

Functional properties are properties with only one value for each thing.
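To illustrate the class table layout described above, here is a minimal sketch for a hypothetical class Person with one functional datatype property called name. It uses Python's built-in sqlite3 module only so that it runs without a MySQL server. The class, the property, the URIs and the column types are assumptions made for this example, and the statements rdf2rdb actually issues may differ.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Class table for the hypothetical class "Person":
#   Person_id  the integer primary key
#   uri        the URI of the thing
#   name       a functional datatype property, NULL if not known
conn.execute(
    "CREATE TABLE Person ("
    " Person_id INTEGER PRIMARY KEY,"
    " uri TEXT NOT NULL,"
    " name TEXT"
    ")"
)

# Two things of class Person; the second has no known value for "name".
conn.execute(
    "INSERT INTO Person (Person_id, uri, name) VALUES (?, ?, ?)",
    (1, "http://example.org/alice", "Alice"),
)
conn.execute(
    "INSERT INTO Person (Person_id, uri, name) VALUES (?, ?, ?)",
    (2, "http://example.org/bob", None),
)

rows = conn.execute(
    "SELECT Person_id, uri, name FROM Person ORDER BY Person_id"
).fetchall()
print(rows)
```

The unknown property value comes back as None, which corresponds to NULL in the table.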
Datatype properties connect things with data values (string, integer, date,
etc.). Object properties connect things with other things.

For object properties, a table is created for every class combination seen
with this property:

<class name 1>_<property name>_<class name 2>

The table has the following fields:

<class name 1>_id1 (The primary key of the thing in table <class name 1>)
<class name 2>_id1 (The primary key of the thing in table <class name 2>)

For non functional datatype properties, a table is created for every
class/datatype combination seen with this property:

<class name>_<property name>_<datatype>

<datatype> is the internal datatype name ('string', 'int', 'boolean', etc.).
settings.py defines a mapping from datatype URIs (like
"http://www.w3.org/2001/XMLSchema#integer") to the internal datatype name, and
from the internal datatype name to the MySQL datatype and an optional
conversion function.

The table for a non functional datatype property has the following fields:

<class name>_id (The primary key of the thing in table <class name>)
<property name> (The value of the property for the thing)

If a thing has several classes, it will be contained in all corresponding
class tables. The information about this thing specified via properties will
be replicated in all corresponding class and property tables unless a domain
or range has been specified for the property in the RDF data.

In RDF, classes and properties are identified with URIs. The names chosen for
them in the database are stored in the table labels:

uri (The URI of the class/property)
dblabel (The name of the class/property in this database)

Also, for quick lookups of the classes and primary keys of a thing, the table
uris is created:

uri (The URI of the thing)
class (The class name of the thing in the database)
id (The primary key of the thing in this table)

By default, rdfs:label and rdfs:comment are considered functional properties.
You can change this behavior in settings.py.

Every datatype property is created as a functional property in the class
tables. It is converted to a non functional property with an extra table if a
second value for the property with the same datatype is seen in the data and
the property was not declared as functional. This conversion only affects
class/datatype combinations for which second values have been seen, so a
property can be "functional" for one class/datatype and "non functional" for
another. If several different datatypes are seen for a functional property,
the columns in the class tables are named <property name>_<datatype> instead
of <property name>.

This deviation from the correct semantics of functional and non functional
was made in consideration of messy data.

Non functional datatype properties are converted back to functional, with a
random value chosen for each datatype, if the property is declared functional
in the RDF data later.

Entailment
----------

All entailments of the following properties should be generated:

rdfs:subClassOf
rdfs:subPropertyOf
rdfs:domain
rdfs:range
owl:equivalentClass
owl:equivalentProperty
owl:FunctionalProperty (currently supported only for datatype properties)

If you don't need this, you can use the command line option -n to disable
entailments for better performance. Processing may also be faster if you parse
ontology files first.

owl:Thing
---------

Things for which a class is not yet known are assigned the class owl:Thing.
The tables corresponding to owl:Thing are not very useful for a database user
but are needed for incremental runs. After every run, things that were
initially stored there but got another class later are removed from these
tables, and all empty tables are removed.

You can drop all tables related to owl:Thing after a run by specifying the
command line option -r. Be aware that this may discard information that could
be used in later incremental runs.

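The cleanup of the owl:Thing tables after a run can be pictured with the following sketch. It is not the tool's code: the function cleanup_thing_tables, the table name "Thing" and the dictionary-of-sets representation are all hypothetical, and it only models class membership, not the property tables that belong to owl:Thing.

```python
def cleanup_thing_tables(tables, thing_table="Thing"):
    """Return a cleaned copy of a {table name: set of URIs} mapping.

    Things listed in the owl:Thing table that also appear in another
    class table are removed from it, and empty tables are dropped.
    """
    # Collect every URI that has a class other than owl:Thing.
    typed = set()
    for name, uris in tables.items():
        if name != thing_table:
            typed.update(uris)

    cleaned = {}
    for name, uris in tables.items():
        remaining = uris - typed if name == thing_table else set(uris)
        if remaining:  # tables that ended up empty are not kept
            cleaned[name] = remaining
    return cleaned


# Hypothetical state after an incremental run: "a" has since been
# recognised as a Person, "b" still has no known class.
tables = {
    "Thing": {"http://example.org/a", "http://example.org/b"},
    "Person": {"http://example.org/a"},
    "Document": set(),
}
cleaned = cleanup_thing_tables(tables)
print(sorted(cleaned))
```

In this sketch only http://example.org/b stays in the owl:Thing table, and the empty Document table disappears.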
T-Box data
----------

If the tool recognises that information is about properties and classes, it
creates no tables in the database for it. This behaviour can be disabled with
the -t command line switch.

Language tags
-------------

Multilingual database schemas are currently not supported. You can specify
which language tags you consider relevant in settings.py (triples with other
language tags will be dropped). The language information is not represented in
the database.
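The effect of the language tag setting can be sketched as a simple filter. The names RELEVANT_LANGUAGES and filter_by_language are hypothetical and do not come from settings.py, triples are modelled as plain tuples instead of RDFLib objects, and keeping literals without a language tag is an assumption of this example.

```python
# Hypothetical setting: language tags considered relevant.
# None stands for literals without a language tag (assumed to be kept).
RELEVANT_LANGUAGES = {"en", None}


def filter_by_language(triples, relevant=RELEVANT_LANGUAGES):
    """Keep triples whose language tag is relevant and strip the tag."""
    kept = []
    for subject, predicate, value, lang in triples:
        if lang in relevant:
            # The tag itself is dropped, mirroring the fact that language
            # information is not represented in the database.
            kept.append((subject, predicate, value))
    return kept


triples = [
    ("http://example.org/a", "rdfs:label", "cat", "en"),
    ("http://example.org/a", "rdfs:label", "Katze", "de"),
    ("http://example.org/a", "ex:age", "3", None),
]
kept = filter_by_language(triples)
print(kept)
```

Here the German label is dropped, while the English label and the untagged value are kept without their language information.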