The Earth Metabolome Initiative Ontology. The Earth Metabolome Initiative (EMI, https://www.earthmetabolome.org/) is a global effort to profile the metabolic content of all currently known species on our planet.

digital-botanical-gardens-initiative/earth_metabolome_ontology

Earth Metabolome Ontology

The latest official version of the Earth Metabolome Initiative (EMI) ontology is available in emi.ttl. It can, for example, replace the enpkg vocabulary.

Any ontology issue, change or suggestion should be reported based on the emi.ttl file. The ontology documentation, other ontology files in the docs folder and ontop_config, and the emi_no_import.ttl file are generated based on the emi.ttl file. The emi_no_import.ttl is the same as emi.ttl without imported ontologies.

You can open and edit the ontology with a text editor or an ontology editor such as Protege.

For more details, see the EMI ontology documentation. The ontology documentation is fully generated with the WIDOCO tool. The WIDOCO-generated files are in the docs folder.

Natural Product taxonomy

The npc_taxonomy.ttl file is a SKOS-based OWL ontology for the structural classification of natural products, derived from the NPClassifier tool. This OWL ontology was generated with the script in scripts.

For more details, see Natural Product Classifier vocabulary.

Example of a knowledge graph using the EMI ontology

A knowledge graph based on the EMI ontology was generated from the pf1600 dataset and the structure metadata dataset (sqlite). It contains more than 32 million triples and can be queried and downloaded via the SPARQL endpoint: https://biosoda.unil.ch/graphdb/sparql.
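
As a minimal sketch of programmatic access, the Python snippet below prepares and sends a SELECT query using only the standard library. It assumes that the URL above accepts standard SPARQL 1.1 Protocol POST requests and returns SPARQL JSON results; the function names build_request and run_select are illustrative and are not part of the EMI repository.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL taken from the text above; assumed to speak the SPARQL 1.1 Protocol.
ENDPOINT = "https://biosoda.unil.ch/graphdb/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a form-encoded POST request asking for SPARQL JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(endpoint, data=data, headers=headers)


def run_select(query: str, endpoint: str = ENDPOINT, timeout: float = 60.0) -> list:
    """Send the query and return the list of result bindings."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5") would then return up to five bindings, each a dictionary keyed by variable name. Keeping a LIMIT in exploratory queries is advisable given the size of the graph.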

Tutorial to generate RDF triples based on the EMI ontology

Summary

  1. Introduction
  2. Allowing for insertion in mysql
  3. Inserting the sample data into a MySQL database
  4. Generating the EMI-based RDF graph
  5. Importing the generated RDF-based files in a triple store
  6. Interacting with the EMI virtual knowledge graph (VKG)

Introduction

This tutorial uses a toy dataset and mainly requires MySQL (version 8) and Ontop (version 5.1 or later).

  • Download the toy dataset from ENPKG full.
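  • Download and install MySQL 8.2.
  • Check that MySQL was correctly installed
mysql --version
  • Install pipenv, if it is not already available
pip install pipenv --user
  • Install the Python dependencies of the insertion scripts
cd ./scripts/sql_insert_emi_data
pipenv install
  • Create the emi_db schema. The path below is relative to the repository root, so return there first if you changed directory in the previous step
mysql -u root -p < ./scripts/sql_insert_emi_data/raw_mysql_schema.sql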

NOTE: If an emi_db database already exists in your MySQL server and you want to start from scratch (i.e., with an empty database), drop it before running the raw_mysql_schema.sql script with the command above. Otherwise, the data will be added to the existing database, which allows duplicates. The command below drops emi_db.

mysql -u root -p --execute="DROP DATABASE IF EXISTS emi_db ;"
  • You can connect to the database as shown below
mysql -u root -p
  • Check if the schema was created
show databases;
use emi_db;
show tables;

Alternatively, you can use MySQL Workbench to work with the emi_db database.

mysql-workbench

NOTE: The structure_metadata (sqlite) file appears to be missing. As an alternative, you can download an example from https://zenodo.org/records/12534675.

Allowing for insertion in mysql

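  • Connect to the MySQL server and enable the local_infile variable so that data can be loaded from local files
mysql -u root -p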
SHOW VARIABLES LIKE "local_infile";
SET GLOBAL local_infile = 1;
SHOW VARIABLES LIKE "local_infile";

Loading local data is now enabled. To check it, you can run:

mysql> SHOW VARIABLES LIKE "local_infile";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| local_infile  | ON    |
+---------------+-------+
1 row in set (0,01 sec)
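
The same check can be scripted. The sketch below is a minimal example that works with any Python DB-API 2.0 connection whose default cursor returns tuples (for instance, one created with mysql-connector-python or PyMySQL). The helper name ensure_local_infile is hypothetical, and which driver the repository's own scripts use is not assumed here.

```python
def ensure_local_infile(connection) -> bool:
    """Switch local_infile on if needed.

    Returns True if the variable had to be changed, False if it was already on.
    """
    cursor = connection.cursor()
    try:
        cursor.execute("SHOW VARIABLES LIKE 'local_infile'")
        row = cursor.fetchone()  # expected shape: ('local_infile', 'ON')
        if row is not None and str(row[1]).upper() in ("ON", "1"):
            return False
        # Requires sufficient privileges; the setting is lost on server restart.
        cursor.execute("SET GLOBAL local_infile = 1")
        return True
    finally:
        cursor.close()
```

Note that this only covers the server side. The client usually has to allow local file loading as well, and how that is enabled depends on the driver.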

Inserting the sample data into a MySQL database

  • Edit the scripts/sql_insert_emi_data/config.py file and make sure that the paths point to the correct files.

NOTE: To also generate a SKOS-based version of the Open Tree of Life, download the tsv files from https://tree.opentreeoflife.org/about/taxonomy-version and, in config.py, replace the None value with the path to the directory containing these files.

  • Run the command below to initiate the insertion into the emi_db database.
pipenv run python ./scripts/sql_insert_emi_data/main.py

NOTE: Alternatively, you can run python ./scripts/sql_insert_emi_data/main.py, if all dependencies listed in the Pipfile are installed in your Python environment.

IMPORTANT: This tutorial was only tested with Python 3.9, but it might work with other 3.x versions.
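
Before starting the insertion, it can help to confirm that every configured path exists. The sketch below is one way to do that. The setting names passed to find_missing_paths are hypothetical, since the actual variable names in config.py are not listed here; a None value is treated as an optional setting left unset, as with the Open Tree of Life directory.

```python
from pathlib import Path
from typing import Dict, List, Optional


def find_missing_paths(paths: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of settings whose path is set but does not exist."""
    missing = []
    for name, value in paths.items():
        if value is None:  # optional setting left unset
            continue
        if not Path(value).expanduser().exists():
            missing.append(name)
    return missing
```

For example, find_missing_paths({"toy_dataset_dir": "/data/enpkg", "otol_dir": None}) returns ["toy_dataset_dir"] if /data/enpkg does not exist, and an empty list otherwise.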

Generating the EMI-based RDF graph

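  • Make sure that the connection settings in ./ontop_config/emi-v0_2/emi-v0_2.properties match your MySQL server, for example
jdbc.password=root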
jdbc.user=root
jdbc.name=5e86f1b2-b7d8-4a17-9bc6-32b98b12ed79
jdbc.url=jdbc\:mysql\://localhost\:3306/emi_db
jdbc.driver=com.mysql.cj.jdbc.Driver
ontop.inferDefaultDatatype=True
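
To double-check which database Ontop will connect to, the properties can be read back with a few lines of Python. This is a simplified sketch: parse_properties is an illustrative helper that handles plain key=value lines and the \: escape seen above, not the full Java .properties syntax (line continuations and other escapes are ignored).

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        # Java .properties files escape ':' with a backslash; undo that here.
        properties[key.strip()] = value.strip().replace("\\:", ":")
    return properties


# Sample lines copied from the listing above.
SAMPLE = r"""
jdbc.user=root
jdbc.url=jdbc\:mysql\://localhost\:3306/emi_db
jdbc.driver=com.mysql.cj.jdbc.Driver
"""

settings = parse_properties(SAMPLE)
print(settings["jdbc.url"])  # prints: jdbc:mysql://localhost:3306/emi_db
```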
  • Run the Ontop command line tool from the repository root with the command below, replacing PATH/TO with the location of your Ontop installation
PATH/TO/ontop-cli-5.1.1/ontop materialize -m ./ontop_config/emi-v0_2/emi-v0_2.obda -t ./ontop_config/emi-v0_2/emi-v0_2.ttl -p ./ontop_config/emi-v0_2/emi-v0_2.properties -f turtle --enable-annotations --separate-files -o ./data/ontop

NOTE: You can allocate more memory to Ontop by editing the PATH/TO/ontop-cli-5.1.1/ontop file. For instance, set ONTOP_JAVA_ARGS="-Xmx16g" instead of ONTOP_JAVA_ARGS="-Xmx1g".

NOTE: If necessary, specify the classpath for the mysql-connector-java .jar file, for example

export CLASSPATH=$CLASSPATH:/Applications/ontop-cli-5.1.1/lib/mysql-connector-java-8.2.0.jar
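
If the materialization is run repeatedly, the command can be assembled in Python instead of being typed by hand. The sketch below reuses only the flags shown in the command above; the function names and default paths are illustrative, and the location of the ontop executable still has to be supplied by the caller.

```python
import subprocess
from pathlib import Path
from typing import List


def build_materialize_command(ontop_bin: str, config_dir: str,
                              output_dir: str, name: str = "emi-v0_2") -> List[str]:
    """Assemble the `ontop materialize` call used in this tutorial."""
    config = Path(config_dir)
    return [
        ontop_bin, "materialize",
        "-m", str(config / f"{name}.obda"),        # mapping file
        "-t", str(config / f"{name}.ttl"),         # ontology file
        "-p", str(config / f"{name}.properties"),  # JDBC settings
        "-f", "turtle",
        "--enable-annotations",
        "--separate-files",
        "-o", output_dir,
    ]


def materialize(ontop_bin: str,
                config_dir: str = "./ontop_config/emi-v0_2",
                output_dir: str = "./data/ontop") -> None:
    """Run the materialization and raise CalledProcessError if Ontop fails."""
    command = build_materialize_command(ontop_bin, config_dir, output_dir)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and check=True makes a failed run stop the calling script instead of passing silently.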

Importing the generated RDF-based files in a triple store

For GraphDB 10.6, see Loading data using importrdf.

For Stardog, see Adding data documentation section.

For Virtuoso, see Loading RDF data.
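
Whichever triple store is used, it can be useful to review what was generated before importing it. The sketch below lists the files in the output directory by size. It assumes that the materialized files carry the .ttl suffix; the helper name list_rdf_files is illustrative.

```python
from pathlib import Path
from typing import List, Tuple


def list_rdf_files(directory: str, suffix: str = ".ttl") -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) pairs, largest file first."""
    files = [
        (path.name, path.stat().st_size)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix == suffix
    ]
    return sorted(files, key=lambda item: item[1], reverse=True)
```

Calling list_rdf_files("./data/ontop") after the materialization step gives a quick overview of the files to import and highlights empty ones, which may point to a mapping that returned no rows.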

Interacting with the EMI virtual knowledge graph (VKG)

Ontop allows us to build virtual knowledge graphs. With its plugin for Protege, we can query the VKG. For more information, see the section Setting up the VKG using Ontop-Protégé.

NOTE: We recommend downloading and using Ontop+Protege 5.1.1. To build the VKG, you will also need all configuration files used to materialize the VKG in the subsection Generating the EMI-based RDF graph, notably ./ontop_config/emi-v0_2/emi-v0_2.obda, ./ontop_config/emi-v0_2/emi-v0_2.ttl and ./ontop_config/emi-v0_2/emi-v0_2.properties.

A full tutorial about Ontop-Protégé is available at https://doi.org/10.1016/j.patter.2021.100346.
