TogoID config

Description of link data for TogoID.

Link diagram

Link data

Pairs of database IDs in tab-separated values (TSV) format, one pair per line.

DB1ID1	DB2IDx
DB1ID2	DB2IDy
DB1ID3	DB2IDz
 :
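
As a rough illustration of this format, the following Ruby sketch parses link data into ID pairs. The helper name `parse_link_pairs` is hypothetical and is not part of the TogoID tools; the sample text mirrors the placeholder IDs above.

```ruby
# Hypothetical helper (not part of togoid-config): parse link data text
# into [source_id, target_id] pairs, one pair per non-empty line.
def parse_link_pairs(text)
  text.each_line.map(&:chomp).reject(&:empty?).map do |line|
    source_id, target_id = line.split("\t", 2)
    if target_id.nil? || target_id.empty?
      raise ArgumentError, "expected two tab-separated columns: #{line.inspect}"
    end
    [source_id, target_id]
  end
end

# Placeholder IDs taken from the example above.
sample = "DB1ID1\tDB2IDx\nDB1ID2\tDB2IDy\nDB1ID3\tDB2IDz\n"
pairs = parse_link_pairs(sample)
pairs.each { |source_id, target_id| puts "#{source_id} -> #{target_id}" }
```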

Config

Metadata for a pair of databases and their relation.

dataset.yaml

A list of source and target databases (the 1st and 2nd columns of the link data file, respectively).

# Dataset name (in snake_case) for TogoID, which can be a subset of the original database divided by category.
ec:
  # Human readable label of the dataset (intended to be used in a Web UI)
  label: Enzyme nomenclature
  # Database identifier provided by the Integbio Database Catalog https://integbio.jp/dbcatalog/
  catalog: nbdc00019
  # Primary category of the database (should be chosen from the tags defined in the Integbio DB Catalog)
  category: Function
  # URI prefix (intended to be used as a URI prefix in RDF)
  prefix: http://identifiers.org/ec-code/
hgnc:
  label: HGNC
  catalog: nbdc01774
  category: Gene
  prefix: http://identifiers.org/hgnc/
pubchem_compound:
  label: PubChem compound
  catalog: nbdc00641
  category: Compound
  prefix: 'https://identifiers.org/pubchem.compound/'
pubchem_substance:
  label: PubChem substance
  catalog: nbdc00642
  category: Compound
  prefix: 'https://identifiers.org/pubchem.substance/'

An optional definition of the ID format can be included.

# Some datasets have ambiguous identifiers
go:
  label: Gene ontology
  catalog: nbdc00074
  category: Function
  # Regular expression can be used for automatic detection of the dataset from identifiers given by users.
  # If only a part of the user input should be recognized as an identifier, use a named capture to indicate the part.
  regex: '^(GO[:_])?(?<id>\d{7})$'
  # Identifier format stored in the TSV files (defined by the Handlebars notation with a named capture).
  internal_format: '{{id}}'
  # Identifier format used for export in the TogoID API (defined by the Handlebars notation with a named capture).
  external_format: 'GO:{{id}}'
  prefix: 'http://purl.obolibrary.org/obo/GO_'
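
To illustrate how `regex`, `internal_format`, and `external_format` could fit together, here is a minimal Ruby sketch. It assumes the Handlebars templates only contain simple `{{name}}` placeholders; the helpers `render` and `normalize` are hypothetical and do not reflect how TogoID itself implements this.

```ruby
require "yaml"

# The `go` entry from the example above, embedded so the sketch is self-contained.
DATASET_YAML = <<~'YAML'
  go:
    label: Gene ontology
    regex: '^(GO[:_])?(?<id>\d{7})$'
    internal_format: '{{id}}'
    external_format: 'GO:{{id}}'
    prefix: 'http://purl.obolibrary.org/obo/GO_'
YAML

# Hypothetical helper: fill {{name}} placeholders from the named captures of a match.
def render(template, match)
  template.gsub(/\{\{(\w+)\}\}/) { match[Regexp.last_match(1)] }
end

# Hypothetical helper: return the internal and external forms of a user-supplied
# identifier, or nil when the input does not match the dataset's regex.
def normalize(user_input, dataset)
  match = Regexp.new(dataset["regex"]).match(user_input)
  return nil unless match

  {
    internal: render(dataset["internal_format"], match),
    external: render(dataset["external_format"], match),
  }
end

go = YAML.safe_load(DATASET_YAML)["go"]
["GO:0008150", "GO_0008150", "0008150", "P12345"].each do |input|
  puts "#{input} => #{normalize(input, go).inspect}"
end
```

With these definitions, the three GO spellings all reduce to the same internal ID, while an input that does not match the regex yields `nil`.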

config.yaml

Update procedure of link data and definitions of forward/reverse predicates for RDF generation.

# Relation of the pair of database identifiers (e.g., hgnc-ec)
link:
  # Forward link (source to target)
  forward:
    label: functionally related to
    namespace: ro
    # Ontology URI which defines predicates
    prefix: http://purl.obolibrary.org/obo/
    # Selected predicate defined in the above ontology
    predicate: RO_0002328

  # Reverse link (optional; target to source)
  reverse:
    label: gene product of
    namespace: ro
    prefix: http://purl.obolibrary.org/obo/
    predicate: RO_0002204

  # Example file name(s) of link data
  file: sample.tsv
#  file:
#    - sample1.tsv
#    - sample2.tsv

# Metadata for updating link data
update:
  # How often the source data is updated
  frequency: Bimonthly
  # Update procedure of link data (can be a script name or a command line)
  method: sparql_csv2tsv.sh query.rq "http://sparql.med2rdf.org/sparql"
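
The following Ruby sketch shows one way the forward and reverse predicates could be combined with the dataset prefixes to form RDF triples for an hgnc-ec pair. The helpers `triple` and `link_triples` are hypothetical, the IDs `12345` and `1.2.3.4` are placeholders rather than a real link, and the actual output of `togoid-config convert` may be structured differently.

```ruby
# Values copied from the dataset.yaml and config.yaml examples above.
SOURCE_PREFIX = "http://identifiers.org/hgnc/"
TARGET_PREFIX = "http://identifiers.org/ec-code/"
LINK = {
  "forward" => { "prefix" => "http://purl.obolibrary.org/obo/", "predicate" => "RO_0002328" },
  "reverse" => { "prefix" => "http://purl.obolibrary.org/obo/", "predicate" => "RO_0002204" },
}

# Hypothetical helper: render one triple using full URIs (valid in Turtle).
def triple(subject_uri, predicate, object_uri)
  "<#{subject_uri}> <#{predicate['prefix']}#{predicate['predicate']}> <#{object_uri}> ."
end

# Hypothetical helper: emit the forward triple, plus the reverse triple
# when a reverse predicate is defined (it is optional in config.yaml).
def link_triples(source_id, target_id, link)
  source_uri = SOURCE_PREFIX + source_id
  target_uri = TARGET_PREFIX + target_id
  triples = [triple(source_uri, link["forward"], target_uri)]
  triples << triple(target_uri, link["reverse"], source_uri) if link["reverse"]
  triples
end

# Placeholder IDs for illustration only.
triples = link_triples("12345", "1.2.3.4", LINK)
puts triples
```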

Using terms from Dublin Core's Frequency Vocabulary (DCFreq) is recommended for specifying the update frequency.

Usage

Rakefile

To update and convert all files:

% rake >& `date +%F`.log

To update and convert all files in parallel:

% rake -m -j 4

To update all TSV files:

% rake update

To convert all TSV files into Turtle files:

% rake convert

To update an 'output/tsv/db1-db2.tsv' file:

% rake output/tsv/db1-db2.tsv

To obtain an 'output/ttl/db1-db2.ttl' file:

% rake output/ttl/db1-db2.ttl

togoid-config

To check the syntax of the config YAML file:

% ruby bin/togoid-config config/db1-db2 check

To update link data (output/tsv/db1-db2.tsv) from the data source:

% ruby bin/togoid-config config/db1-db2 update

To generate an RDF/Turtle file (output/ttl/db1-db2.ttl) for the given link data:

% ruby bin/togoid-config config/db1-db2 convert

togoid-config-summary

To summarize all config settings:

% ruby bin/togoid-config-summary config/*/config.yaml > config-summary.tsv
% vd config-summary.tsv

To see the database update frequency:

% ruby bin/togoid-config-summary config/*/config.yaml | cut -f1,18

To see the database update method:

% ruby bin/togoid-config-summary config/*/config.yaml | cut -f1,19

togoid-config-summary-dot

To visualize config relations:

% ruby bin/togoid-config-summary config/*/config.yaml | ruby bin/togoid-config-summary-dot > togoid.dot
% dot -Kdot -Tpng togoid.dot -otogoid.png
% open togoid.png

The --id option adds the identifiers of nodes (DBs) and edges (predicates) to the diagram.

% ruby bin/togoid-config-summary config/*/config.yaml | ruby bin/togoid-config-summary-dot --id > togoid.dot

Note that rdfs:seeAlso will be highlighted in red to encourage considering more informative predicates.
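
As a rough picture of what such highlighting might look like in the DOT output, the Ruby sketch below colors an edge red when its predicate is `rdfs:seeAlso`. The helper `dot_edge`, the edge list, and the attribute layout are all assumptions for illustration; the DOT actually produced by `togoid-config-summary-dot` may differ.

```ruby
# Hypothetical helper: render one DOT edge, colored red for rdfs:seeAlso.
def dot_edge(source, target, predicate)
  attrs = [format('label="%s"', predicate)]
  attrs << 'color="red"' if predicate == "rdfs:seeAlso"
  format('"%s" -> "%s" [%s];', source, target, attrs.join(", "))
end

# Illustrative edges; db1 and db2 are placeholder dataset names.
edges = [
  ["hgnc", "ec", "ro:RO_0002328"],
  ["db1", "db2", "rdfs:seeAlso"],
]

dot = (["digraph togoid {"] + edges.map { |edge| "  " + dot_edge(*edge) } + ["}"]).join("\n")
puts dot
```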

Also try some other visualization layouts and options:

% dot -Kcirco -Tpng togoid.dot -otogoid.png
% dot -Kfdp -Tpng togoid.dot -otogoid.png
% dot -Nshape=box -Nstyle=filled,rounded -Ecolor=gray -Kdot -Tpng togoid.dot -otogoid.png