This is a description of PubCaseFinder-RDF.
First set up your environment:
Make sure a proper JDK is installed, Java SE 1.8 or higher. Just a JRE isn't enough, since the project requires compilation.
※ Notes on preparing 'HGNC_custom.txt'.
You can create this custom file on the https://www.genenames.org/download/custom/ website.
First, unselect everything. Then select the following information.
- Curated by the HGNC
  - HGNC ID, Approved symbol
- Downloaded from external sources
  - NCBI Gene ID(supplied by NCBI)
- Select status
  - Approved
When you are done selecting, click the Submit button. If the resulting file looks like this, it was created correctly.
HGNC ID Approved symbol NCBI Gene ID(supplied by NCBI)
HGNC:5 A1BG 1
HGNC:37133 A1BG-AS1 503538
HGNC:24086 A1CF 29974
HGNC:7 A2M 2
HGNC:27057 A2M-AS1 144571
HGNC:23336 A2ML1 144568
HGNC:41022 A2ML1-AS1 100874108
HGNC:41523 A2ML1-AS2 106478979
HGNC:8 A2MP1 3
...
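Before running the converters, it can help to confirm that the downloaded file has the three columns shown above. The following is a minimal sketch, not part of the PubCaseFinder-RDF scripts: the class name `HgncCustomCheck` and its methods are hypothetical, and it assumes the columns are separated by tab characters (the sample above shows them flattened to spaces).

```java
import java.util.Arrays;

class HgncCustomCheck {
    // Column names copied from the sample header shown above.
    static final String[] EXPECTED_HEADER = {
        "HGNC ID", "Approved symbol", "NCBI Gene ID(supplied by NCBI)"
    };

    // True when the header line holds exactly the three expected columns.
    // Assumption: columns are separated by tab characters.
    static boolean isExpectedHeader(String headerLine) {
        String[] cols = headerLine.split("\t", -1);
        if (cols.length != EXPECTED_HEADER.length) {
            return false;
        }
        for (int i = 0; i < cols.length; i++) {
            if (!cols[i].trim().equals(EXPECTED_HEADER[i])) {
                return false;
            }
        }
        return true;
    }

    // Splits one data row into {HGNC ID, symbol, NCBI Gene ID}.
    // Columns missing from the row are returned as empty strings.
    static String[] parseRow(String line) {
        String[] cols = line.split("\t", -1);
        String[] row = {"", "", ""};
        for (int i = 0; i < row.length && i < cols.length; i++) {
            row[i] = cols[i].trim();
        }
        return row;
    }

    public static void main(String[] args) {
        // In-memory sample lines taken from the example above (tab-joined).
        String header = "HGNC ID\tApproved symbol\tNCBI Gene ID(supplied by NCBI)";
        String row = "HGNC:5\tA1BG\t1";
        System.out.println("header ok: " + isExpectedHeader(header));
        System.out.println("row: " + Arrays.toString(parseRow(row)));
    }
}
```

If the header check fails, the most likely cause is that extra columns were left selected on the download page.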
The script to use is DiseaseGeneAssociation.java.
The following commands compile and run it, producing output in Turtle format.
$ javac DiseaseGeneAssociation.java
$ java DiseaseGeneAssociation HGNC_custom.txt mondo.obo mim2gene_medgen.txt en_product6.xml gencc-submissions.csv
The example run writes its results to 'OMIM_Gene_Association.ttl' and 'Orphanet_Gene_Association.ttl'.
The contents of 'OMIM_Gene_Association.ttl' look like this:
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ncbigene: <http://identifiers.org/ncbigene/>
PREFIX mim: <http://identifiers.org/mim/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sio: <http://semanticscience.org/resource/>
<https://pubcasefinder.dbcls.jp/gene_context/disease:OMIM:613320/gene:ENT:51025>
a sio:SIO_000983 ;
sio:SIO_000628 mim:613320, ncbigene:51025 ;
dcterms:source <ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene_medgen> .
...
The contents of 'Orphanet_Gene_Association.ttl' look like this:
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ncbigene: <http://identifiers.org/ncbigene/>
PREFIX ordo: <http://www.orpha.net/ORDO/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sio: <http://semanticscience.org/resource/>
<https://pubcasefinder.dbcls.jp/gene_context/disease:ORDO:178342/gene:ENT:1213>
a sio:SIO_000983 ;
sio:SIO_000628 ordo:Orphanet_178342, ncbigene:1213 ;
dcterms:source <http://www.orphadata.org/data/xml/en_product6.xml> .
...
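The two records above share one shape: a subject URI built from the disease and gene IDs, a `sio:SIO_000983` type, a `sio:SIO_000628` link to both the disease and the gene, and a `dcterms:source`. The sketch below shows how such a record could be assembled as a string. It is an illustration only, not the code in DiseaseGeneAssociation.java: the class name `AssociationTurtleSketch`, its method names, and the four-space indentation are assumptions.

```java
class AssociationTurtleSketch {
    // Base of the subject URIs, copied from the sample records above.
    static final String BASE = "https://pubcasefinder.dbcls.jp/gene_context/";

    // Shared body: type, disease/gene references, and source.
    private static String record(String subject, String diseaseTerm,
                                 String ncbiGeneId, String sourceUri) {
        return "<" + subject + ">\n"
             + "    a sio:SIO_000983 ;\n"
             + "    sio:SIO_000628 " + diseaseTerm + ", ncbigene:" + ncbiGeneId + " ;\n"
             + "    dcterms:source <" + sourceUri + "> .\n";
    }

    // OMIM record: the disease is referenced as mim:<id>.
    static String omimGeneAssociation(String mimId, String ncbiGeneId, String sourceUri) {
        String subject = BASE + "disease:OMIM:" + mimId + "/gene:ENT:" + ncbiGeneId;
        return record(subject, "mim:" + mimId, ncbiGeneId, sourceUri);
    }

    // Orphanet record: the disease is referenced as ordo:Orphanet_<id>.
    static String orphanetGeneAssociation(String orphaId, String ncbiGeneId, String sourceUri) {
        String subject = BASE + "disease:ORDO:" + orphaId + "/gene:ENT:" + ncbiGeneId;
        return record(subject, "ordo:Orphanet_" + orphaId, ncbiGeneId, sourceUri);
    }

    public static void main(String[] args) {
        // IDs and source URIs taken from the sample records above.
        System.out.print(omimGeneAssociation("613320", "51025",
                "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene_medgen"));
        System.out.print(orphanetGeneAssociation("178342", "1213",
                "http://www.orphadata.org/data/xml/en_product6.xml"));
    }
}
```

The PREFIX declarations are left out here; a complete file would emit them once at the top, as in the samples.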
$ javac DiseaseHpoAssociation.java
$ java DiseaseHpoAssociation phenotype.hpoa en_product4.xml
The example run writes its results to 'OMIM_HP_Association.ttl' and 'Orphanet_HP_Association.ttl'.
- Example run:
$ javac NCBI_HGNC.java
$ java NCBI_HGNC HGNC_custom.txt Homo_sapiens.gene_info mim2gene.txt NCBI_gene_summary.txt
- Output
- NCBI_HGNC.ttl
- Example run:
$ javac HP_Inheritance.java
$ java HP_Inheritance hp.obo HPO_id_ja.txt HPO_Inheritance_en_jp.txt
- Output
- HP_Inheritance.ttl
- Example run:
$ javac Disease.java
$ java Disease mim2gene.txt OMIM_id_ja.txt MedGen_HPO_OMIM_Mapping.txt mondo.obo NBKid_shortname_OMIM.txt UR_DBMS_DiseaseLinkOMIM.csv UR_DBMS_DiseaseLink.csv KEGG_disease.tsv
- Output
- OMIM.ttl
- Orphanet.ttl
- Jae-Moon Shin ([email protected])