Merge pull request #281 from togoid/main
release 2024-12-09
sh-ikeda authored Dec 9, 2024
2 parents b0b2687 + 49988a9 commit 2125944
Showing 10 changed files with 5,277 additions and 4,152 deletions.
2 changes: 1 addition & 1 deletion config/dataset.yaml
@@ -948,7 +948,7 @@ uniprot_proteome:
examples:
- ["UP000005640","UP000002311","UP000000589","UP000000803","UP000000437","UP000001819","UP000008028","UP000008143","UP000079169"]
method: zcat $TOGOID_ROOT/input/uniprot/uniprot_proteome.tab.gz | awk -F "\t" 'FNR>=2&&$1&&$4{print $1 "\t" $4}'
description: 'This is a proteome dataset defined by UniProt. UniProt assigns IDs to the set of proteins expressed by a single organism. Since multiple genomes are often sequenced for the same organism, and a proteome is defined for each genome, proteome IDs to distinguish individual proteomes from the same taxonomy identifier were introduced. [More info](https://www.uniprot.org/help/proteome_id)'
description: 'This is a proteome dataset defined by UniProt. UniProt assigns IDs to the set of proteins expressed by a single organism. Since multiple genomes are often sequenced for the same organism, and a proteome is defined for each genome, proteome IDs to distinguish individual proteomes from the same taxonomy identifier were introduced. More info: https://www.uniprot.org/help/proteome_id'
vgnc:
label: VGNC
catalog: nbdc02624
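
The `method` entry for `uniprot_proteome` above pipes the gzipped table through awk, skipping the header row (`FNR>=2`) and printing columns 1 and 4 only when both are set. As a rough illustration, the same filter could be sketched in Ruby as follows. The helper name `proteome_id_rows` and the sample rows are made up for this example, and the sketch treats any non-empty field as set, whereas awk also treats a numeric zero as false.

```ruby
# Hypothetical helper mirroring the awk filter; not part of the TogoID code base.
# Assumption: a field counts as "set" when it is non-empty.
def proteome_id_rows(lines)
  rows = lines.each_with_index.map do |line, index|
    next if index.zero?                       # FNR>=2: skip the header row
    fields = line.chomp.split("\t", -1)       # -F "\t", keeping empty trailing fields
    first, fourth = fields[0], fields[3]
    next if first.nil? || first.empty? || fourth.nil? || fourth.empty?
    "#{first}\t#{fourth}"                     # print $1 "\t" $4
  end
  rows.compact                                # drop the skipped rows
end

# Made-up sample input; the real column names are not shown in the diff.
sample = [
  "col1\tcol2\tcol3\tcol4\n",
  "UP000005640\tx\ty\tlabel A\n",
  "UP000002311\tx\ty\t\n",
]
puts proteome_id_rows(sample)
```

Only the first data row survives here, because the second one has an empty fourth column.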
4 changes: 2 additions & 2 deletions docs/help.md
@@ -1,5 +1,5 @@
# TogoID ver. 2.0
Datasets last updated: 2024-12-05
Datasets last updated: 2024-12-09

## About
- [TogoID](https://togoid.dbcls.jp/) is an ID conversion service implementing unique features with an intuitive web interface and an API for programmatic access. TogoID supports datasets from various biological categories such as gene, protein, chemical compound, pathway, disease, etc. TogoID users can perform exploratory multistep conversions to find a path among IDs. To guide the interpretation of biological meanings in the conversions, we crafted an [ontology](https://togoid.dbcls.jp/ontology) that defines the semantics of the dataset relations.
@@ -10,7 +10,7 @@ Datasets last updated: 2024-12-05
## Video tutorial
- [How to use TogoID: an exploratory ID converter to bridge biological datasets](https://youtu.be/gXnvm6Fn4R8)

## Statistics (as of 2024-12-05)
## Statistics (as of 2024-12-09)
- Number of target datasets
- 105 (from 73 databases)
- For details on the target DBs and ID examples, please refer to the "DATASETS" tab.
4 changes: 2 additions & 2 deletions docs/help_ja.md
@@ -1,5 +1,5 @@
# TogoID ver. 2.0
Datasets last updated: 2024-12-05
Datasets last updated: 2024-12-09

## About
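- [TogoID](https://togoid.dbcls.jp/) is a web application whose intuitive interface lets you convert IDs while exploring the connections between life-science databases (DBs). In addition to converting between IDs that refer to the same entity, it can convert to IDs in other, related categories. It can also explore conversions between the IDs of DBs that are not directly linked, by going through other DBs.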
@@ -10,7 +10,7 @@ Datasets last updated: 2024-12-05
## Video manual
- [Exploratory conversion of various life-science database IDs with TogoID](https://youtu.be/gXnvm6Fn4R8)

## Statistics (2024-12-05)
## Statistics (2024-12-09)
- Number of target datasets
- 105 (from 73 databases)
- For details on the target DBs and ID examples, see the "DATASETS" tab.
3 changes: 3 additions & 0 deletions docs/news.md
@@ -1,3 +1,6 @@
# 2024-12-09
- Weekly update has been completed.

# 2024-12-05
- Weekly update has been completed.
- The dataset `uniprot_reference_proteome` has been renamed to `uniprot_proteome`, because this dataset includes proteomes other than the reference proteomes. We apologize for any inconvenience caused.
4 changes: 2 additions & 2 deletions lib/togoid-config.rb
@@ -101,7 +101,7 @@ def initialize(name, hash)
end

def setup_files
@ttl_dir = "output/id-label/"
@ttl_dir = "output/ttl/label"
@ttl_file = "#{@ttl_dir}/#{@name}.ttl"
FileUtils.mkdir_p(@ttl_dir)
end
@@ -205,7 +205,7 @@ def load_dataset

def setup_files
@tsv_dir = "output/tsv"
@ttl_dir = "output/ttl"
@ttl_dir = "output/ttl/relation"
@tsv_file = "#{@tsv_dir}/#{@source_ns}-#{@target_ns}.tsv"
@ttl_file = "#{@ttl_dir}/#{@source_ns}-#{@target_ns}.ttl"
FileUtils.mkdir_p(@tsv_dir)
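
Reading the second line of each changed pair as the new value, label Turtle files move from `output/id-label/` to `output/ttl/label`, and relation Turtle files move from `output/ttl` to `output/ttl/relation`. The resulting layout can be sketched with stand-alone functions. The names `label_ttl_path` and `relation_ttl_path` and the `root` parameter are hypothetical and exist only for this example; the real code sets instance variables inside `setup_files`.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helpers illustrating the directory layout after this commit.
def label_ttl_path(root, name)
  dir = File.join(root, "output/ttl/label")
  FileUtils.mkdir_p(dir)                      # create the directory if missing
  File.join(dir, "#{name}.ttl")
end

def relation_ttl_path(root, source_ns, target_ns)
  dir = File.join(root, "output/ttl/relation")
  FileUtils.mkdir_p(dir)
  File.join(dir, "#{source_ns}-#{target_ns}.ttl")
end

# Use a temporary directory so the example leaves nothing behind.
Dir.mktmpdir do |root|
  puts label_ttl_path(root, "uniprot_proteome").delete_prefix(root)
  puts relation_ttl_path(root, "uniprot_proteome", "taxonomy").delete_prefix(root)
end
```

Keeping both kinds of Turtle output under a common `output/ttl` parent means that a single directory holds every generated `.ttl` file, split by kind.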
6 changes: 3 additions & 3 deletions log/config-summary.tsv
@@ -248,10 +248,10 @@ uniprot-omim_gene nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbd
uniprot-omim_phenotype nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbdc00154 Phenotype OMIM phenotype http://identifiers.org/mim/ tio:TIO_000010 has related phenotype has related phenotype tio:TIO_000011 phenotype is related with is related with Monthly "sparql_thread.pl -t 10"
uniprot-orphanet_phenotype nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbdc01422 Phenotype Orphanet phenotype http://identifiers.org/orphanet.ordo/Orphanet_ tio:TIO_000010 has related phenotype has related phenotype tio:TIO_000011 phenotype is related with is related with Monthly "uniprot_idmapping2tsv.rb ${TOGOID_ROOT}/input/uniprot/idmapping.dat.gz Orphanet"
uniprot-pdb nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbdc00156 Structure PDB http://rdf.wwpdb.org/pdb/ tio:TIO_000026 protein has 3D structure has 3D structure tio:TIO_000027 is 3D structure of protein is 3D structure of Monthly "uniprot_idmapping2tsv.rb ${TOGOID_ROOT}/input/uniprot/idmapping.dat.gz PDB"
uniprot_proteome-assembly_insdc nbdc00221 Proteome UniProt proteome http://purl.uniprot.org/proteomes/ FIXME Genome Assembly INSDC http://identifiers.org/insdc.gca/ tio:TIO_000092 proteome is coded by genome is coded by tio:TIO_000093 genome codes proteome codes Monthly "gzip -dc ${TOGOID_ROOT}/input/uniprot/uniprot_proteome.tab.gz | sed -e '1d' | cut -f1,3 | grep GCA |cut -f1 -d '.'"
uniprot_proteome-assembly_refseq nbdc00221 Proteome UniProt proteome http://purl.uniprot.org/proteomes/ FIXME Genome Assembly RefSeq http://identifiers.org/insdc.gca/ tio:TIO_000092 proteome is coded by genome is coded by tio:TIO_000093 genome codes proteome codes Monthly "gzip -dc ${TOGOID_ROOT}/input/uniprot/uniprot_proteome.tab.gz | sed -e '1d' | cut -f1,3 | grep GCF |cut -f1 -d '.'"
uniprot_proteome-taxonomy nbdc00221 Proteome UniProt proteome http://purl.uniprot.org/proteomes/ nbdc00700 Organism Taxonomy http://identifiers.org/taxonomy/ tio:TIO_000090 is proteome of organism is proteome of tio:TIO_000091 organism has proteome has proteome Monthly "gzip -dc ${TOGOID_ROOT}/input/uniprot/uniprot_proteome.tab.gz | sed -e '1d' | cut -f 1,2"
uniprot-reactome_pathway nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbdc00185 Pathway Reactome pathway http://identifiers.org/reactome/ tio:TIO_000069 molecule participates in pathway participates in pathway tio:TIO_000068 pathway has participant molecule has participant molecule Monthly "uniprot_idmapping2tsv.rb ${TOGOID_ROOT}/input/uniprot/idmapping.dat.gz Reactome"
uniprot_reference_proteome-assembly_insdc nbdc00221 Proteome UniProt reference proteome http://purl.uniprot.org/proteomes/ FIXME Genome Assembly INSDC http://identifiers.org/insdc.gca/ tio:TIO_000092 proteome is coded by genome is coded by tio:TIO_000093 genome codes proteome codes Monthly "gzip -dc ${TOGOID_ROOT}/input/uniprot/uniprot_reference_proteome.tab.gz | sed -e '1d' | cut -f1,3 | grep GCA |cut -f1 -d '.'"
uniprot_reference_proteome-assembly_refseq nbdc00221 Proteome UniProt reference proteome http://purl.uniprot.org/proteomes/ FIXME Genome Assembly RefSeq http://identifiers.org/insdc.gca/ tio:TIO_000092 proteome is coded by genome is coded by tio:TIO_000093 genome codes proteome codes Monthly "gzip -dc ${TOGOID_ROOT}/input/uniprot/uniprot_reference_proteome.tab.gz | sed -e '1d' | cut -f1,3 | grep GCF |cut -f1 -d '.'"
uniprot_reference_proteome-taxonomy nbdc00221 Proteome UniProt reference proteome http://purl.uniprot.org/proteomes/ nbdc00700 Organism Taxonomy http://identifiers.org/taxonomy/ tio:TIO_000090 is proteome of organism is proteome of tio:TIO_000091 organism has proteome has proteome Monthly "gzip -dc ${TOGOID_ROOT}/input/uniprot/uniprot_reference_proteome.tab.gz | sed -e '1d' | cut -f 1,2"
uniprot-refseq_protein nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbdc00187 Protein RefSeq protein http://identifiers.org/refseq/ tio:TIO_000002 is equivalent to is equivalent to tio:TIO_000002 is equivalent to is equivalent to Monthly "uniprot_idmapping2tsv.rb ${TOGOID_ROOT}/input/uniprot/idmapping.dat.gz RefSeq | perl -pe 's/\\.\\d+$//;s/-\\d+\\t/\\t/'"
uniprot-taxonomy nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbdc00700 Organism Taxonomy http://identifiers.org/taxonomy/ tio:TIO_000124 is protein of organism is protein of organism tio:TIO_000125 organism has protein has protein Monthly "zcat ${TOGOID_ROOT}/input/uniprot/idmapping.dat.gz | awk -F \"\\t\" '$2==\"NCBI_TaxID\"{print $1 \"\\t\" $3}'"
uniprot-uniprot_mnemonic nbdc00221 Protein UniProt http://purl.uniprot.org/uniprot/ nbdc00221 Protein UniProt mnemonic http://purl.uniprot.org/uniprot/ tio:TIO_000022 has synonym has synonym tio:TIO_000023 is synonym of is synonym of Monthly "uniprot_idmapping2tsv.rb ${TOGOID_ROOT}/input/uniprot/idmapping.dat.gz UniProtKB-ID"
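
The last column of each row above holds the shell command that produces the ID pairs. For `uniprot_proteome-assembly_insdc`, the pipeline drops the header (`sed -e '1d'`), keeps columns 1 and 3, keeps lines containing `GCA`, and cuts at the first `.`, which strips the version suffix from the accession. A minimal Ruby sketch of those steps follows. The helper name `proteome_assembly_rows` and the sample rows are invented for illustration and do not come from the repository.

```ruby
# Hypothetical helper mirroring: sed -e '1d' | cut -f1,3 | grep GCA | cut -f1 -d '.'
def proteome_assembly_rows(lines, pattern)
  rows = lines.drop(1).map do |line|                  # sed -e '1d': drop the header
    fields = line.chomp.split("\t", -1)
    pair = [fields[0], fields[2]].compact.join("\t")  # cut -f1,3
    next unless pair.include?(pattern)                # grep GCA (or GCF)
    pair.split(".", 2).first                          # cut -f1 -d '.'
  end
  rows.compact
end

# Invented sample rows; real accessions and column names will differ.
sample = [
  "proteome\ttaxon\tassembly\n",
  "UP000000001\t1\tGCA_000000001.2\n",
  "UP000000002\t2\tGCF_000000002.1\n",
  "UP000000003\t3\t\n",
]
puts proteome_assembly_rows(sample, "GCA")
puts proteome_assembly_rows(sample, "GCF")
```

Passing `"GCF"` instead of `"GCA"` corresponds to the `uniprot_proteome-assembly_refseq` row, which differs only in the `grep` pattern.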
44 changes: 43 additions & 1 deletion log/error.log
@@ -3,7 +3,7 @@ Error: check_remote_file_time(input/homologene/homologene.data, https://ftp.ncbi
Error: Remote file is empty
# Error: output/tsv/chembl_target-ensembl_gene.tsv new file size per old 0 / 61845 = 0.0 < 0.5
# Error: Failed to create output/tsv/chembl_target-ensembl_gene.tsv or created file was empty
Error: HTTP Error 500: Internal Server Error: https://api.alpha.glycosmos.org/partialmatch?wurcs=WURCS%3D2.0%2F4%2C10%2C9%2F%5Ba2122h-1b_1-5%5D%5Ba2112h-1b_1-5%5D%5BAad21122h-2a_2-6_5%2ANCC%2F3%3DO%5D%5Ba2112h-1b_1-5_2%2ANCC%2F3%3DO%5D%2F1-2-3-3-3-4-2-3-3-3%2Fa4-b1_b3-c2_b4-f1_c8-d2_d8-e2_f3-g1_f6-j2_g3-h2_h8-i2&rootnode=true G14091MR
Error: HTTP Error 500: Internal Server Error: https://api.alpha.glycosmos.org/partialmatch?wurcs=WURCS%3D2.0%2F4%2C11%2C10%2F%5Ba2122h-1b_1-5%5D%5Ba2112h-1b_1-5%5D%5Ba2122h-1b_1-5_2%2ANCC%2F3%3DO%5D%5Ba1221m-1a_1-5%5D%2F1-2-3-2-3-2-3-4-2-3-2%2Fa4-b1_b3-c1_c4-d1_d3-e1_e4-f1_f3-g1_g3-h1_g4-i1_i3-j1_j4-k1&rootnode=true G64227KZ
# Error: output/tsv/glycomotif-glytoucan.tsv new file size per old 0 / 2804382 = 0.0 < 0.5
# Error: Failed to create output/tsv/glycomotif-glytoucan.tsv or created file was empty
# Error: output/tsv/glytoucan-doid.tsv new file size per old 1538 / 4376 = 0.35146252285191953 < 0.5
@@ -63,4 +63,46 @@ Error: HTTP Error 500: Internal Server Error: https://api.alpha.glycosmos.org/pa
# Error: Failed to create output/tsv/mondo-omim_phenotype.tsv or created file was empty
# Error: output/tsv/mondo-orphanet_phenotype.tsv new file size per old 0 / 146923 = 0.0 < 0.5
# Error: Failed to create output/tsv/mondo-orphanet_phenotype.tsv or created file was empty
# Error: output/tsv/ncit_disease-ncit_tissue.tsv new file size per old 291 / 385314 = 0.0007552282034911786 < 0.5
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <html><head>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <title>408 Request Timeout</title>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML </head><body>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <h1>Request Timeout</h1>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <p>Server timeout waiting for the HTTP request from the client.</p>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <p>Additionally a 502 Bad Gateway
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML error was encountered while trying to use an ErrorDocument to handle the request.</p>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML </body></html>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <html><head>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <title>408 Request Timeout</title>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML </head><body>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <h1>Request Timeout</h1>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <p>Server timeout waiting for the HTTP request from the client.</p>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML <p>Additionally a 502 Bad Gateway
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML error was encountered while trying to use an ErrorDocument to handle the request.</p>
# Error: output/tsv/ncit_disease-ncit_tissue.tsv seems to contain HTML </body></html>
# Error: output/tsv/togovar-ensembl_transcript.tsv new file size per old 0 / 8069270901 = 0.0 < 0.5
# Error: Failed to create output/tsv/togovar-ensembl_transcript.tsv or created file was empty
# Error: output/tsv/togovar-ncbigene.tsv new file size per old 0 / 1097267175 = 0.0 < 0.5
# Error: Failed to create output/tsv/togovar-ncbigene.tsv or created file was empty
# Error: output/tsv/togovar-pubmed.tsv new file size per old 0 / 14408635 = 0.0 < 0.5
# Error: Failed to create output/tsv/togovar-pubmed.tsv or created file was empty
# Error: output/tsv/togovar-refseq_rna.tsv new file size per old 0 / 4180808277 = 0.0 < 0.5
# Error: Failed to create output/tsv/togovar-refseq_rna.tsv or created file was empty
# Error: output/tsv/uberon-ncit_tissue.tsv new file size per old 288 / 38722 = 0.007437632353700738 < 0.5
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <html><head>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <title>502 Proxy Error</title>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML </head><body>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <h1>Proxy Error</h1>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <p>The proxy server received an invalid
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML response from an upstream server.<br />
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML </body></html>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <html><head>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <title>502 Proxy Error</title>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML </head><body>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <h1>Proxy Error</h1>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML <p>The proxy server received an invalid
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML response from an upstream server.<br />
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
# Error: output/tsv/uberon-ncit_tissue.tsv seems to contain HTML </body></html>
# Error: output/tsv/wikipathways-uniprot.tsv new file size per old 60240 / 470542 = 0.12802257821830995 < 0.5
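
The log messages above suggest two sanity checks on each generated TSV: a file is flagged when its size relative to the previous version falls below 0.5, and when its contents look like an HTML error page instead of tab-separated data. The checking code is not part of this diff, so the following is only a hypothetical sketch. The function names and the HTML heuristic are assumptions; only the 0.5 threshold and the message wording are taken from the log.

```ruby
SIZE_RATIO_THRESHOLD = 0.5  # taken from the "< 0.5" in the log messages

# Returns a log-style message when the new file shrank too much, else nil.
def size_ratio_error(path, new_size, old_size)
  return nil if old_size.zero?                 # nothing to compare against
  ratio = new_size.to_f / old_size
  return nil if ratio >= SIZE_RATIO_THRESHOLD
  "# Error: #{path} new file size per old #{new_size} / #{old_size} = #{ratio} < #{SIZE_RATIO_THRESHOLD}"
end

# Assumed heuristic: treat the text as HTML if it contains a structural tag.
def looks_like_html?(text)
  text.match?(/<\/?(html|head|body)\b/i)
end

puts size_ratio_error("output/tsv/example.tsv", 0, 61845)
puts looks_like_html?("<html><head>\n<title>408 Request Timeout</title>")
```

A check of this kind would catch the `ncit_disease-ncit_tissue.tsv` and `uberon-ncit_tissue.tsv` cases, where a `408 Request Timeout` or `502 Proxy Error` page was written in place of the expected data.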