Merge pull request #193 from togoid/main
release 2023-10-16
sh-ikeda authored Oct 16, 2023
2 parents 41abe73 + e0ea2c3 commit ef2caf2
Showing 22 changed files with 2,818 additions and 2,080 deletions.
20 changes: 19 additions & 1 deletion Rakefile
@@ -334,6 +334,8 @@ module TogoID
return "prepare:ncbigene"
when /#{OUTPUT_TSV_DIR}oma_protein/
return "prepare:oma_protein"
when /#{OUTPUT_TSV_DIR}prosite/
return "prepare:prosite"
when /#{OUTPUT_TSV_DIR}reactome/
return "prepare:reactome"
when /#{OUTPUT_TSV_DIR}refseq_protein/
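The new `when` branch maps an output path under `OUTPUT_TSV_DIR` to the `prepare:prosite` task. A minimal sketch of that dispatch follows. The value of `OUTPUT_TSV_DIR` and the method name `prepare_task_for` are assumptions for illustration, since the hunk shows neither the constant's definition nor the enclosing method.

```ruby
# Illustrative only: the real OUTPUT_TSV_DIR value and the enclosing method
# are not visible in the diff, so both are assumed here.
OUTPUT_TSV_DIR = "output/tsv/"

# Return the name of the prepare task for an output TSV path, or nil.
def prepare_task_for(path)
  case path
  when /#{OUTPUT_TSV_DIR}oma_protein/
    "prepare:oma_protein"
  when /#{OUTPUT_TSV_DIR}prosite/
    "prepare:prosite"
  when /#{OUTPUT_TSV_DIR}reactome/
    "prepare:reactome"
  end
end

puts prepare_task_for("output/tsv/prosite-prosite_prorule.tsv")
```

Because the patterns are unanchored, any output file whose name starts with `prosite` directly under the output directory resolves to the same task.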
@@ -473,7 +475,7 @@ end

namespace :prepare do
desc "Prepare all"
-task :all => [ :bioproject, :cellosaurus, :ensembl, :hmdb, :homologene, :hp_phenotype, :cog, :interpro, :ncbigene, :oma_protein, :reactome, :refseq_protein, :refseq_rna, :rhea, :sra, :swisslipids, :uniprot, :taxonomy ]
+task :all => [ :bioproject, :cellosaurus, :ensembl, :hmdb, :homologene, :hp_phenotype, :cog, :interpro, :ncbigene, :oma_protein, :prosite, :reactome, :refseq_protein, :refseq_rna, :rhea, :sra, :swisslipids, :uniprot, :taxonomy ]

directory INPUT_DRUGBANK_DIR = "input/drugbank"
directory INPUT_BIOPROJECT_DIR = "input/bioproject"
@@ -486,6 +488,7 @@ namespace :prepare do
directory INPUT_INTERPRO_DIR = "input/interpro"
directory INPUT_NCBIGENE_DIR = "input/ncbigene"
directory INPUT_OMA_PROTEIN_DIR = "input/oma_protein"
directory INPUT_PROSITE_DIR = "input/prosite"
directory INPUT_REACTOME_DIR = "input/reactome"
directory INPUT_REFSEQ_PROTEIN_DIR = "input/refseq_protein"
directory INPUT_REFSEQ_RNA_DIR = "input/refseq_rna"
@@ -724,6 +727,21 @@ namespace :prepare do
end
end

desc "Prepare required files for PROSITE"
task :prosite => INPUT_PROSITE_DIR do
$stderr.puts "## Prepare input files for PROSITE"
download_lock(INPUT_PROSITE_DIR) do
updated = false
input_file = "#{INPUT_PROSITE_DIR}/prosite.dat"
input_url = "https://ftp.expasy.org/databases/prosite/prosite.dat"
if update_input_file?(input_file, input_url)
download_file(INPUT_PROSITE_DIR, input_url)
updated = true
end
updated
end
end

desc "Prepare required files for Reactome"
task :reactome => INPUT_REACTOME_DIR do
$stderr.puts "## Prepare input files for Reactome"
28 changes: 28 additions & 0 deletions bin/prosite_prorule.awk
@@ -0,0 +1,28 @@
BEGIN {
    RS="//\n";
    FS="\n";
    ac = "";
    pr = "";
    OFS="\t";
}

{
    for (i=1; i<=NF; i++) {
        if ($i ~ /^AC/) {
            split($i, ac_arr, " ");
            ac = ac_arr[2];
            gsub(/;$/, "", ac);  # strip the trailing semicolon
        }
        if ($i ~ /^PR/) {
            split($i, pr_arr, " ");
            split(pr_arr[2], pr_ids, ";");  # multiple IDs are separated by semicolons
            for (j=1; j<=length(pr_ids); j++) {
                gsub(/^ +| +$/, "", pr_ids[j]);  # trim surrounding spaces
                if (ac && pr_ids[j]) {
                    print ac, pr_ids[j];
                }
            }
            ac = "";
        }
    }
}
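To make the record handling easier to follow, here is a rough Ruby rendering of the same logic: split the input on `//` lines, remember the accession from the `AC` line, and emit one pair per ID on a `PR` line. The sample text is a hand-written, abbreviated stand-in for `prosite.dat`, and `PS99999` is a made-up accession that only shows a record without a `PR` line. The method name is hypothetical, and matching on the first whitespace-separated token is a simplification of the awk `^AC` and `^PR` tests.

```ruby
# Abbreviated, hand-written stand-in for prosite.dat (not real file content).
SAMPLE = <<~DAT
  ID   ASN_GLYCOSYLATION; PATTERN.
  AC   PS00001;
  PR   PRU00498;
  //
  ID   EXAMPLE_WITHOUT_RULE; PATTERN.
  AC   PS99999;
  //
DAT

# Hypothetical helper mirroring bin/prosite_prorule.awk: returns
# [accession, prorule_id] pairs for every PR line that follows an AC line.
def prosite_prorule_pairs(text)
  pairs = []
  ac = ""                                   # like the awk global, reset only after a PR line
  text.split("//\n").each do |record|
    record.each_line(chomp: true) do |line|
      fields = line.split(" ")
      case fields[0]
      when "AC"
        ac = fields[1].to_s.sub(/;\z/, "")  # strip the trailing semicolon
      when "PR"
        fields[1].to_s.split(";").each do |id|
          id = id.strip
          pairs << [ac, id] unless ac.empty? || id.empty?
        end
        ac = ""
      end
    end
  end
  pairs
end

prosite_prorule_pairs(SAMPLE).each { |pair| puts pair.join("\t") }
```

Like the awk script, this reads only the second whitespace-separated token of a `PR` line.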
24 changes: 24 additions & 0 deletions config/dataset.yaml
@@ -652,6 +652,22 @@ pfam:
examples:
- ["PF03719","PF03726","PF02809","PF00105","PF00250","PF00412","PF01412","PF00459","PF02532","PF01226"]
method: awk -F "\t" 'FNR==NR&&$2=="name"{a[$1]=$3}FNR!=NR&&$2=="PFAM"{print $3 "\t" a[$1]}' $TOGOID_ROOT/input/interpro/interpro.tsv $TOGOID_ROOT/input/interpro/interpro.tsv
prosite:
label: PROSITE
catalog: nbdc00241
category: Domain
regex: '^(?<id>PS\d{5})$'
prefix: 'http://identifiers.org/prosite/'
examples:
- ["PS01177","PS00516","PS01065","PS00418","PS01085","PS01086","PS01293","PS00355","PS01253","PS00992"]
prosite_prorule:
label: PROSITE ProRule
catalog: nbdc00241
category: AnnotationRule
regex: '^(?<id>PRU\d{5})$'
prefix: 'https://prosite.expasy.org/rule/'
examples:
- ["PRU00498","PRU00672","PRU00673","PRU00293","PRU00499","PRU10142","PRU10001","PRU10002","PRU10004","PRU10005"]
pubchem_compound:
label: PubChem compound
catalog: nbdc00641
@@ -768,6 +784,14 @@ sgd:
prefix: 'http://identifiers.org/sgd/'
examples:
- ["S000003096","S000003789","S000005770","S000028467","S000006377","S000001180","S000003769","S000001459","S000004428","S000006021"]
smart:
label: SMART
catalog: nbdc00682
category: Domain
regex: '^(?<id>SM\d{5})$'
prefix: 'http://identifiers.org/smart/'
examples:
- ["SM00130","SM00239","SM00043","SM00091","SM00104","SM01001","SM00135","SM00281","SM00996","SM00015"]
sra_accession:
label: SRA accession
catalog: nbdc00687
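Each new dataset entry carries a `regex` with a named capture `id`. The sketch below shows how such patterns accept or reject candidate IDs, using the three patterns added in this commit. How TogoID itself applies these patterns is not shown in the diff, so the lookup table and the `extract_id` helper are assumptions for illustration only.

```ruby
# Patterns copied from the new dataset.yaml entries; the table and helper
# around them are hypothetical.
ID_PATTERNS = {
  "prosite"         => /^(?<id>PS\d{5})$/,
  "prosite_prorule" => /^(?<id>PRU\d{5})$/,
  "smart"           => /^(?<id>SM\d{5})$/,
}

# Return the captured ID if the text matches the dataset's pattern, else nil.
def extract_id(dataset, text)
  match = ID_PATTERNS.fetch(dataset).match(text)
  match && match[:id]
end

p extract_id("prosite", "PS01177")
p extract_id("prosite", "PRU00498")
```

The `PS` and `PRU` prefixes keep PROSITE entries and ProRule entries from matching each other's patterns even though both share the catalog `nbdc00241`.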
7 changes: 7 additions & 0 deletions config/interpro-prosite/config.yaml
@@ -0,0 +1,7 @@
link:
  forward: TIO_000001
  reverse: TIO_000001
  file: sample.tsv
update:
  frequency: Bimonthly
  method: awk -F '\t' -v db=PROSITE '$2 == db {print $1 "\t" $3}' $TOGOID_ROOT/input/interpro/interpro.tsv
10 changes: 10 additions & 0 deletions config/interpro-prosite/sample.tsv
@@ -0,0 +1,10 @@
IPR000020 PS01177
IPR000035 PS00516
IPR000049 PS01065
IPR000052 PS00418
IPR000056 PS01085
IPR000056 PS01086
IPR000059 PS01293
IPR000079 PS00355
IPR000083 PS01253
IPR000096 PS00992
7 changes: 7 additions & 0 deletions config/interpro-smart/config.yaml
@@ -0,0 +1,7 @@
link:
  forward: TIO_000001
  reverse: TIO_000001
  file: sample.tsv
update:
  frequency: Bimonthly
  method: awk -F '\t' -v db=SMART '$2 == db {print $1 "\t" $3}' $TOGOID_ROOT/input/interpro/interpro.tsv
10 changes: 10 additions & 0 deletions config/interpro-smart/sample.tsv
@@ -0,0 +1,10 @@
IPR000001 SM00130
IPR000008 SM00239
IPR000010 SM00043
IPR000014 SM00091
IPR000020 SM00104
IPR000031 SM01001
IPR000033 SM00135
IPR000034 SM00281
IPR000043 SM00996
IPR000048 SM00015
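Both InterPro link configs use the same awk filter and differ only in the `db` value: keep rows whose second column equals the member database name, then print columns 1 and 3. A small Ruby sketch of that filter follows. The rows are hand-made in the three-column layout (InterPro ID, key, value) that the awk commands imply for `interpro.tsv`. The ID pairs come from the sample files above, while the `name` value and the helper name `links_for` are placeholders.

```ruby
# Hand-made rows in the (InterPro ID, key, value) layout implied by the awk
# commands; the "name" value is a placeholder, not real data.
ROWS = [
  ["IPR000020", "name",    "placeholder name"],
  ["IPR000020", "PROSITE", "PS01177"],
  ["IPR000020", "SMART",   "SM00104"],
  ["IPR000056", "PROSITE", "PS01085"],
  ["IPR000056", "PROSITE", "PS01086"],
]

# Hypothetical equivalent of: awk -F '\t' -v db=DB '$2 == db {print $1 "\t" $3}'
def links_for(rows, db)
  rows.select { |_ipr, key, _value| key == db }
      .map { |ipr, _key, value| [ipr, value] }
end

links_for(ROWS, "PROSITE").each { |pair| puts pair.join("\t") }
```

One InterPro entry can link to several member signatures, as `IPR000056` does here, so the output is a list of pairs rather than a one-to-one map.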
7 changes: 7 additions & 0 deletions config/prosite-prosite_prorule/config.yaml
@@ -0,0 +1,7 @@
link:
  forward: TIO_000114
  reverse: TIO_000115
  file: sample.tsv
update:
  frequency: Bimonthly
  method: awk -f $TOGOID_ROOT/bin/prosite_prorule.awk $TOGOID_ROOT/input/prosite/prosite.dat
10 changes: 10 additions & 0 deletions config/prosite-prosite_prorule/sample.tsv
@@ -0,0 +1,10 @@
PS00001 PRU00498
PS00005 PRU00672
PS00006 PRU00673
PS00016 PRU00293
PS00017 PRU00499
PS00018 PRU10142
PS00061 PRU10001
PS00064 PRU10002
PS00068 PRU10004
PS00069 PRU10005
6 changes: 3 additions & 3 deletions docs/help.md
@@ -1,5 +1,5 @@
# TogoID ver. 1.1
-Datasets last updated: 2023/10/10
+Datasets last updated: 2023/10/16

## About
- [TogoID](https://togoid.dbcls.jp/) is an ID conversion service implementing unique features with an intuitive web interface and an API for programmatic access. TogoID currently supports 89 datasets covering various biological categories. TogoID users can perform exploratory multistep conversions to find a path among IDs. To guide the interpretation of biological meanings in the conversions, we crafted an [ontology](https://togoid.dbcls.jp/ontology) that defines the semantics of the dataset relations.
@@ -22,9 +22,9 @@ Shuya Ikeda, Hiromasa Ono, Tazro Ohta, Hirokazu Chiba, Yuki Naito, Yuki Moriya,

- [API Documentation (Swagger)](https://togoid.dbcls.jp/apidoc/)

-## Statistics (as of 2023/10/10)
+## Statistics (as of 2023/10/16)
 - Number of target datasets
-- 96 (from 69 databases)
+- 99 (from 71 databases)
 - For details on the target DBs and ID examples, please refer to the "DATASETS" tab.

## Web user interface
6 changes: 3 additions & 3 deletions docs/help_ja.md
@@ -1,5 +1,5 @@
# TogoID ver. 1.1
-Datasets last updated: 2023/10/10
+Datasets last updated: 2023/10/16

## About
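- [TogoID](https://togoid.dbcls.jp/) is a web application that lets you convert IDs while exploring the connections among life science databases (DBs) through an intuitive interface. Besides converting between IDs that refer to the same entity, it can also convert to IDs in other, related categories. Even for IDs in DBs that are not directly linked, you can explore conversions that go through other DBs.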
@@ -26,9 +26,9 @@ Datasets last updated: 2023/10/10

- [API Documentation (Swagger)](https://togoid.dbcls.jp/apidoc/)

-## Statistics (2023/10/10)
+## Statistics (2023/10/16)
 - Number of target datasets
-- 96 (from 69 databases)
+- 99 (from 71 databases)
 - For details on the target DBs and ID examples, see the "DATASETS" tab.

## Web user interface