Merge branch 'issue-220' of https://github.com/RTXteam/RTX-KG2 into issue-220
Liliana Acevedo authored and Liliana Acevedo committed Jan 12, 2023
2 parents 276e8f9 + 9fc84b6 commit a1d0af1
Showing 38 changed files with 16,516 additions and 4,350 deletions.
23 changes: 19 additions & 4 deletions README.md
@@ -210,7 +210,7 @@ which are in the `us-west-2` AWS region) and you will need to have an AWS
authentication key pair that is configured to be able to read from (and write
to) the bucket(s), so that the build script can download a copy of the full
Unified Medical Language System (UMLS) distribution. The full UMLS distribution
(including SNOMED CT) (`umls-2020AA-metathesaurus.zip`; IANAL, but it appears
(including SNOMED CT) (`umls-2022AA-metathesaurus.zip`; IANAL, but it appears
that the UMLS is encumbered by a license preventing redistribution so I have not
hosted them on a public server for download; but you can get it for free at the
[UMLS website](https://www.nlm.nih.gov/research/umls/) if you agree to the UMLS
@@ -991,10 +991,10 @@ the following keys:
present) no consitent format, unfortunately; it is usually not `null`.
- `id`: a concatenated string of other edge attributes that uniquely identifies the edge. it
follows the format `subject---relation---object---provided_by`.
- `original_predicate`: a CURIE ID for the relation as reported by the upstream
- `source_predicate`: a CURIE ID for the relation as reported by the upstream
database source.
- `provided_by`: _deprecated_. Refer to `knowledge_source`.
- `relation`: _deprecated_. See `original_predicate`.
- `relation`: _deprecated_. See `source_predicate`.

### `publications_info` slot

@@ -1078,14 +1078,29 @@ ValueError: unable to expand CURIE: MONARCH:cliqueLeader
would indicate that the CURIE prefix (in this case, `MONARCH`) needs to be added to the
`use_for_bidirectional_mapping` section of `curies-to-urls-map.yaml` config file.

## Error building DAG of jobs
- In the case where Snakemake is forcibly quit due to a loss of power or other reason, it may result in the code directory becoming locked. To resolve, run:
```
/home/ubuntu/kg2-venv/bin/snakemake --snakefile /home/ubuntu/kg2-code/Snakefile --unlock
```

## Authentication Error in `tsv-to-neo4j.sh`
Soemtimes, when hosting KG2 in a Neo4j server on a new AWS instance, the initial password does not get set correctly, which will lead to an Authentication Error in `tsv-to-neo4j.sh`. To fix this, do the following:
Sometimes, when hosting KG2 in a Neo4j server on a new AWS instance, the initial password does not get set correctly, which will lead to an Authentication Error in `tsv-to-neo4j.sh`. To fix this, do the following:
1. Start up Neo4 (sudo service neo4j start)
2. Wait one minute, then confirm Neo4j is running (sudo service neo4j status)
3. Use a browser to connect to Neo4j via HTTP on port 7474. You should see a username/password authentication form.
4. Fill in "neo4j" and "neo4j" for username and password, respectively, and submit the form. You should be immediately prompted to set a new password. At that time, type in our "usual" Neo4j password (you'll have to enter it twice).
5. When you submit the form, Neo4j should be running and it should now have the correct password set.

## Errors in Extraction rules

### Role exists error
Occasionally, when a database needs to be re-extracted, the error `ERROR: role "jjyang" already exists` occurs.
If the following is not in the extraction script, add it to the line above where the role is created.
```
sudo -u postgres psql -c "DROP ROLE IF EXISTS ${role}"
```

# For Developers

This section has some guidelines for the development team for the KG2 build system.
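
As a reading aid for the `unable to expand CURIE` troubleshooting note in the README hunk above, here is a minimal Python sketch of how a CURIE could be expanded against a prefix map, and why a missing prefix surfaces as a `ValueError`. The function name `expand_curie` and the inline `PREFIX_TO_URL` dictionary are hypothetical and are not taken from the KG2 code; the two map entries are copied from the `use_for_bidirectional_mapping` section of `curies-to-urls-map.yaml` as it appears in this commit.

```python
# Hypothetical stand-in for the `use_for_bidirectional_mapping` section;
# both entries appear in curies-to-urls-map.yaml in this commit.
PREFIX_TO_URL = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
}


def expand_curie(curie: str, prefix_to_url: dict) -> str:
    """Return the full URL for a CURIE; raise ValueError if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_to_url:
        # Assumed to resemble the error quoted in the README; the real
        # KG2 code path may word or raise it differently.
        raise ValueError(f"unable to expand CURIE: {curie}")
    return prefix_to_url[prefix] + local_id


print(expand_curie("NCBITaxon:9606", PREFIX_TO_URL))
try:
    expand_curie("MONARCH:cliqueLeader", PREFIX_TO_URL)
except ValueError as err:
    print(err)
```

Under this sketch, adding a `MONARCH` entry to the map is what makes the second call succeed, which matches the fix the README recommends.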
4 changes: 3 additions & 1 deletion Snakefile-conversion
@@ -169,8 +169,10 @@ rule DrugCentral_Conversion:
validation = config['VALIDATION_PLACEHOLDER']
output:
config['DRUGCENTRAL_OUTPUT_FILE']
log:
config['BUILD_DIR'] + "/drugcentral/drugcentral-mysql-to-kg-json" + config['TEST_SUFFIX'] + ".log"
shell:
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/drugcentral_json_to_kg_json.py " + config['TEST_ARG'] + " {input.real} {output}"
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/drugcentral_json_to_kg_json.py " + config['TEST_ARG'] + " {input.real} {output} > {log} 2>&1"

rule IntAct_Conversion:
input:
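
The change in this hunk appends `> {log} 2>&1` to the rule's shell command so that both stdout and stderr land in the file named by the new `log:` directive. For readers less familiar with that shell idiom, the following Python sketch shows the same redirection using only the standard library. The helper name `run_with_log` is hypothetical and is not part of the KG2 build system.

```python
import subprocess
from pathlib import Path


def run_with_log(command: list, log_path: Path) -> int:
    """Run `command`, writing stdout and stderr to `log_path`; return the exit code."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w") as log_file:
        # stdout=log_file corresponds to `> {log}`;
        # stderr=subprocess.STDOUT corresponds to `2>&1`.
        completed = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT
        )
    return completed.returncode
```

As with the shell form, the log file is overwritten on every run, and error output from the child process ends up in the log rather than on the console.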
8 changes: 6 additions & 2 deletions Snakefile-post-etl
@@ -53,8 +53,10 @@ rule Stats:
config['FINAL_OUTPUT_FILE_FULL']
output:
config['REPORT_FILE_FULL']
log:
config['BUILD_DIR'] + "/report_stats_on_json_kg" + config['TEST_SUFFIX'] + ".log"
shell:
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/report_stats_on_json_kg.py {input} {output}"
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/report_stats_on_json_kg.py {input} {output} > {log} 2>&1"

rule Simplify:
input:
@@ -73,8 +75,10 @@ rule Slim:
placeholder = config['SIMPLIFIED_OUTPUT_NODES_FILE_FULL']
output:
config['SLIM_OUTPUT_FILE_FULL']
log:
config['BUILD_DIR'] + "/slim_kg2" + config['TEST_SUFFIX'] + ".log"
shell:
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/slim_kg2.py " + config['TEST_ARG'] + " {input.slim_real} {output}"
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/slim_kg2.py " + config['TEST_ARG'] + " {input.slim_real} {output} > {log} 2>&1"

rule Simplify_Stats:
input:
92 changes: 80 additions & 12 deletions curies-to-urls-map.yaml
@@ -7,12 +7,18 @@ use_for_bidirectional_mapping:
AEO: http://purl.obolibrary.org/obo/AEO_
-
AIR: https://identifiers.org/umls/AIR/
-
alliancegenome: 'https://www.alliancegenome.org/'
-
apollo: 'https://github.com/GMOD/Apollo'
-
AraPort: http://purl.uniprot.org/araport/
-
ARG: http://purl.obolibrary.org/obo/ARG_
-
ARO: http://purl.obolibrary.org/obo/ARO_
-
AspGD: 'http://www.aspergillusgenome.org/cgi-bin/locus.pl?dbid='
-
ATC: http://purl.bioontology.org/ontology/ATC/
-
@@ -21,18 +27,22 @@ use_for_bidirectional_mapping:
BFO: http://purl.obolibrary.org/obo/BFO_
-
bibo: http://purl.org/ontology/bibo/
# -
# biolink: https://w3id.org/biolink/vocab/
-
biolink: https://w3id.org/linkml/
biolink: https://w3id.org/biolink/vocab/
# -
# biolink: https://w3id.org/linkml/
-
biolink_download_source: https://raw.githubusercontent.com/biolink/biolink-model/master/
-
bioschemas: 'https://bioschemas.org/'
-
BRO: http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#
-
BSPO: http://purl.obolibrary.org/obo/BSPO_
-
BTO: http://purl.obolibrary.org/obo/BTO_
-
CAID: 'http://reg.clinicalgenome.org/redmine/projects/registry/genboree_registry/by_caid?caid='
-
CARO: http://purl.obolibrary.org/obo/CARO_
-
@@ -49,22 +59,40 @@ use_for_bidirectional_mapping:
CHEMBL.TARGET: "https://identifiers.org/chembl.target:"
-
CHMO: http://purl.obolibrary.org/obo/CHMO_
-
CID: 'http://pubchem.ncbi.nlm.nih.gov/compound/'
-
CL: http://purl.obolibrary.org/obo/CL_
-
clinicaltrials: "https://identifiers.org/clinicaltrials:"
-
CLINVAR: 'http://identifiers.org/clinvar'
-
CLO: http://purl.obolibrary.org/obo/CLO_
-
COAR_RESOURCE: 'http://purl.org/coar/resource_type/'
-
COG: 'https://www.ncbi.nlm.nih.gov/research/cog-project/'
-
CP: http://purl.obolibrary.org/obo/CP_
-
CPT: http://purl.bioontology.org/ontology/CPT/
CPT: https://www.ama-assn.org/practice-management/cpt/
-
CTD.CHEMICAL: 'http://ctdbase.org/detail.go?type=chem&acc='
-
CTD.DISEASE: 'http://ctdbase.org/detail.go?type=disease&db=MESH&acc='
-
CTD.GENE: 'http://ctdbase.org/detail.go?type=gene&acc='
-
CTD: 'http://ctdbase.org/'
-
CVDO: http://purl.obolibrary.org/obo/CVDO_
-
dbpedia: http://dbpedia.org/resource/
-
dc: http://purl.org/dc/elements/1.1/
-
dcat: 'http://www.w3.org/ns/dcat#'
-
dcid: 'https://datacommons.org/browser/'
-
@@ -74,7 +102,7 @@ use_for_bidirectional_mapping:
-
DDANAT: http://purl.obolibrary.org/obo/DDANAT_
-
DGIdb: http://www.dgidb.org/
DGIdb: https://www.dgidb.org/
-
dictybase.gene: "https://identifiers.org/dictybase.gene:"
-
@@ -101,6 +129,14 @@ use_for_bidirectional_mapping:
ECTO: http://purl.obolibrary.org/obo/ECTO_
-
EDAM: http://purl.bioontology.org/ontology/edam/
-
EDAM-DATA: 'http://edamontology.org/data_'
-
EDAM-FORMAT: 'http://edamontology.org/format_'
-
EDAM-OPERATION: 'http://edamontology.org/operation_'
-
EDAM-TOPIC: 'http://edamontology.org/topic_'
-
EFO: http://purl.bioontology.org/ontology/EFO/
-
@@ -121,6 +157,8 @@ use_for_bidirectional_mapping:
ExO: http://purl.obolibrary.org/obo/ExO_
-
FAO: http://purl.obolibrary.org/obo/FAO_
-
fabio: 'http://purl.org/spar/fabio/'
-
FBbt: http://purl.obolibrary.org/obo/FBbt_
-
@@ -137,8 +175,18 @@ use_for_bidirectional_mapping:
FMA: http://purl.obolibrary.org/obo/FMA_
-
foaf: http://xmlns.com/foaf/0.1/
-
foodb.compound: 'http://foodb.ca/foods/'
-
foodb.food: 'http://foodb.ca/compounds/'
-
FOODON: http://purl.obolibrary.org/obo/FOODON_
-
FYECO: 'https://www.pombase.org/term/'
-
FYPO: 'http://purl.obolibrary.org/obo/FYPO_' # Fission Yeast Phenotype Ontology
-
GAMMA: 'http://translator.renci.org/GAMMA_'
-
GARD: http://purl.obolibrary.org/obo/GARD_
-
@@ -151,12 +199,16 @@ use_for_bidirectional_mapping:
GENO: http://purl.obolibrary.org/obo/GENO_
-
GEO: http://purl.obolibrary.org/obo/GEO_
-
gff3: 'https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md#'
-
GO: http://purl.obolibrary.org/obo/GO_
-
GOREL: http://purl.obolibrary.org/obo/GOREL_
-
GO_REF: "https://identifiers.org/GO_REF:"
-
gpi: 'https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md#'
-
GTPI: https://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=
-
@@ -206,7 +258,9 @@ use_for_bidirectional_mapping:
-
KEGG_source: "https://www.genome.jp"
-
LOINC: http://purl.bioontology.org/ontology/LNC/
linkml: 'https://w3id.org/linkml/'
-
LOINC: http://loinc.org/rdf/
-
MA: http://purl.obolibrary.org/obo/MA_
-
@@ -231,8 +285,12 @@ use_for_bidirectional_mapping:
miRBase: http://identifiers.org/mirbase
-
miRGate: http://mirgate.bioinfo.cnio.es
-
mmmp.biomaps: 'https://bioregistry.io/mmmp.biomaps:'
-
MMO: http://purl.obolibrary.org/obo/MMO_
-
MmusDv: 'http://purl.obolibrary.org/obo/MMUSDV_'
-
MOD: http://purl.obolibrary.org/obo/MOD_
-
@@ -255,8 +313,10 @@ use_for_bidirectional_mapping:
NBO-PROPERTY: 'http://purl.obolibrary.org/obo/nbo#'
-
NCBIGene: 'http://identifiers.org/ncbigene/'
-
NCBITaxon: 'http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl'
# -
# NCBITaxon: 'http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl'
-
NCBITaxon: http://purl.obolibrary.org/obo/NCBITaxon_
-
NCIT: "https://identifiers.org/ncit:"
-
@@ -295,6 +355,8 @@ use_for_bidirectional_mapping:
OMIABIS: http://purl.obolibrary.org/obo/OMIABIS_
-
OMIM: http://purl.obolibrary.org/obo/OMIM_
-
OMIM.PS: 'https://www.omim.org/phenotypicSeries/'
-
OMIMDiseaseCluster: http://purl.obolibrary.org/obo/DC_
-
@@ -395,8 +457,6 @@ use_for_bidirectional_mapping:
SWO: http://www.ebi.ac.uk/swo/
-
SYMP: http://purl.obolibrary.org/obo/SYMP_
-
NCBITaxon: http://purl.obolibrary.org/obo/NCBITaxon_
-
TO: http://purl.obolibrary.org/obo/TO_
-
@@ -441,6 +501,10 @@ use_for_bidirectional_mapping:
VT: http://purl.obolibrary.org/obo/VT_
-
wb: "https://identifiers.org/wb:"
-
WBls: 'http://purl.obolibrary.org/obo/WBBL_'
-
WBbt: 'http://purl.obolibrary.org/obo/WBBT_'
-
WIKIDATA: 'https://www.wikidata.org/wiki/'
-
@@ -483,6 +547,8 @@ use_for_contraction_only:
CPT: http://purl.bioontology.org/ontology/HCPT/
-
DDANAT: http://purl.obolibrary.org/obo/ddanat#
-
DGIdb: https://www.dgidb.org/interaction_types/
-
DRUGBANK: http://purl.bioontology.org/ontology/DRUGBANK/
-
@@ -541,6 +607,8 @@ use_for_contraction_only:
ICD9: http://purl.obolibrary.org/obo/ICD9_
-
KEGG: http://purl.obolibrary.org/obo/KEGG_
-
LOINC: http://purl.bioontology.org/ontology/LNC/
-
MEDDRA: http://purl.obolibrary.org/obo/MedDRA_
-
@@ -567,6 +635,8 @@ use_for_contraction_only:
NCBIGene: http://www.ncbi.nlm.nih.gov/gene/
-
NCBITaxon: http://purl.bioontology.org/ontology/NCBITAXON/
-
NCBITaxon: http://purl.obolibrary.org/obo/ncbitaxon#
-
NCIT: http://purl.bioontology.org/ontology/NCI/
-
@@ -623,8 +693,6 @@ use_for_contraction_only:
SNOMED: http://identifiers.org/snomedct/
-
SO: http://purl.obolibrary.org/obo/so#
-
NCBITaxon: http://purl.obolibrary.org/obo/ncbitaxon#
-
UBERON: http://purl.obolibrary.org/obo/uberon/insect-anatomy#
-
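
The map changed in this file has two sections: `use_for_bidirectional_mapping` and `use_for_contraction_only`. The latter lists extra URL bases that should collapse to an existing CURIE prefix, such as the additional `NCBITaxon` entries. The Python sketch below illustrates one way such a contraction could work. The function `contract_url`, the merged `URL_TO_PREFIX` dictionary, and the longest-match rule are all assumptions made for illustration; the actual KG2 contraction logic may differ. The three URL bases are copied from the map as shown in this commit.

```python
from typing import Optional

# Hypothetical merged view (URL base -> CURIE prefix) of both map sections.
URL_TO_PREFIX = {
    # from use_for_bidirectional_mapping
    "http://purl.obolibrary.org/obo/NCBITaxon_": "NCBITaxon",
    # from use_for_contraction_only
    "http://purl.bioontology.org/ontology/NCBITAXON/": "NCBITaxon",
    "http://purl.obolibrary.org/obo/ncbitaxon#": "NCBITaxon",
}


def contract_url(url: str, url_to_prefix: dict) -> Optional[str]:
    """Return a CURIE for `url`, or None if no known URL base matches."""
    matches = [base for base in url_to_prefix if url.startswith(base)]
    if not matches:
        return None
    # Assumption: prefer the longest matching base when several match.
    base = max(matches, key=len)
    return f"{url_to_prefix[base]}:{url[len(base):]}"


print(contract_url("http://purl.bioontology.org/ontology/NCBITAXON/9606", URL_TO_PREFIX))
```

In this sketch, several URL spellings contract to the same `NCBITaxon` CURIE, while only the bidirectional entry would be used when expanding a CURIE back into a URL.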