Merge pull request #493 from monarch-initiative/develop
`develop` -> `main`
joeflack4 authored Apr 16, 2024
2 parents 89772cf + ffb6008 commit a3cc936
Showing 37 changed files with 413 additions and 93 deletions.
31 changes: 25 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,18 +3,37 @@
This repo is dedicated to the integration of various clinical terminologies and ontologies into Mondo. For more details
see the [documentation](https://monarch-initiative.github.io/mondo-ingest/).

Work on the Mondo Source Ingest is funded by the NHGRI Phenomics First Grant 1RM1HG010860-01.
Work on the Mondo Source Ingest is funded by the _NHGRI Phenomics First Grant 1RM1HG010860-01_.

## Workflows
A variety of workflows are available to run the ingest. See the [workflows documentation](./docs/developer/workflows.md) for more details.
## Prerequisites
Python is a dev dependency. It's not needed to run the docker containers, but needed for local development situations
/ debugging.
1. Python 3.9+
2. Docker
3. Docker images
One or both of the following, depending on if you want to run the stable build `latest` or `dev`:
- a. `docker pull obolibrary/odkfull:latest`
- b. `docker pull obolibrary/odkfull:dev`

## Running
### Full build
`sh run.sh make build-mondo-ingest`

### [Workflows](./docs/developer/workflows.md)

## Reports
A variety of reports are committed as static files in `src/ontology/reports/`, but some additional reports get rendered
into markdown pages as noted below.

### Mapping progress report
The [mapping progress report](./docs/reports/unmapped.md) consists lists of all umapped terms fo each ontology, as well
The [mapping progress report](./docs/reports/unmapped.md) consists of lists of all unmapped terms for each ontology, as well
as a table of statistics showing total number of terms, excluded terms, deprecated terms, and unmapped terms.

### Mapped deprecated terms
The [_mapped deprecated terms_ page](./docs/reports/mapped_deprecated.md) contains a table of statistics showing total number of deprecated terms that have existing xrefs in Mondo, for each ontology. There is also a link to a page for each ontology which shows the term IDs and their corresponding mapped Mondo ID(s).
The [_mapped deprecated terms_ page](./docs/reports/mapped_deprecated.md) contains a table of statistics showing total number of deprecated terms that
have existing xrefs in Mondo, for each ontology. There is also a link to a page for each ontology which shows the term
IDs and their corresponding mapped Mondo ID(s).

### Migratable terms
The [_migrate_ page](./docs/reports/migrate.md) contains a table of statistics showing of terms ready for migration / integration into Mondo.
The [_migrate_ page](./docs/reports/migrate.md) contains a table of statistics showing of terms ready for migration /
integration into Mondo.
11 changes: 10 additions & 1 deletion docs/developer/add-new-source.md
Expand Up @@ -21,16 +21,25 @@ Add a new metadata file to [src/ontology/metadata](https://github.com/monarch-in
Prefixes need to be entered in the following places in the yml:
- `curie_map`
- `extended_prefix_map`
- `subject_prefixes`

### 2.3. `config/prefixes.csv`
Add prefixes.

### 2.4. `config/context.json`
Add prefixes.

### 2.5. `lexmatch-sssom-compare.py`
There is a section of branching logic with a comment "Map ontology filenames to prefixes". Add an entry there if either
(a) there is 1 prefix you care about, and it is spelled differently than the component filename (e.g. the prefix is
`myontology`, but the filename is `components/my-ontology.owl`), or (b) there is more than 1 prefix.
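The rule above can be sketched as a small lookup with a fallback. This is a minimal illustration only, not the actual contents of `lexmatch-sssom-compare.py`: the table name `FILENAME_TO_PREFIXES`, the function `prefixes_for_component`, and the `multi-source` entry are hypothetical, and the fallback (use the filename stem as the prefix) is an assumption based on the description.

```python
from pathlib import Path

# Hypothetical override table: component filename stem -> prefixes of interest.
# An entry is only needed when (a) the prefix is spelled differently than the
# filename, or (b) the component carries more than one prefix.
FILENAME_TO_PREFIXES = {
    "my-ontology": ["myontology"],           # case (a), example from the docs
    "multi-source": ["prefixa", "prefixb"],  # case (b), made-up names
}


def prefixes_for_component(path: str) -> list[str]:
    """Return the prefixes to use for a component file.

    Assumption: when no override exists, the filename stem is the prefix.
    """
    stem = Path(path).stem
    return FILENAME_TO_PREFIXES.get(stem, [stem])
```

With this shape, a source whose prefix matches its filename needs no entry at all, which is presumably why the step is conditional.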

## 3. Docs
### 3.1. `mkdocs.yaml`
Update the Website Table of Contents in [mkdocs.yaml](https://github.com/monarch-initiative/mondo-ingest/blob/main/mkdocs.yaml)

### 3.2. `docs/sources/*.md`
Run `sh run.sh make ../../docs/sources/*.md` from `src/ontology`. Then edit it manually to add any more informatoin.
Run `sh run.sh make ../../docs/sources/*.md` from `src/ontology`. Then edit it manually to add any more information.

### 3.3. `docs/sources.md`
Add a link to your new `.md` file created in the last step.
1 change: 1 addition & 0 deletions docs/metrics.md
Expand Up @@ -6,6 +6,7 @@ You can find information about the source modules ingested below. Remember that
- [GARD](metrics/gard.md)
- [ICD10CM](metrics/icd10cm.md)
- [ICD10WHO](metrics/icd10who.md)
- [ICD11Foundation](metrics/icd11foundation.md)
- [NCIT](metrics/ncit.md)
- [OMIM](metrics/omim.md)
- [ORDO](metrics/ordo.md)
66 changes: 66 additions & 0 deletions docs/metrics/icd11foundation.md
@@ -0,0 +1,66 @@
# Metrics HTTP://PURL.OBOLIBRARY.ORG/OBO/MONDO-INGEST/COMPONENTS/ICD11FOUNDATION

**IRI:** http://purl.obolibrary.org/obo/mondo-ingest/components/icd11foundation.owl

**Version IRI:** http://purl.obolibrary.org/obo/mondo-ingest/releases/2024-02-17/components/icd11foundation.owl

### Entities and axioms

| Metric | Value |
| ------ | ----- |
| Annotation properties | 21 |
| Axioms | 570662 |
| Logical axioms | 130473 |
| Classes | 100002 |
| Object properties | 70 |
| Data properties | 0 |
| Individuals | 0 |


### Expressivity

| Metric | Value |
| ------ | ----- |
| Expressivity | CINTEH |
| OWL2 | True |
| OWL2 DL | True |
| OWL2 EL | True |
| OWL2 QL | False |
| OWL2 RL | False |

#### Axiom types

| Metric | Value |
| ------ | ----- |
| AnnotationAssertion | 340100 |
| EquivalentClasses | 5075 |
| SubObjectPropertyOf | 51 |
| Declaration | 100089 |
| SubClassOf | 125347 |


#### Entity namespaces: axiom counts by namespace

| Metric | Value |
| ------ | ----- |
| prefix_unknown | 100084 |
| owl | 3 |
| rdf | 1 |
| xsd | 1 |
| skos | 5 |
| rdfs | 1 |


#### Class expressions used

| Metric | Value |
| ------ | ----- |
| Class | 392111 |
| ObjectSomeValuesFrom | 40919 |
| ObjectIntersectionOf | 19706 |


More information about the source can be found [in the documentation](../sources.md). The raw data (ontology metrics) can be found [on GitHub](https://github.com/monarch-initiative/mondo-ingest/tree/main/src/ontology/metadata).

You can make issues or ask questions about this source [here](https://github.com/monarch-initiative/mondo-ingest/issues).

1 change: 1 addition & 0 deletions docs/odk-workflows/RepositoryFileStructure.md
Expand Up @@ -34,6 +34,7 @@ These are the components in MONDO-INGEST
| gard.owl | https://github.com/monarch-initiative/gard/releases/latest/download/gard.owl |
| icd10cm.owl | https://data.bioontology.org/ontologies/ICD10CM/submissions/23/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb |
| icd10who.owl | https://github.com/monarch-initiative/icd10who/releases/latest/download/icd10who.ttl |
| icd11foundation.owl | https://github.com/monarch-initiative/icd11/releases/latest/download/icd11foundation.owl |
| ncit.owl | http://purl.obolibrary.org/obo/ncit.owl |
| omim.owl | https://github.com/monarch-initiative/omim/releases/latest/download/omim.owl |
| ordo.owl | http://www.orphadata.org/data/ORDO/ordo_orphanet.owl |
1 change: 1 addition & 0 deletions docs/sources.md
Expand Up @@ -4,6 +4,7 @@
- [GARD](sources/gard.md)
- [ICD10CM](sources/icd10cm.md)
- [ICD10WHO](sources/icd10who.md)
- [ICD11Foundation](sources/icd11foundation.md)
- [NCIT](sources/ncit.md)
- [OMIM](sources/omim.md)
- [ORDO](sources/ordo.md)
31 changes: 31 additions & 0 deletions docs/sources/icd11foundation.md
@@ -0,0 +1,31 @@
# MONDO - ICD11FOUNDATION Alignment

**Source name:** International Classification of Diseases 11th Revision

**Source description:** The International Classification of Diseases (ICD) provides a common language that allows health
professionals to share standardized information across the world. The eleventh revision contains around 17 000 unique
codes, more than 120 000 codable terms and is now entirely digital.Feb 11, 2022
This data source in particular is the ICD11 foundation, not one of its linearizations.

**Homepage:** https://icd.who.int/

**Comments about this source:**
_Data source_
_Original source URL_: https://icd11files.blob.core.windows.net/tmp/whofic-2023-04-08.owl.gz

_Preprocessing_
In the [monarch-initiative/icd11](https://github.com/monarch-initiative/icd11) repo, we remove Unicode characters and
then remove equivalent class statements as discussed below.

_Equivalent classes_
We remove all equivalent class statements as they are not unique and result in unintended node merges. For example
`icd11.foundation:2000662282` (_Occupant of pick-up truck or van injured in collision with car, pick-up truck or van:
person on outside of vehicle injured in traffic accident_) has the same exact equivalent concept expression as
`icd11.foundation:1279712844` (_Occupant of pick-up truck or van injured in collision with two- or three- wheeled motor
vehicle: person on outside of vehicle injured in traffic accident_).
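A minimal sketch of how such collisions could be detected, assuming each class's equivalent-class expression has already been serialized to a comparable string. The function name `find_shared_expressions` and the placeholder expression strings are hypothetical; the actual preprocessing lives in the monarch-initiative/icd11 repo and may work differently.

```python
from collections import defaultdict


def find_shared_expressions(equivalents: dict[str, str]) -> dict[str, list[str]]:
    """Group class IDs by their equivalent-class expression.

    Returns only the expressions shared by more than one class. These are the
    cases where a reasoner or merge step would treat distinct classes as one.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for class_id, expression in equivalents.items():
        groups[expression].append(class_id)
    return {expr: sorted(ids) for expr, ids in groups.items() if len(ids) > 1}


# The two IDs come from the example above; "EXPR_A" and "EXPR_B" are
# placeholders standing in for real class expressions.
example = {
    "icd11.foundation:2000662282": "EXPR_A",
    "icd11.foundation:1279712844": "EXPR_A",
    "icd11.foundation:0000000000": "EXPR_B",  # hypothetical, unique expression
}
shared = find_shared_expressions(example)
```

Any non-empty result indicates expressions that do not uniquely identify a class, which is the condition that motivates dropping the statements.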

---

The data pipeline that generates the source is implemented in `make`, in this source file: [src/ontology/mondo-ingest.Makefile](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/mondo-ingest.Makefile).

You can make issues or ask questions about this source [here](https://github.com/monarch-initiative/mondo-ingest/issues).
2 changes: 1 addition & 1 deletion docs/sources/ordo.md
Expand Up @@ -32,7 +32,7 @@
* **EntityRemoval**: Removing information that are on obsolete Mondo terms (MONDO:ObsoleteEquivalent).
* **Update**: Updating the source with various SPARQL preprocessing steps
* [MONDO_INGEST_QUERY:fix_deprecated.ru](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/sparql/fix_deprecated.ru)
* [MONDO_INGEST_QUERY:fix_complex_reification.ru](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/sparql/fix_complex_reification.ru)
* [MONDO_INGEST_QUERY:fix_complex_reification_ordo.ru](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/sparql/fix_complex_reification_ordo.ru)
* [MONDO_INGEST_QUERY:fix_xref_prefixes.ru](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/sparql/fix_xref_prefixes.ru)
* [MONDO_INGEST_QUERY:ordo-construct-subclass-from-part-of.ru](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/sparql/ordo-construct-subclass-from-part-of.ru)
* [MONDO_INGEST_QUERY:ordo-construct-subsets.ru](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/sparql/ordo-construct-subsets.ru)
2 changes: 2 additions & 0 deletions mkdocs.yaml
Expand Up @@ -28,6 +28,7 @@ nav:
- GARD: sources/gard.md
- ICD10CM: sources/icd10cm.md
- ICD10WHO: sources/icd10who.md
- ICD11Foundation: sources/icd11foundation.md
- NCIT: sources/ncit.md
- OMIM: sources/omim.md
- ORDO: sources/ordo.md
Expand All @@ -40,6 +41,7 @@ nav:
- GARD: metrics/gard.md
- ICD10CM: metrics/icd10cm.md
- ICD10WHO: metrics/icd10who.md
- ICD11Foundation: metrics/icd11foundation.md
- NCIT: metrics/ncit.md
- OMIM: metrics/omim.md
- ORDO: metrics/ordo.md
18 changes: 16 additions & 2 deletions src/ontology/Makefile
Expand Up @@ -10,7 +10,7 @@
# More information: https://github.com/INCATools/ontology-development-kit/

# Fingerprint of the configuration file when this Makefile was last generated
CONFIG_HASH= 7e46e2aae3d97f90d3901bf0f67d6b8673defa3e391aab5f4d26c3412861e875
CONFIG_HASH= 1f779a242dc046d5c98b39bde1a60c2488195d1ad6e0a21ee94bac1ae0f05b98


# ----------------------------------------
Expand Up @@ -54,7 +54,7 @@ OBODATE ?= $(shell date +'%d:%m:%Y %H:%M')
VERSION= $(TODAY)
ANNOTATE_ONTOLOGY_VERSION = annotate -V $(ONTBASE)/releases/$(VERSION)/$@ --annotation owl:versionInfo $(VERSION)
ANNOTATE_CONVERT_FILE = annotate --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) convert -f ofn --output $@.tmp.owl && mv $@.tmp.owl $@
OTHER_SRC = $(COMPONENTSDIR)/doid.owl $(COMPONENTSDIR)/gard.owl $(COMPONENTSDIR)/icd10cm.owl $(COMPONENTSDIR)/icd10who.owl $(COMPONENTSDIR)/ncit.owl $(COMPONENTSDIR)/omim.owl $(COMPONENTSDIR)/ordo.owl
OTHER_SRC = $(COMPONENTSDIR)/doid.owl $(COMPONENTSDIR)/gard.owl $(COMPONENTSDIR)/icd10cm.owl $(COMPONENTSDIR)/icd10who.owl $(COMPONENTSDIR)/icd11foundation.owl $(COMPONENTSDIR)/ncit.owl $(COMPONENTSDIR)/omim.owl $(COMPONENTSDIR)/ordo.owl
ONTOLOGYTERMS = $(TMPDIR)/ontologyterms.txt
EDIT_PREPROCESSED = $(TMPDIR)/$(ONT)-preprocess.owl

Expand Up @@ -485,6 +485,20 @@ $(COMPONENTSDIR)/icd10who.owl: component-download-icd10who.owl
.PRECIOUS: $(COMPONENTSDIR)/icd10who.owl


.PHONY: component-download-icd11foundation.owl
component-download-icd11foundation.owl: | $(TMPDIR)
if [ $(MIR) = true ] && [ $(COMP) = true ]; then $(ROBOT) merge -I https://github.com/monarch-initiative/icd11/releases/latest/download/icd11foundation.owl \
annotate --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o $(TMPDIR)/$@.owl; fi

$(COMPONENTSDIR)/icd11foundation.owl: component-download-icd11foundation.owl
if [ $(COMP) = true ]; then if cmp -s $(TMPDIR)/component-download-icd11foundation.owl.owl $(TMPDIR)/component-download-icd11foundation.owl.tmp.owl ; then echo "Component identical."; \
else echo "Component is different, updating." &&\
cp $(TMPDIR)/component-download-icd11foundation.owl.owl $(TMPDIR)/component-download-icd11foundation.owl.tmp.owl &&\
$(ROBOT) annotate -i $(TMPDIR)/component-download-icd11foundation.owl.owl --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o $@; fi; fi

.PRECIOUS: $(COMPONENTSDIR)/icd11foundation.owl
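The second rule above only re-annotates the component when the fresh download differs from the cached copy (`cmp -s` on the `.owl` and `.tmp.owl` files). A rough Python equivalent of that compare-then-update step is sketched below; `update_if_changed` is a hypothetical name, and the ROBOT `annotate` call that follows in the Makefile is deliberately left out.

```python
import filecmp
import shutil
from pathlib import Path


def update_if_changed(downloaded: Path, cached: Path) -> bool:
    """Refresh the cached copy only if the download differs byte for byte.

    Mirrors `cmp -s downloaded cached`: a missing cached file counts as
    "different", so the first run always updates.
    """
    if cached.exists() and filecmp.cmp(downloaded, cached, shallow=False):
        return False  # "Component identical."
    shutil.copyfile(downloaded, cached)  # "Component is different, updating."
    return True
```

Skipping the update when nothing changed avoids rewriting the component file with a new version IRI on every build.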


.PHONY: component-download-ncit.owl
component-download-ncit.owl: | $(TMPDIR)
if [ $(MIR) = true ] && [ $(COMP) = true ]; then $(ROBOT) merge -I http://purl.obolibrary.org/obo/ncit.owl \
Expand Down
4 changes: 3 additions & 1 deletion src/ontology/config/context.json
Expand Up @@ -31,7 +31,9 @@
"NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
"ICD10CM": "http://purl.bioontology.org/ontology/ICD10CM/",
"ICD10WHO": "http://apps.who.int/classifications/icd10/browse/2010/en#/",
"OMIMPS": "https://www.omim.org/phenotypicSeries/PS",
"icd11.foundation": "http://id.who.int/icd/entity/",
"icd11.z": "http://who.int/icd#Z_",
"OMIMPS": "https://omim.org/phenotypicSeries/PS",
"MONDOREL": "http://purl.obolibrary.org/obo/mondo#"
}
}
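For orientation, a prefix map like the one in `context.json` is typically used to expand CURIEs into full IRIs. The sketch below assumes simple string concatenation and copies two entries from the hunk above; `expand_curie` is a hypothetical helper, not part of the repository.

```python
# Two entries copied from the context.json hunk above.
PREFIX_MAP = {
    "icd11.foundation": "http://id.who.int/icd/entity/",
    "OMIMPS": "https://omim.org/phenotypicSeries/PS",
}


def expand_curie(curie: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Expand `prefix:local_id` to a full IRI using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"Unknown or missing prefix in CURIE: {curie!r}")
    return prefix_map[prefix] + local_id
```

Because expansion is plain concatenation, the change of the `OMIMPS` namespace from `https://www.omim.org/...` to `https://omim.org/...` changes every expanded IRI for that prefix, so all files that declare it need to agree.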
5 changes: 5 additions & 0 deletions src/ontology/config/icd11foundation-property-map.sssom.tsv
@@ -0,0 +1,5 @@
subject_id object_id
http://id.who.int/icd/schema/isObsolote owl:deprecated
http://id.who.int/icd/schema/longDefinition http://purl.org/dc/terms/description
http://id.who.int/icd/schema/note rdfs:comment
skos:definition IAO:0000115
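A property map of this shape can be read as a simple source-to-target lookup. The sketch below assumes the file is tab-delimited (as the `.tsv` extension suggests) and inlines three of the rows above; `load_property_map` and `remap_property` are hypothetical helpers, and how the pipeline actually consumes the file is not shown in this diff.

```python
import csv
import io

# Three rows from the file above, written with explicit tab separators.
PROPERTY_MAP_TSV = (
    "subject_id\tobject_id\n"
    "http://id.who.int/icd/schema/longDefinition\thttp://purl.org/dc/terms/description\n"
    "http://id.who.int/icd/schema/note\trdfs:comment\n"
    "skos:definition\tIAO:0000115\n"
)


def load_property_map(tsv_text: str) -> dict[str, str]:
    """Parse a two-column property map into a subject -> object dict."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["subject_id"]: row["object_id"] for row in reader}


def remap_property(prop: str, property_map: dict[str, str]) -> str:
    """Return the mapped property, or the input unchanged if unmapped."""
    return property_map.get(prop, prop)


property_map = load_property_map(PROPERTY_MAP_TSV)
```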
1 change: 1 addition & 0 deletions src/ontology/config/icd11foundation_exclusions.tsv
@@ -0,0 +1 @@
term_id term_label exclusion_reason exclude_children
6 changes: 5 additions & 1 deletion src/ontology/config/prefixes.csv
Expand Up @@ -227,6 +227,10 @@ ICD10CM,http://purl.bioontology.org/ontology/ICD10CM/
ICD10CM2,https://icd.codes/icd10cm/
ICD10WHO,https://icd.who.int/browse10/2019/en#/
ICD10WHO2010,http://apps.who.int/classifications/icd10/browse/2010/en#/
ICD11,http://purl.obolibrary.org/obo/mondo/mappings/unknown_prefix/ICD11/
icd11.foundation,http://id.who.int/icd/entity/
icd11.schema,http://id.who.int/icd/schema/
icd11.z,http://who.int/icd#Z_
OMIMPS,https://omim.org/phenotypicSeries/PS
OMIM,https://omim.org/entry/
Orphanet,http://www.orpha.net/ORDO/Orphanet_
Expand All @@ -246,7 +250,7 @@ semapv,https://w3id.org/semapv/vocab/
HGNC_SYMBOL,https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/
HGNC,https://identifiers.org/hgnc/
ncbi.gene,https://www.ncbi.nlm.nih.gov/gene/
OMIMPS,https://www.omim.org/phenotypicSeries/PS
OMIMPS,https://omim.org/phenotypicSeries/PS
STY,http://purl.bioontology.org/ontology/STY/
sssom,https://w3id.org/sssom/
biolink,https://w3id.org/biolink/vocab/
2 changes: 2 additions & 0 deletions src/ontology/config/properties.txt
Expand Up @@ -20,6 +20,8 @@ http://www.w3.org/2004/02/skos/core#narrowMatch
http://www.w3.org/2004/02/skos/core#relatedMatch
http://www.w3.org/2004/02/skos/core#exactMatch
http://www.w3.org/2004/02/skos/core#closeMatch
rdfs:comment
rdfs:label
rdfs:seeAlso
owl:deprecated
http://purl.org/dc/terms/description
2 changes: 1 addition & 1 deletion src/ontology/metadata/doid.metadata.sssom.yml
Expand Up @@ -9,7 +9,7 @@ curie_map:
# MedDRA: https://identifiers.org/meddra/
MESH: https://meshb.nlm.nih.gov/record/ui?ui=
OMIM: https://omim.org/entry/
OMIMPS: https://www.omim.org/phenotypicSeries/PS
OMIMPS: https://omim.org/phenotypicSeries/PS
# Orphanet: http://www.orpha.net/ORDO/Orphanet_
UMLS: http://linkedlifedata.com/resource/umls/id/
DOID: http://purl.obolibrary.org/obo/DOID_