Template for extraction of knowledge from literature to go into RNA KG #125

Draft: wants to merge 5 commits into base: main
13 changes: 13 additions & 0 deletions src/ontogpt/evaluation/rna_kg/abstract1.txt
@@ -0,0 +1,13 @@
Abstract
One challenge in miRNA–genes–diseases interaction studies is that it is challenging to
find labeled data that indicate a positive or negative relationship between miRNA and
genes. The use of one-class classification methods shows a promising path for validating
them. We have applied two one-class classification methods, Isolation Forest and
One-class SVM, to validate miRNAs interactions with the ERBB2 gene present in breast
cancer scenarios using features extracted via sequence-binding. We found that the
One-class SVM outperforms the Isolation Forest model, with values of sensitivity of
80.49% and a specificity of 86.49% showing results that are comparable to previous studies.
Additionally, we have demonstrated that the use of features extracted from a
sequence-based approach (considering miRNA and gene sequence binding characteristics)
and one-class models have proven to be a feasible method for validating these genetic
molecule interactions.
16 changes: 16 additions & 0 deletions src/ontogpt/evaluation/rna_kg/abstract2.txt
@@ -0,0 +1,16 @@
MicroRNA (miRNA)–gene interactions are well-recognized as involved in the progression
of almost all cancer types including prostate cancer, which is one of the most common
cancers in men. This study explored the significantly dysregulated genes and miRNAs and
elucidated the potential miRNA–gene regulatory network in prostate cancer. Integrative
analysis of prostate cancer and normal prostate transcriptomic data in The Cancer Genome
Atlas dataset was conducted using both differential expression analysis and weighted
correlation network analysis (WGCNA). Thirteen genes (RRM2, ORC6, CDC45, CDKN2A, E2F2,
MYBL2, CCNB2, PLK1, FOXM1, CDC25C, PKMYT1, GTSE1, and CDC20) were potentially
correlated with prostate cancer based on functional enrichment analyses. MiRNAs
targeting these genes were predicted and eight miRNAs were intersections between
those miRNAs and the hub miRNAs obtained from miRNA WGCNA analysis. Three genes
(E2F2, RRM2, and PKMYT1) and four miRNAs (hsa-mir-17-5p, hsa-mir-20a-5p, hsa-mir-92a-3p,
and hsa-mir-93-5p) were key factors according to the interaction network. RRM2 and
PKMYT1 were significantly related to survival. These findings partially elucidated
the dysregulation of gene expressions in prostate cancer. Efficient manipulations of
the miRNA–gene interactions in prostate cancer may be exploited as promising therapeutics.
243 changes: 243 additions & 0 deletions src/ontogpt/templates/composite_disease.py
@@ -0,0 +1,243 @@
from __future__ import annotations
from datetime import datetime, date
from enum import Enum
from typing import List, Dict, Optional, Any, Union, Literal
from pydantic import BaseModel as BaseModel, Field
from linkml_runtime.linkml_model import Decimal

metamodel_version = "None"
version = "None"

class WeakRefShimBaseModel(BaseModel):
    __slots__ = '__weakref__'

class ConfiguredBaseModel(WeakRefShimBaseModel,
                validate_assignment = True,
                validate_all = True,
                underscore_attrs_are_private = True,
                extra = 'forbid',
                arbitrary_types_allowed = True):
    pass


class NCITDrugType(str, Enum):

    dummy = "dummy"


class NCITTreatmentType(str, Enum):

    dummy = "dummy"


class NCITTActivityType(str, Enum):

    dummy = "dummy"


class MAXOActionType(str, Enum):

    dummy = "dummy"


class MESHTherapeuticType(str, Enum):

    dummy = "dummy"


class CHEBIDrugType(str, Enum):

    dummy = "dummy"


class NullDataOptions(str, Enum):

    UNSPECIFIED_METHOD_OF_ADMINISTRATION = "UNSPECIFIED_METHOD_OF_ADMINISTRATION"
    NOT_APPLICABLE = "NOT_APPLICABLE"
    NOT_MENTIONED = "NOT_MENTIONED"



class CompositeDisease(ConfiguredBaseModel):

    main_disease: Optional[str] = Field(None, description="""the name of the disease that is treated.""")
    drugs: Optional[List[str]] = Field(default_factory=list, description="""semicolon-separated list of named small molecule drugs""")
    treatments: Optional[List[str]] = Field(default_factory=list, description="""semicolon-separated list of therapies and treatments that are indicated for treating the disease.""")
    contraindications: Optional[List[str]] = Field(default_factory=list, description="""semicolon-separated list of therapies and treatments that are contra-indicated for the disease, and should not be used, due to risk of adverse effects.""")
    treatment_mechanisms: Optional[List[TreatmentMechanism]] = Field(default_factory=list, description="""semicolon-separated list of treatment to asterisk-separated mechanism associations""")
    treatment_efficacies: Optional[List[TreatmentEfficacy]] = Field(default_factory=list, description="""semicolon-separated list of treatment to efficacy associations, e.g. Imatinib*effective""")
    treatment_adverse_effects: Optional[List[TreatmentAdverseEffect]] = Field(default_factory=list, description="""semicolon-separated list of treatment to adverse effect associations, e.g. Imatinib*nausea""")



class ExtractionResult(ConfiguredBaseModel):
    """
    A result of extracting knowledge on text
    """
    input_id: Optional[str] = Field(None)
    input_title: Optional[str] = Field(None)
    input_text: Optional[str] = Field(None)
    raw_completion_output: Optional[str] = Field(None)
    prompt: Optional[str] = Field(None)
    extracted_object: Optional[Any] = Field(None, description="""The complex objects extracted from the text""")
    named_entities: Optional[List[Any]] = Field(default_factory=list, description="""Named entities extracted from the text""")



class NamedEntity(ConfiguredBaseModel):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class Gene(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class Symptom(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class Disease(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class AdverseEffect(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class Treatment(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class Mechanism(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class Drug(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class CompoundExpression(ConfiguredBaseModel):

    None



class TreatmentMechanism(CompoundExpression):

    treatment: Optional[str] = Field(None)
    mechanism: Optional[str] = Field(None)



class TreatmentAdverseEffect(CompoundExpression):

    treatment: Optional[str] = Field(None)
    adverse_effects: Optional[List[str]] = Field(default_factory=list)



class TreatmentEfficacy(CompoundExpression):

    treatment: Optional[str] = Field(None)
    efficacy: Optional[str] = Field(None)



class Triple(CompoundExpression):
    """
    Abstract parent for Relation Extraction tasks
    """
    subject: Optional[str] = Field(None)
    predicate: Optional[str] = Field(None)
    object: Optional[str] = Field(None)
    qualifier: Optional[str] = Field(None, description="""A qualifier for the statements, e.g. \"NOT\" for negation""")
    subject_qualifier: Optional[str] = Field(None, description="""An optional qualifier or modifier for the subject of the statement, e.g. \"high dose\" or \"intravenously administered\"""")
    object_qualifier: Optional[str] = Field(None, description="""An optional qualifier or modifier for the object of the statement, e.g. \"severe\" or \"with additional complications\"""")



class TextWithTriples(ConfiguredBaseModel):

    publication: Optional[Publication] = Field(None)
    triples: Optional[List[Triple]] = Field(default_factory=list)



class RelationshipType(NamedEntity):

    id: Optional[str] = Field(None, description="""A unique identifier for the named entity""")
    label: Optional[str] = Field(None, description="""The label (name) of the named thing""")



class Publication(ConfiguredBaseModel):

    id: Optional[str] = Field(None, description="""The publication identifier""")
    title: Optional[str] = Field(None, description="""The title of the publication""")
    abstract: Optional[str] = Field(None, description="""The abstract of the publication""")
    combined_text: Optional[str] = Field(None)
    full_text: Optional[str] = Field(None, description="""The full text of the publication""")



class AnnotatorResult(ConfiguredBaseModel):

    subject_text: Optional[str] = Field(None)
    object_id: Optional[str] = Field(None)
    object_text: Optional[str] = Field(None)




# Update forward refs
# see https://pydantic-docs.helpmanual.io/usage/postponed_annotations/
CompositeDisease.update_forward_refs()
ExtractionResult.update_forward_refs()
NamedEntity.update_forward_refs()
Gene.update_forward_refs()
Symptom.update_forward_refs()
Disease.update_forward_refs()
AdverseEffect.update_forward_refs()
Treatment.update_forward_refs()
Mechanism.update_forward_refs()
Drug.update_forward_refs()
CompoundExpression.update_forward_refs()
TreatmentMechanism.update_forward_refs()
TreatmentAdverseEffect.update_forward_refs()
TreatmentEfficacy.update_forward_refs()
Triple.update_forward_refs()
TextWithTriples.update_forward_refs()
RelationshipType.update_forward_refs()
Publication.update_forward_refs()
AnnotatorResult.update_forward_refs()
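
The field descriptions in CompositeDisease describe a flat text encoding: entries separated by semicolons, with an asterisk between a treatment and its associated value (e.g. Imatinib*nausea). The following is a minimal sketch of how such a string might be split into records, using only the standard library. The `Association` dataclass and the `parse_associations` function are hypothetical names introduced for illustration; they are not part of this PR or of OntoGPT, and the project's actual parsing may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Association:
    """Hypothetical stand-in for TreatmentEfficacy / TreatmentAdverseEffect."""
    treatment: str
    value: str


def parse_associations(raw: str) -> List[Association]:
    """Split 'A*x; B*y' into Association records, skipping malformed entries."""
    results = []
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            # Empty segment, e.g. from a trailing or doubled semicolon.
            continue
        treatment, sep, value = entry.partition("*")
        if not sep:
            # No asterisk: not a treatment-to-value pair, so skip it.
            continue
        results.append(Association(treatment.strip(), value.strip()))
    return results


# Assumed input format, modelled on the examples in the field descriptions.
pairs = parse_associations("Imatinib*effective; Dasatinib * nausea; ;orphan")
```

Entries without an asterisk are dropped here rather than raising an error; whether that is the right choice depends on how strictly the extraction output should be validated.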
