Home
Aim: To generate a brief text summary of the current knowledge of a gene based on data present in AGR. This documentation records what has been done and how; for current tasks not yet completed, see the AGR JIRA site: https://agr-jira.atlassian.net/browse/AGR-787
The pipeline writes a gene description based on several types of annotations, e.g. GO annotations and DO annotations. If more than three terms are present in the original annotation set, it trims them by finding the Lowest Common Ancestor (LCA) that best represents the terms in the set.
- If both parents and children are present in the annotation set, the parents are removed
- If more than three terms are present, the most representative Lowest Common Ancestor (LCA) is found
- Thresholds set (distance from root): 3 for Function, 2 for Process and 5 for Component
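The first trimming step (dropping parents when their children are also annotated) can be sketched as follows. This is a minimal illustration, not the project's code: the toy ontology `PARENTS` and the helper names `ancestors` and `remove_parents` are hypothetical.

```python
# Hypothetical toy ontology: term -> set of direct parents.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "protein binding": {"binding"},
    "kinase binding": {"protein binding"},
    "catalytic activity": {"root"},
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def remove_parents(terms, parents=PARENTS):
    """Drop each term that is an ancestor of another term in the set."""
    terms = set(terms)
    redundant = set()
    for term in terms:
        redundant |= ancestors(term, parents) & terms
    return terms - redundant


trimmed = remove_parents({"binding", "kinase binding", "catalytic activity"})
```

Here `binding` is dropped because its descendant `kinase binding` is also in the set, while `catalytic activity` has no annotated descendant and is kept.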
- Molecular function/identity (based on GO MF annotations)
- Biological processes involved in (based on GO BP annotations)
- Cellular localization (based on GO CC annotations)
- Disease associations (based on Disease Ontology (DO) term annotations)
- Tissue expression
General data source: https://s3.amazonaws.com/mod-datadumps
For a given set of GO terms, the algorithm finds the Lowest Common Ancestor (LCA) that is most representative of the original annotation set. It trims the set to at most three GO terms per aspect (F, P and C) in the sentences that describe the function, process and cellular localization of a gene.
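One way to picture the LCA trimming is the greedy sketch below. It is not the pipeline's actual algorithm; it assumes a toy ontology (`PARENTS`), measures distance from root as the shortest path, and repeatedly replaces the pair of terms whose common ancestor is deepest, provided that ancestor meets the per-aspect threshold. All names here are hypothetical.

```python
from itertools import combinations

# Hypothetical toy ontology: term -> list of direct parents.
PARENTS = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "transferase activity": ["catalytic activity"],
    "kinase activity": ["transferase activity"],
    "protein kinase activity": ["kinase activity"],
    "lipid kinase activity": ["kinase activity"],
    "hydrolase activity": ["catalytic activity"],
    "transporter activity": ["molecular_function"],
}
# Thresholds from the text: distance from root per aspect.
MIN_DEPTH = {"F": 3, "P": 2, "C": 5}
MAX_TERMS = 3


def ancestors_or_self(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(term):
    """Shortest distance from the root (an assumption of this sketch)."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(depth(p) for p in parents)


def lowest_common_ancestor(a, b):
    """Deepest term shared by the ancestor sets of a and b."""
    common = ancestors_or_self(a) & ancestors_or_self(b)
    return max(common, key=lambda t: (depth(t), t))


def trim(terms, aspect):
    """Greedily merge terms into LCAs until at most MAX_TERMS remain."""
    terms = set(terms)
    while len(terms) > MAX_TERMS:
        candidates = []
        for a, b in combinations(sorted(terms), 2):
            lca = lowest_common_ancestor(a, b)
            if depth(lca) >= MIN_DEPTH[aspect]:
                candidates.append((depth(lca), lca))
        if not candidates:
            break  # no ancestor is specific enough; leave the set as is
        _, lca = max(candidates)
        # Replace every term covered by the chosen ancestor with it.
        terms = {t for t in terms if lca not in ancestors_or_self(t)} | {lca}
    return terms


result = trim(
    {"protein kinase activity", "lipid kinase activity",
     "hydrolase activity", "transporter activity"},
    aspect="F",
)
```

In this toy example the two kinase terms are merged into `kinase activity` (distance 3 from the root), because every other pair only shares ancestors closer to the root than the Function threshold allows.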
Annotation terms are included according to the following evidence code priority:
- Experimental: EXP, IDA, IPI, IMP, IGI, IEP
- High-throughput experimental: HTP, HDA, HMP, HGI, HEP
- Phylogenetic and sequence-based analysis: IBA, IBD, ISS, ISO, ISA, ISM, TSA
- Electronic and computational analysis: IEA, RCA
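The priority order above can be expressed as a ranking function. The sketch below only sorts annotations by group; whether lower-priority annotations are dropped entirely when higher-priority ones exist is not stated here, so no filtering is assumed. The names `EVIDENCE_PRIORITY`, `evidence_rank` and the `(term, code)` tuple layout are hypothetical.

```python
# Evidence code groups in priority order, as listed in the text.
EVIDENCE_PRIORITY = [
    ("experimental", {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}),
    ("high-throughput experimental", {"HTP", "HDA", "HMP", "HGI", "HEP"}),
    ("phylogenetic and sequence-based",
     {"IBA", "IBD", "ISS", "ISO", "ISA", "ISM", "TSA"}),
    ("electronic and computational", {"IEA", "RCA"}),
]


def evidence_rank(code):
    """Return 0 for the highest-priority group; unknown codes rank last."""
    for rank, (_, codes) in enumerate(EVIDENCE_PRIORITY):
        if code in codes:
            return rank
    return len(EVIDENCE_PRIORITY)


# Hypothetical annotations as (term name, evidence code) pairs.
annotations = [
    ("kinase activity", "IEA"),
    ("ATP binding", "IDA"),
    ("protein kinase activity", "IBA"),
    ("protein phosphorylation", "HDA"),
]
# sorted() is stable, so ties keep their original order.
ordered = sorted(annotations, key=lambda ann: evidence_rank(ann[1]))
```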
Experimental evidence codes:
- Exhibits
- A <structural></structural>
Non-experimental evidence codes:
- Predicted to have
- Predicted to be a <structural></structural>
Experimental evidence codes: Involved in
Non-experimental evidence codes: Predicted to be involved in
Experimental evidence codes: Localizes to the <component></component>
Non-experimental evidence codes: Predicted to localize to the
For GO term "intracellular":
- Is <intracellular></intracellular>
- Predicted to be <intracellular></intracellular>
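The sentence openers listed above can be combined with a term list roughly as follows. This is a sketch only: the function names, the `experimental` flag and the way terms are joined with commas and "and" are assumptions, and the special cases for structural terms and for the "intracellular" term are left out.

```python
# Sentence openers keyed by (GO aspect, experimental evidence?).
PREFIXES = {
    ("F", True): "Exhibits",
    ("F", False): "Predicted to have",
    ("P", True): "Involved in",
    ("P", False): "Predicted to be involved in",
    ("C", True): "Localizes to the",
    ("C", False): "Predicted to localize to the",
}


def join_terms(terms):
    """Join term names as 'a', 'a and b' or 'a, b and c' (assumed style)."""
    terms = list(terms)
    if len(terms) <= 2:
        return " and ".join(terms)
    return ", ".join(terms[:-1]) + " and " + terms[-1]


def build_sentence(aspect, terms, experimental):
    """Build one description sentence fragment for a GO aspect."""
    return f"{PREFIXES[(aspect, experimental)]} {join_terms(terms)}"


process = build_sentence("P", ["cell division", "DNA repair"], True)
component = build_sentence("C", ["nucleus"], False)
```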
- 'NOT' annotations
- terms in the 'do not annotate' file at http://geneontology.org/ontology/subsets/gocheck_do_not_annotate.obo
- GO:0008150 biological_process
- GO:0003674 molecular_function
- GO:0005575 cellular_component
- GO:0005488 binding
- GO:0005515 protein binding
- GO:0044877 protein-containing complex binding
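Assuming the list above enumerates annotations that are left out of the descriptions, a filter could look like the sketch below. The annotation layout (a dict with `go_id` and `qualifiers`) and the function name are hypothetical, and loading the 'do not annotate' OBO file is not shown; its term IDs are passed in as a plain set.

```python
# Root and binding terms listed in the text.
EXCLUDED_IDS = {
    "GO:0008150",  # biological_process
    "GO:0003674",  # molecular_function
    "GO:0005575",  # cellular_component
    "GO:0005488",  # binding
    "GO:0005515",  # protein binding
    "GO:0044877",  # protein-containing complex binding
}


def keep_annotation(annotation, do_not_annotate_ids=frozenset()):
    """Return True if the annotation may be used in a description."""
    if "NOT" in annotation.get("qualifiers", ()):
        return False
    go_id = annotation["go_id"]
    return go_id not in EXCLUDED_IDS and go_id not in do_not_annotate_ids


# Hypothetical input records.
annotations = [
    {"go_id": "GO:0005515", "qualifiers": []},
    {"go_id": "GO:0004672", "qualifiers": ["NOT"]},
    {"go_id": "GO:0004672", "qualifiers": []},
]
kept = [a for a in annotations if keep_annotation(a)]
```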
- molting cycle, collagen and cuticulin-based cuticle: molting cycle
- molting cycle, chitin-based cuticle: molting cycle
- multicellular organism growth: growth
- embryo development ending in birth or egg hatching: embryo development
- synaptic transmission, <some></some>: <some></some> synaptic transmission
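The renaming rules above amount to a lookup table plus one pattern. In the sketch below, the last rule is read as moving whatever follows the comma in a "synaptic transmission, ..." term to the front; that reading, and the names `RENAMES` and `display_name`, are assumptions.

```python
import re

# Fixed renames taken from the list in the text.
RENAMES = {
    "molting cycle, collagen and cuticulin-based cuticle": "molting cycle",
    "molting cycle, chitin-based cuticle": "molting cycle",
    "multicellular organism growth": "growth",
    "embryo development ending in birth or egg hatching": "embryo development",
}
# Assumed pattern for the synaptic transmission rule.
SYNAPTIC = re.compile(r"^synaptic transmission, (.+)$")


def display_name(term_name):
    """Return the name to use for a GO term in a description."""
    if term_name in RENAMES:
        return RENAMES[term_name]
    match = SYNAPTIC.match(term_name)
    if match:
        return f"{match.group(1)} synaptic transmission"
    return term_name


renamed = display_name("multicellular organism growth")
```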