Ontologies
To begin collecting phenotypic values for plot entries, or optionally for plant and tissue sample entries, a phenotyping ontology must first be created in Breedbase. A phenotyping ontology defines terms, each of which has a unique name, a definition, a unique ontology name and ontology identifier, and a unique ontology-specific term identifier. In Breedbase, an observation variable from an ontology has the form ‘Plant Height|CO:0000002’, where the words preceding the ‘|’ are the observation variable name, the letters after the ‘|’ are the unique ontology identifier, and the numbers after the ‘:’ are the unique ontology-specific term identifier. This observation variable name format must be used to upload phenotypes.
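As an illustration of the name format described above, the following minimal sketch splits an observation variable string into its three parts. The function and class names are hypothetical and are not part of Breedbase; the pattern only assumes the ‘name|ONTOLOGY:ID’ layout.

```python
import re
from typing import NamedTuple


class ObservationVariable(NamedTuple):
    name: str         # words before the '|', e.g. "Plant Height"
    ontology_id: str  # ontology identifier after the '|', e.g. "CO"
    term_id: str      # ontology-specific identifier after the ':', e.g. "0000002"


# Layout assumed here: <name>|<ontology identifier>:<term identifier>
_PATTERN = re.compile(r"^(?P<name>.+)\|(?P<ontology>[^|:]+):(?P<term>[^|:]+)$")


def parse_observation_variable(text: str) -> ObservationVariable:
    """Split an observation variable string into name, ontology and term id."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in 'name|ONTOLOGY:ID' form: {text!r}")
    return ObservationVariable(
        match["name"].strip(), match["ontology"], match["term"]
    )


print(parse_observation_variable("Plant Height|CO:0000002"))
```

A check like this can be run over a phenotype spreadsheet's column headers before upload to catch names that are missing the identifier part.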
Ontologies can be highly disputed in terms of exact naming and definitions, and because of this, Breedbase recommends using ontologies curated by Crop Ontology (http://www.cropontology.org/). These ontologies are often structured into hierarchical categories to group observation variables into semantic clusters, such as observation variables for morphological traits versus observation variables for disease-related traits. A common representation of an ontology capturing the relationships between hierarchical terms is the obo format. The obo format defines the term names, definitions, term types, ontology identifiers, and relationships between terms. Of critical importance to Breedbase is that the relationship type linking an observation variable to its higher-level term must be ‘VARIABLE_OF’; this is how actual observation variables are separated from hierarchical categorical terms in the ontology. The obo file can be loaded into Breedbase and the observation variable terms can then be used to phenotype plots, plants, and/or tissue samples in a field trial.
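To make the role of ‘VARIABLE_OF’ concrete, here is a small sketch that reads obo text and returns only the terms carrying that relationship. The sample stanzas, the ‘EX’ ontology prefix, and the function name are invented for illustration; this is a simplified reader, not the loader Breedbase uses.

```python
from typing import List

# Hypothetical obo fragment: one categorical term and one observation variable.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: Plant height trait
def: "Height of the plant." []

[Term]
id: EX:0000002
name: Plant Height
relationship: VARIABLE_OF EX:0000001 ! Plant height trait

[Typedef]
id: VARIABLE_OF
name: VARIABLE_OF
"""


def observation_variables(obo_text: str) -> List[str]:
    """Return 'name|id' for every [Term] that has a VARIABLE_OF relationship."""
    terms = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are collected; [Typedef] and others are skipped.
            current = {"relationships": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ")[0].strip()  # drop trailing '!' comment
            if tag == "relationship":
                parts = value.split()
                if len(parts) >= 2:
                    current["relationships"].append((parts[0], parts[1]))
            elif tag in ("id", "name"):
                current[tag] = value
    return [
        f"{t.get('name')}|{t.get('id')}"
        for t in terms
        if any(rel == "VARIABLE_OF" for rel, _ in t["relationships"])
    ]


print(observation_variables(SAMPLE_OBO))  # ['Plant Height|EX:0000002']
```

The categorical term ‘Plant height trait’ is left out because it has no ‘VARIABLE_OF’ relationship, which mirrors how observation variables are separated from grouping terms.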
Loading obo files can only be done through the backend by a curator. This restricts the number of terms that are added to an ontology and so avoids fragmentation of phenotypic data under duplicated observation variables. To ease interaction between breeders and data curators, a trait submission portal (https://submit.rtbbase.org/) has been developed, including a queuing system using GitHub; this trait submission system connects directly to Crop Ontology, allowing synchronization of observation variables between Breedbase and the broader ontology community.
An observation variable is ideally defined by a trait indicating the attribute being measured, such as ‘plant height’, a method indicating the process of observation, such as ‘using a ruler’, and a scale indicating the units, such as ‘meters’. The ontologies available on Crop Ontology may or may not follow this structural definition, but again, the critical aspect for Breedbase is that the obo ontology indicates observation variables using the ‘VARIABLE_OF’ relationship. Formulating an observation variable in this fashion allows for reusable namespaces for traits, methods, and scales across ontologies and enables deep phenotypic querying in Breedbase.
As an alternative to loading obo files strictly via the backend, a frontend interactive interface is available on Breedbase for adding observation variables one at a time into the system. This interface must be activated in the Breedbase configuration. The interface first asks the user to select the ontology to which the new observation variable belongs, such as ‘Cassava Observation Variables’; the user then defines a unique name, such as ‘Plant Height using a ruler in meters’, and a definition for the new observation variable. The next step is to select the trait to which the observation variable belongs, such as ‘Plant height’, from the available trait ontologies in Breedbase; if the trait is not found in the selected trait ontology, it can be added by defining a unique name and definition. Next, the method for the observation variable, such as ‘Measuring Ruler’, is selected; again, if the method is not found in the selected method ontology, it can be added by defining a unique name and definition. Finally, the scale for the observation variable, such as ‘Meters’, is selected; if the scale is not found in the selected scale ontology, it can be added by defining a unique name and definition, and optionally, the minimum, maximum, default, possible enumerated terms, and scale type.
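The pieces of information gathered by this interface can be pictured as a small data structure. The sketch below is only a model of the trait, method, and scale decomposition; the class names, field names, definitions, and the range check are assumptions made for illustration and do not reflect Breedbase's internal code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Term:
    name: str
    definition: str


@dataclass(frozen=True)
class Scale(Term):
    # Optional attributes mentioned for scales; all names here are hypothetical.
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    default: Optional[float] = None
    categories: Tuple[str, ...] = ()
    scale_type: Optional[str] = None


@dataclass(frozen=True)
class VariableDefinition:
    ontology: str
    name: str
    definition: str
    trait: Term
    method: Term
    scale: Scale

    def in_range(self, value: float) -> bool:
        """Illustrative check of a value against the scale's optional bounds."""
        low, high = self.scale.minimum, self.scale.maximum
        return (low is None or value >= low) and (high is None or value <= high)


plant_height = VariableDefinition(
    ontology="Cassava Observation Variables",
    name="Plant Height using a ruler in meters",
    definition="Example definition text.",
    trait=Term("Plant height", "Example trait definition."),
    method=Term("Measuring Ruler", "Example method definition."),
    scale=Scale("Meters", "Example scale definition.", minimum=0.0),
)
print(plant_height.in_range(1.5))
```

Keeping the trait, method, and scale as separate terms is what lets the same ‘Meters’ scale or ‘Measuring Ruler’ method be reused by other observation variables.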
The ontologies are usually loaded using the script gmod_load_cvterms.pl in the https://github.com/GMOD/gmod/bin directory.
In Breedbase it is often the case that very specific observation variables are required, such that, if the ontology were to contain all observation variables of interest, the ontology would be too large and unwieldy. The primary example of this is the case of collecting metabolic data for tissue samples collected from plant entries in the field trial; the metabolites are measured for tissue samples collected under varying environmental conditions and varying collection times. In this case, the total number of combinations between the possible terms exceeded several million observation variables. Instead of pre-defining all possible terms and saving them into Breedbase, it is possible to post-compose observation variables from defined ontologies; the result in the metabolite example was that small ontologies for metabolites, methods, and scales could be created, and then post-composed into observation variables to record phenotypic data.
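The arithmetic behind this choice can be sketched as follows. The component ontology sizes, the example term names, and the ‘||’ separator are all invented for illustration; the text above does not specify the format Breedbase uses for post-composed names.

```python
from math import prod

# Hypothetical sizes of the small component ontologies.
component_sizes = {"metabolite": 1000, "method": 20, "scale": 25, "time": 12}

# Pre-defining every combination means storing the product of the sizes...
precomposed_terms = prod(component_sizes.values())
# ...while post-composing only requires storing the component terms themselves.
stored_terms = sum(component_sizes.values())

print(precomposed_terms)  # 6000000
print(stored_terms)       # 1057


def post_compose(*terms: str, separator: str = "||") -> str:
    """Join component terms into one variable label (separator is assumed)."""
    return separator.join(terms)


label = post_compose(
    "Sucrose|EXMET:0000001",  # hypothetical metabolite term
    "month 1|EXTIME:0000001",  # hypothetical time term
)
print(label)
```

Only the combinations that are actually measured ever need to be composed, so the stored vocabulary stays near the sum of the component sizes rather than their product.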
To assist in categorizing ontologies, Breedbase uses entries in cvprop to annotate cv entries using type names from the ‘composable_cvtypes’ controlled vocabulary; in this way, one ontology can be tagged as a trait ontology, indicating it is generally used for observation variables, while another can be tagged as a time ontology, indicating it only has time-related terms such as ‘month 1’ or ‘month 2’. For time and unit ontologies, which are important for most post-composing use cases, Breedbase recommends using the SGN time and unit ontologies (https://github.com/solgenomics/sgn/tree/master/ontology).
Phenotyping ontology terms are stored in the cvterm table with links to cvterm_relationship entries to represent their place in the ontology hierarchy. When an observation variable is used to annotate a phenotypic record, for example if a plot in a field trial was measured for an observation variable called plant height, then the cvterm entry of the observation variable is linked to the phenotype table entry, as will be discussed in the next section.
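The storage layout described above can be tried out with a small in-memory stand-in. The sketch below uses SQLite and heavily reduced versions of the cvterm and cvterm_relationship tables (only the columns needed for the join); the rows are invented, and a real Breedbase database runs on PostgreSQL with many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for the two tables; real tables have more columns.
    CREATE TABLE cvterm (
        cvterm_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE cvterm_relationship (
        cvterm_relationship_id INTEGER PRIMARY KEY,
        type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
        subject_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),  -- child term
        object_id  INTEGER NOT NULL REFERENCES cvterm(cvterm_id)   -- parent term
    );
    -- Invented example rows.
    INSERT INTO cvterm VALUES
        (1, 'VARIABLE_OF'),
        (2, 'Plant height trait'),
        (3, 'Plant Height');
    INSERT INTO cvterm_relationship VALUES (1, 1, 3, 2);
    """
)

# Find each observation variable together with its higher-level term.
rows = conn.execute(
    """
    SELECT variable.name, parent.name
    FROM cvterm_relationship AS rel
    JOIN cvterm AS reltype  ON reltype.cvterm_id  = rel.type_id
    JOIN cvterm AS variable ON variable.cvterm_id = rel.subject_id
    JOIN cvterm AS parent   ON parent.cvterm_id   = rel.object_id
    WHERE reltype.name = 'VARIABLE_OF'
    """
).fetchall()

print(rows)  # [('Plant Height', 'Plant height trait')]
```

The relationship type is itself a cvterm row, which is why the query joins cvterm three times: once for the type, once for the child, and once for the parent.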