Nick Morales edited this page Oct 7, 2019 · 1 revision

To describe how germplasm, or accessions, are managed in Breedbase, it is first important to understand how organisms, or species, are managed. The Breedbase database is pre-populated with the complete NCBI taxonomy record, defining all known species with their associated genus, abbreviation, common name, and GenBank taxon identifier. Researchers using Breedbase can find their crops of interest within the 100,000+ organisms available. Germplasm, or accessions, are always created in association with an organism. A single Breedbase instance can be used for a variety of crop organisms; however, for logistical reasons, it is recommended to use separate instances for individual crops.

Creating an accession requires only a unique name and the organism species name; however, germplasm can optionally be annotated with the following properties: variety, donor, donor institute, donor PUI, country of origin, state, institute code, institute name, biological status of accession code, notes, accession number, PUI, seed source, type of germplasm storage code, acquisition date, organization, location code, ploidy level, genome structure, ncbi taxonomy id, transgenic, introgression parent, introgression backcross parent, introgression map version, introgression chromosome, introgression start position in base pairs, and introgression end position in base pairs. Many of these germplasm properties are derived from the Breeding API (BrAPI) specification (BrAPI, 2019). Germplasm can be added to the database using an interactive list tool or an Excel file upload; the Excel file upload also allows storing and updating all of the attributes listed above.
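The split between required and optional information can be illustrated with a small in-memory record. This is only a sketch: the `AccessionRecord` class, the `validate` helper, and the sample names are hypothetical and are not part of Breedbase.

```python
from dataclasses import dataclass, field


@dataclass
class AccessionRecord:
    """Hypothetical accession record; not a Breedbase class."""
    name: str      # required: unique accession name
    species: str   # required: organism species name
    properties: dict = field(default_factory=dict)  # optional annotations


def validate(record, existing_names):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not record.name.strip():
        problems.append("missing accession name")
    elif record.name in existing_names:
        problems.append("accession name already exists")
    if not record.species.strip():
        problems.append("missing species name")
    return problems


# Made-up sample data for illustration only.
existing = {"TestAcc001"}
new_record = AccessionRecord(
    "TestAcc002", "Manihot esculenta", {"variety": "demo", "ploidy level": "2"}
)
duplicate = AccessionRecord("TestAcc001", "Manihot esculenta")

new_problems = validate(new_record, existing)
duplicate_problems = validate(duplicate, existing)
```

Here `new_problems` is empty because both required fields are present and the name is unused, while `duplicate_problems` reports the name clash.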

One of the most critical issues in germplasm management is the creation of duplicate germplasm records for a single unique entity; the duplication often arises from typographical errors, such as added white space between characters, or from transcription errors when germplasm names are written out manually. To address this issue, Breedbase runs a fuzzy search across all incoming germplasm names before they can be stored in the system. The fuzzy search detects any germplasm names already in the system that resemble the incoming names; the uploader can then either add the incoming name as a new synonym of the existing germplasm entry or simply adopt the existing unique name. Adding a synonym to the existing germplasm entry is generally the most convenient option, because Breedbase will recognize the synonym in all downstream cases. Even so, every synonym must itself be unique; non-unique synonyms cannot be added.
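The idea of screening incoming names against stored ones can be sketched as follows. The page does not say which matching algorithm Breedbase uses, so this example substitutes Python's standard `difflib` similarity ratio; the `normalize` and `find_similar_names` helpers, the 0.8 cutoff, and the sample names are all assumptions made for illustration.

```python
import difflib


def normalize(name):
    # Lower-case and drop all whitespace so "Test Acc001" and "testacc001" compare equal.
    return "".join(name.lower().split())


def find_similar_names(incoming, existing, cutoff=0.8):
    """Map each incoming name to stored names that resemble it.

    Exact matches are skipped because they already refer to a stored entry.
    """
    existing_set = set(existing)
    by_normalized = {}
    for name in existing:
        by_normalized.setdefault(normalize(name), []).append(name)

    report = {}
    for name in incoming:
        if name in existing_set:
            continue
        close = difflib.get_close_matches(
            normalize(name), list(by_normalized), n=5, cutoff=cutoff
        )
        matches = [original for key in close for original in by_normalized[key]]
        if matches:
            report[name] = matches
    return report


# Made-up names for illustration only.
stored = ["TestAcc001", "Goldenrod"]
uploaded = ["Test Acc001", "Goldenrodd", "Unrelated99"]
report = find_similar_names(uploaded, stored)
```

In this sketch, each flagged name in `report` would be shown to the uploader, who would then choose between registering it as a synonym and adopting the stored name.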

Germplasm are the foundation for many of the concepts that follow, such as seedlots, field trials, genotyping plates, genotyping data projects, and crossing experiments. Germplasm are stored in the stock table, with associated properties in the stockprop table, following an entity-attribute-value (EAV) model. The optional properties listed above are stored in the stockprop table using terms from the ‘stock_property’ controlled vocabulary. The stock table, as will be described in the following sections, is used to store a variety of stock-like entities, including plot and plant entries, and seedlots; in the case of germplasm, the stock table entry has a type named ‘accession’ from the ‘stock_type’ controlled vocabulary.
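The EAV layout can be sketched with an in-memory SQLite database. The tables below are deliberately reduced to the columns needed for the example and should not be read as the full Breedbase schema; the `cv` and `cvterm` tables, the helper functions, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in tables; the real schema has more columns and constraints.
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
    name TEXT NOT NULL
);
CREATE TABLE stock (
    stock_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE stockprop (
    stockprop_id INTEGER PRIMARY KEY,
    stock_id INTEGER NOT NULL REFERENCES stock(stock_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value TEXT
);
""")


def term_id(cv_name, term_name):
    """Return the id of a controlled-vocabulary term, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)
    ).fetchone()[0]
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO cvterm (cv_id, name) VALUES (?, ?)", (cv_id, term_name)
    ).lastrowid


def add_accession(uniquename, props):
    """Store one stock row typed 'accession' plus one stockprop row per property."""
    stock_id = conn.execute(
        "INSERT INTO stock (uniquename, type_id) VALUES (?, ?)",
        (uniquename, term_id("stock_type", "accession")),
    ).lastrowid
    for key, value in props.items():
        conn.execute(
            "INSERT INTO stockprop (stock_id, type_id, value) VALUES (?, ?, ?)",
            (stock_id, term_id("stock_property", key), value),
        )
    return stock_id


def get_props(uniquename):
    """Read the attribute/value pairs of a stock back into a dict."""
    rows = conn.execute(
        """SELECT t.name, p.value
           FROM stock s
           JOIN stockprop p ON p.stock_id = s.stock_id
           JOIN cvterm t ON t.cvterm_id = p.type_id
           WHERE s.uniquename = ?""",
        (uniquename,),
    ).fetchall()
    return dict(rows)


# Made-up sample accession.
add_accession("TestAcc001", {"variety": "demo", "ploidy level": "2"})
props = get_props("TestAcc001")
```

The point of the EAV design is visible in `add_accession`: a new kind of property only needs a new vocabulary term, not a new column.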

In Breedbase, germplasm can be grouped into populations. A population is defined with a unique name and a list of germplasm names. Populations are useful in downstream analysis for clustering and demarcating groups of germplasm. A population is stored as an entry in the stock table with a type named ‘population’ from the ‘stock_type’ controlled vocabulary. Entries in the stock_relationship table link germplasm entries and population entries in the stock table using a type named ‘member_of’ from the ‘stock_relationship’ controlled vocabulary.
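Population membership can be sketched in the same spirit. To keep the example short, type names are stored as plain text rather than as controlled-vocabulary ids, and the accession is assumed to be the subject of the ‘member_of’ relationship with the population as its object; the table columns, helper functions, and sample names are illustrative assumptions, not the Breedbase schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in tables; types are plain text here for brevity.
conn.executescript("""
CREATE TABLE stock (
    stock_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE stock_relationship (
    subject_id INTEGER NOT NULL REFERENCES stock(stock_id),
    object_id INTEGER NOT NULL REFERENCES stock(stock_id),
    type TEXT NOT NULL,
    UNIQUE (subject_id, object_id, type)
);
""")


def add_stock(name, stock_type):
    return conn.execute(
        "INSERT INTO stock (uniquename, type) VALUES (?, ?)", (name, stock_type)
    ).lastrowid


def create_population(name, member_names):
    """Create a population stock and link each accession to it via 'member_of'."""
    population_id = add_stock(name, "population")
    for member in member_names:
        row = conn.execute(
            "SELECT stock_id FROM stock WHERE uniquename = ? AND type = 'accession'",
            (member,),
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown accession: {member}")
        conn.execute(
            "INSERT INTO stock_relationship (subject_id, object_id, type) "
            "VALUES (?, ?, 'member_of')",
            (row[0], population_id),
        )
    return population_id


def population_members(name):
    """List the accession names that belong to a population."""
    rows = conn.execute(
        """SELECT a.uniquename
           FROM stock pop
           JOIN stock_relationship r
             ON r.object_id = pop.stock_id AND r.type = 'member_of'
           JOIN stock a ON a.stock_id = r.subject_id
           WHERE pop.uniquename = ? AND pop.type = 'population'
           ORDER BY a.uniquename""",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


# Made-up sample data.
for accession in ("TestAcc001", "TestAcc002", "TestAcc003"):
    add_stock(accession, "accession")
create_population("DemoPopulation", ["TestAcc001", "TestAcc003"])
members = population_members("DemoPopulation")
```

Because a population is just another stock row, the same relationship table can express membership without any population-specific tables.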

For query performance and versatility, a PostgreSQL materialized view is generated to collapse all information from the EAV model of the stock and stockprop tables into a simple row-and-column table structure called materialized_stockprop. The materialized view is regenerated whenever new stock entries are added. The germplasm search, and more generally the stock search, constructs complex yet efficient queries using the materialized view.
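The collapse from EAV rows into one row per stock can be sketched as a pivot query. SQLite has no materialized views, so this example emulates one by rebuilding an ordinary table on each refresh; in PostgreSQL the equivalent steps would use `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW`. The reduced tables, the fixed `PROPS` list, and the `refresh_materialized_stockprop` helper are assumptions for illustration and do not reproduce the actual Breedbase view definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in tables; property names are plain text here for brevity.
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL);
CREATE TABLE stockprop (stock_id INTEGER NOT NULL, prop TEXT NOT NULL, value TEXT);
""")

# Fixed, trusted list of properties to pivot into columns (illustrative subset).
PROPS = ["variety", "ploidy level"]


def refresh_materialized_stockprop():
    """Rebuild the wide table: one row per stock, one column per property."""
    columns = []
    for prop in PROPS:
        column = prop.replace(" ", "_")
        columns.append(
            f"MAX(CASE WHEN p.prop = '{prop}' THEN p.value END) AS {column}"
        )
    conn.executescript(f"""
    DROP TABLE IF EXISTS materialized_stockprop;
    CREATE TABLE materialized_stockprop AS
    SELECT s.stock_id, s.uniquename, {", ".join(columns)}
    FROM stock s
    LEFT JOIN stockprop p ON p.stock_id = s.stock_id
    GROUP BY s.stock_id, s.uniquename;
    """)


# Made-up sample data: one accession with properties, one without.
conn.execute("INSERT INTO stock (stock_id, uniquename) VALUES (1, 'TestAcc001')")
conn.execute("INSERT INTO stock (stock_id, uniquename) VALUES (2, 'TestAcc002')")
conn.execute("INSERT INTO stockprop VALUES (1, 'variety', 'demo')")
conn.execute("INSERT INTO stockprop VALUES (1, 'ploidy level', '2')")

refresh_materialized_stockprop()
wide_rows = conn.execute(
    "SELECT uniquename, variety, ploidy_level "
    "FROM materialized_stockprop ORDER BY uniquename"
).fetchall()
```

A search over the wide table can then filter on `variety` or `ploidy_level` directly, without joining stockprop once per property.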
