Skip to content

Latest commit

 

History

History
50 lines (29 loc) · 9.51 KB

03.07.data-portals.md

File metadata and controls

50 lines (29 loc) · 9.51 KB

Data Portal

FAIDARE

FAIDARE (https://urgi.versailles.inrae.fr/faidare/) is a data discovery portal providing a biologist friendly search system over a global federation of 33 plant research databases. It allows to identify data resources using a full text approach completed with domain specific filters and to link back to the original database for visualization, analysis and download. For instance, it is possible to search for "wheat drought" then to refine the search to the "Triticum aestivum" taxon and yield component traits such as "Thousand Grain Weight". The indexed data types are very broad and include genomic features, such as genes or transposable elements, selected bibliography, QTL, markers, genetic variation studies, phenomic studies and plant genetic resources ie germplasm. This inclusiveness is achieved thanks to a two stage indexation data model. The most generic one provides basic search functionalities and relies on five fields : name, link back URL, data type, species and exhaustive description. The filtering is directly tied to some of those fields. Therefore, to provide more advanced filtering, FAIDARE is also providing a second stage indexation mechanism by taking advantage of BrAPi endpoints to get more detailed metadata on genotyping and phenotyping studies as well as germplasm. In parallel, FAIDARE provides a pre-visualization of germplasm and studies using dedicated cards. The indexation mechanism relies on a dedicated public software (https://github.com/elixir-europe/plant-brapi-etl-faidare) that allows data resources manager to request the indexation of there database using pull requests. It is able to extract data from any BrAPI 1.3 and 1.2 endpoint and development of BrAPI 2.x indexation will be initiated in 2025. Since not all databases are willing to implement BrAPI endpoints, we also provide the possibility to generate metadata as BrAPI json files, hence using the standard as a file exchange format. FAIDARE has been adopted by several communities and in particular in the ELIXIR and EMPHASIS european infrastructures. It is also used by the WheatIS of the Wheat-Initiative. Several databases are added each year to the FAIDARE global federation, allowing to increase both the portal and the BrAPI adoption.

Phenospex - HortControl

HortControl, developed by Phenospex, is a data repository. HortControl has a BrAPI implementation to be used to automate workflows and analytics software.

GLIS

The Global Information System (GLIS) on Plant Genetic Resources for Food and Agriculture (PGRFA) of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA) is a web-based global entry point for users and third-party systems to access information and knowledge on scientific, technical and environmental matters to strengthen PGRFA conservation, management and utilization activities. The system and its portal also enable recipients of PGRFA to make available all non-confidential information on germplasm according to the provisions of the Treaty and facilitates access to the results of their research and development.

Thanks to the adoption of Digital Object Identifiers (DOIs) to PGRFA ex situ and in situ based on the Multi-Crop Passport Descriptors (MCPD), the Portal provides access to 1.7 million PGRFA in collections conserved worldwide. Of these, over 1.5 million are accessible for research, training and plant breeding in the food and agriculture domain.

The Scientific Advisory Committee of the International Treaty and the Governing Body have repeatedly welcomed efforts on interoperability among germplasm information systems. In this context, the GLIS Portal adopted the Breeding API (BrAPI v1) in 2022. Integrating the BrAPI among the GLIS content negotiators facilitates queries and the exchange of content for data management in plant breeding. The Portal also offers other protocols (XML, DarwinCore, JSON and JSON-LD) to increase data and metadata connectivity. In the near future, depending on the availability of resources, upgrading to BrAPI v2 is planned.

FLORILÈGE (Gateway to French Plant Genetic Resources)

Designed primarily for the general public, Florilège provides access to all French plant biological resources centers. Its interface allows individuals to browse available plant accessions and gives them the possibility to order them. The listed accessions originate from 19 resources centers and concern around fifty plant species.

Florilège retrieves accession information from different BrAPI-compliant systems such as OLGA, an internal accessions management system, or FAIDARE. Leveraging the BrAPI implementation of these systems ensures standardized data retrieval from multiple sources, making the integration of new data sources that implement BrAPI an effortless process. The implementation of BrAPI is a prerequisite for the integration of any new database in Florilège.

Florilège is developed in Drupal 10, and uses xnttbrapi module (to easily connect to BrAPI compliant external databases).

BrAPI plug and play GraphQL based data-warehouse

Using the "Zendro" set of automatic software program-code generators (zendro-dev.github.io) a fully functional, efficient, and cloud-capable BrAPI data-warehouse has been created for the current version of the BrAPI data models. The resulting data-warehouse has two interfaces, one application programming interface implemented in the form of a GraphQL web-server and another intuitive point and click graphical user interface in the browser. Both provide secure access to data read and write functions for all BrAPI data models. These data administration methods comprise create, read, update, and delete (CRUD) functions that are standardized and accept the same parameters for all data models.

While data write access comprises both persisting single or multiple records, data read access is particularly rich in features and includes access to single records referred to by their id and access to multiple records selected by logical filters. In this, multiple records are paginated using the highly efficient cursor based pagination model as proposed in the GraphQL standard. Logical filters allow for exhaustive search queries, whose structure is highly intuitive and based around logical triplets in which a data model field is validated using an operator and a value, e.g. "Study name equals 'xyz'". In this a large collection of operators is available and triplets can be combined to logical search trees using "and" or "or" operators. Searches can be extended over relationships between data models, thus enabling a user to query the warehouse exactly for the data wanted.

Access security is implemented with the OAuth2 user authentication standard (datatracker.ietf.org/doc/html/rfc6749). Authorization is based on user roles and can be configured differently for each single data model read or write function.

The browser based graphical user interface is implemented in React.js with Next and exposes an intuitive and self explanatory set of functions for each data model. In the left a menu allows the user to access all BrAPI data models. Upon clicking on a model a table is shown which allows the user to paginate through all existing records, sort them by any column, search the records, add new records, or update or delete existing records, if the user role authorizes these functions. Record data can be inspected in a detail view and here relationships to other data records can be reviewed using the very same graphical visual representations. Breadcrumbs allow the user to navigate back and forth in the trail of relationships inspected. Finally, the generated graphical interface allows for the integration of interactive scientific plots and analysis tools written in JavaScript or WebAssembly.

The Zendro based BrAPI plug and play data-warehouse is capable of forming an efficient cloud of data servers. This is achieved simply by linking (URLs) other Zendro based warehouses that expose the same GraphQL API to the same data models, or a subset of data models. Any network of such Zendro GraphQL servers can be set up using this configuration approach. The code generated then exposes full access to all data records stored on any node of the network, while maintaining full security control at each node. Importantly, the warehouses are programmed in such a way that any number of data servers can be joined without loss of efficiency. Only the network connection speed and size of requested record sets influence the performance.

As explained, Zendro is a code generator and creates a fully functional data warehouse from input data model definitions, i.e. a schema. The schema is given in the form of special data model descriptions, in which each model is defined using JavaScript Object Notation (JSON). Each model is defined in its respective JSON file. A translator has been developed to create the Zendro schema from the BrAPI data model definitions. This ensures that Zendro can create plug and play data warehouses for future versions of the BrAPI with great ease, i.e. by translating the BrAPI models to Zendro input and subsequently running Zendro to create the plug and play warehouse.