Transcriptomics metadata template #75

JolandaS · 2020-03-04T11:23:52Z

Determine which ontologies to use for transcriptomics data (metadata templates)

PeterWoollard · 2020-03-11T08:37:34Z

Key transcriptomics-related entities for FAIR, and some candidate ontologies for them, include:

Key searching ontologies

Species - NCBI taxonomy Scientific name + ID
Tissue - Uberon term and ID
Cell type - CL term and ID
Disease - no single solution? Mondo? DO? MeSH
Phenotype/Trait - in humans typically HPO, other mammals: MPO, beyond mammals?
Experiment Type e.g. RNASeq, CITESeq etc. - EFO

Key searching entities (not ontologies)

gene/protein - one of ENSEMBL/ENTREZ_GENE/UNIPROT/HGNC ID + HGN
compound - UniChem (ChEBI etc.) + SMILES?
metabolites -?
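
As a rough illustration of the list above, the field-to-ontology choices could be written down as a lookup table and used to check that an annotation comes from an expected ontology. This is only a sketch: the field names, the `PREFIX:ID` (CURIE-style) convention and the function name are assumptions made for the example, and the disease entry simply lists all three candidates because that choice is still open.

```python
import re

# Field -> accepted ontology prefixes, following the suggestions above.
# Field names and prefixes are illustrative assumptions, not an agreed template.
FIELD_ONTOLOGIES = {
    "species": {"NCBITaxon"},
    "tissue": {"UBERON"},
    "cell_type": {"CL"},
    "disease": {"MONDO", "DOID", "MESH"},  # undecided, so all candidates are listed
    "phenotype": {"HP", "MP"},
    "experiment_type": {"EFO"},
}

# A CURIE-style identifier: a prefix, a colon, then a local identifier.
CURIE = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9_]*):(?P<local>\S+)$")


def check_annotation(field, curie):
    """Return True if the identifier's prefix is accepted for the given field."""
    match = CURIE.match(curie)
    if match is None or field not in FIELD_ONTOLOGIES:
        return False
    return match.group("prefix") in FIELD_ONTOLOGIES[field]


# NCBITaxon:9606 is Homo sapiens.
species_ok = check_annotation("species", "NCBITaxon:9606")
```

A check like this only verifies the prefix; whether the term actually exists in the ontology would need a lookup against the ontology itself.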

JolandaS · 2020-03-11T09:28:27Z

Define own minimal set of metadata, recommendations. Selection criteria for ontologies used.

daniwelter · 2020-03-11T10:06:54Z

For disease, I would use MONDO (possibly supplemented with NCIt for cancers) as it is currently the most actively developed, so most likely to respond quickly to any change requests. I definitely wouldn't use MeSH. Agreed on all the other ontologies. I'd also add

Cell location/cycle - GO
Developmental stage - HSAPDV/Uberon
chemical compounds - ChEBI

Searching entities
Again, agreed on most of the suggestions.
Metabolites - MetaboLights compound accession, ChEBI

AlasdairGray · 2020-03-11T10:30:08Z

Define own minimal set of metadata, recommendations. Selection criteria for ontologies used.

Bioschemas may be an appropriate approach here to define a minimal metadata record that would be searchable on the web.
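
To make this concrete, a minimal record could be expressed as schema.org `Dataset` markup in JSON-LD, which is the vocabulary Bioschemas profiles build on. The sketch below is an assumption-laden example: the function name, the choice of properties and all values are hypothetical, and which properties count as the minimum should be checked against the Bioschemas Dataset profile itself.

```python
import json


def dataset_jsonld(name, description, identifier, url, keywords):
    """Build a schema.org Dataset record as a plain dict ready for JSON-LD output."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
        "keywords": list(keywords),
    }


# All values below are placeholders for illustration only.
record = dataset_jsonld(
    name="Example bulk RNA-seq study",
    description="Hypothetical transcriptomics dataset used to illustrate the markup.",
    identifier="EXAMPLE-0001",
    url="https://example.org/datasets/EXAMPLE-0001",
    keywords=["transcriptomics", "RNA-seq"],
)

# Serialised form that could be embedded in a web page for search engines to index.
markup = json.dumps(record, indent=2)
```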

karsten-quast · 2020-04-01T07:32:24Z

I tried to compile a potential starting point for a recipe. Hope it makes sense to you. Really looking forward to your thoughts. Maybe we can flesh this out.

Task

Generate metadata template for bulk NGS data generated at different sources following different standards

Define competency questions

What are the questions you would like to address with the template?

Defining Minimal Set Of Metadata (MSOM) according to these questions

Compile metadata from different sources
Generate consolidated view on metadata by merging attributes as far as possible
Differentiate metadata available for most of the studies from metadata occurring rarely (sparse matrix)
Identify gaps in the metadata available for most of the studies, i.e. data that is considered important but has not been captured in the past
Define a MSOM to be captured in the future from the metadata that is available for most of the studies and the metadata considered to be important
Identify available community standards regarding minimal sets of metadata
Add metadata attributes from those community standards to the MSOM, if they are not yet included
Assign cardinality to the MSOM (identify mandatory metadata and how many times the attributes may be reported. Some metadata might not be mandatory but are still important to capture, if available)
Identify appropriate ontologies representing your data and establish an application ontology (see recipe 4 of UC3)
Assign, as far as possible, ontologies to the MSOM and the sparse matrix
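
The first steps above (consolidate attributes, separate widely available metadata from the sparse matrix, find gaps, assemble the MSOM) could be sketched roughly as follows. The 80% coverage threshold, the attribute names and the function names are assumptions chosen for the example, and the sketch presumes attribute names have already been harmonised across sources.

```python
from collections import Counter


def split_by_coverage(studies, threshold=0.8):
    """Split attributes into those reported by most studies and the sparse rest.

    studies maps a study ID to the set of attribute names it reports.
    """
    if not studies:
        return set(), set()
    counts = Counter(attr for attrs in studies.values() for attr in set(attrs))
    total = len(studies)
    common = {attr for attr, n in counts.items() if n / total >= threshold}
    sparse = set(counts) - common
    return common, sparse


def find_gaps(studies, important):
    """Attributes considered important that no study has captured so far."""
    observed = set().union(*studies.values()) if studies else set()
    return set(important) - observed


def build_msom(common, important, community_standard):
    """MSOM = widely available + important + community-standard attributes."""
    return set(common) | set(important) | set(community_standard)


# Hypothetical studies and attribute names.
studies = {
    "S1": {"species", "tissue", "sex"},
    "S2": {"species", "tissue"},
    "S3": {"species", "tissue", "rin"},
}
important = {"library_prep", "species"}

common, sparse = split_by_coverage(studies)
gaps = find_gaps(studies, important)
msom = build_msom(common, important, {"experiment_type"})
```

Cardinality and ontology assignment would then be attached to each attribute in `msom` and `sparse` as a separate step.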

Introducing semantics into the template

Identify most important objects to be represented in the model (e.g. study, sample, treatment, result, etc.)
Name the objects appropriately (e.g. an NGSstudy is an OMICSstudy is a Study; do not call an NGSstudy a Study; make sure the granularity fits your purposes)
Assign MSOM and sparse matrix attributes to the respective objects
Identify and introduce relationships among the identified objects (e.g. “an NGSstudy contains samples”, “a result is derived from a sample”)
Identify dependencies to data not represented as objects at this point in time, but, e.g. as termlists
Make sure that your model can be expanded subsequently to represent those data as objects, as well
Integrate the sparse matrix of metadata not contained in the MSOM in the model
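
One possible way to picture the object model described above is a small class hierarchy. This is a sketch only: the class and field names are hypothetical, and the sparse-matrix attributes are kept in a separate free-form dictionary so that they can be promoted to proper objects later.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    sample_id: str
    attributes: Dict[str, str] = field(default_factory=dict)         # MSOM attributes
    sparse_attributes: Dict[str, str] = field(default_factory=dict)  # rarely reported ones


@dataclass
class Result:
    result_id: str
    derived_from: Sample  # "a result is derived from a sample"


@dataclass
class Study:
    study_id: str
    samples: List[Sample] = field(default_factory=list)  # "a study contains samples"


@dataclass
class OmicsStudy(Study):
    omics_type: str = "unspecified"


@dataclass
class NGSStudy(OmicsStudy):
    platform: str = "unspecified"


# An NGSStudy is an OmicsStudy is a Study, but is named at its own granularity.
sample = Sample("SAM1", attributes={"species": "NCBITaxon:9606"})
study = NGSStudy("ST1", samples=[sample], omics_type="transcriptomics")
result = Result("R1", derived_from=sample)
```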

Reality check

Introduce measures that allow errors in reported data to be identified against your model
Expose your model to actual data delivered by independent colleagues and capture the errors and gaps that occurred
Identify errors and gaps that are related to the model and not occurring due to errors in the data
Adjust the model according to these errors and gaps
Repeat the reality check until no severe errors or gaps remain that are relevant to the previously defined competency questions
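
A simple form of such a check, assuming the cardinalities assigned to the MSOM are available as (minimum, maximum) pairs, might look like the sketch below. The attribute names and the record layout are hypothetical. Attributes that the model does not know about are reported separately, since they may point to gaps in the model rather than errors in the data.

```python
def validate_record(record, cardinality):
    """Check one metadata record against the cardinalities defined in the model.

    record maps attribute name -> list of reported values.
    cardinality maps attribute name -> (minimum, maximum), maximum None = unbounded.
    Returns a list of human-readable error messages (empty if the record passes).
    """
    errors = []
    for attr, (low, high) in cardinality.items():
        n = len(record.get(attr, []))
        if n < low:
            errors.append(f"{attr}: expected at least {low} value(s), found {n}")
        elif high is not None and n > high:
            errors.append(f"{attr}: expected at most {high} value(s), found {n}")
    for attr in record:
        if attr not in cardinality:
            # Possibly a gap in the model rather than an error in the data.
            errors.append(f"{attr}: not defined in the model")
    return errors


# Hypothetical model: species exactly once, tissue at least once, disease optional.
cardinality = {"species": (1, 1), "tissue": (1, None), "disease": (0, None)}
record = {"species": ["NCBITaxon:9606", "NCBITaxon:10090"], "batch": ["B1"]}

errors = validate_record(record, cardinality)
```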

FuqiX · 2020-04-28T14:33:44Z

Link to recipe
https://hackmd.io/@7GH6ArIbRnm_7fgcv8mmWw/HJVQ7nHKL

Chris-Evelo · 2020-05-20T08:16:26Z

I think this would benefit from some structure for an actual study that involves transcriptomics data. Apart from general metadata (who did it, where, where it was stored and so on), this should have a description of the study (including what other measurements were done in the same study), and this should follow the ISA principles: how samples were created and how the actual measurements were performed. Next, it should also link to (and have an ontological description of) 1) parallel measurements (e.g. did you also do proteomics, and where do I find that info) and 2) phenotypic outcome data (e.g. under the treatment in the study, blood pressure was measured, and again where that would be stored). Note that, ideally, in a public study, the ISA types of data would go into BioSamples, and the other measurements would be in BioStudies, or (for other omics data) be linked from there. So our choices should ideally align with how these repositories (and of course ArrayExpress and GEO) work. (Sorry if all that was already in the cookbook.)

Chris-Evelo · 2020-05-20T08:36:02Z

We had some discussion about whether this would not be better as part of the catalogue model. Of course, the catalogue needs to align with how data is collected. But we also need to make sure our recipes align with a "FAIR at source" approach, where people can start to collect the relevant data when they design, perform and evaluate the actual study.

JolandaS added the UC9 label Mar 4, 2020

JolandaS added this to the F2F Barcelona milestone Mar 4, 2020

JolandaS assigned daniwelter and PeterWoollard Mar 4, 2020

JolandaS mentioned this issue Mar 4, 2020

Maintainance data type specific metadata templates #76

Closed

JolandaS assigned karsten-quast Mar 11, 2020

JolandaS mentioned this issue Mar 19, 2020

Define minimal metadata templates for different levels #79

Open

JolandaS removed this from the Virtual meeting End of April 2020 milestone May 13, 2020
