feat: mass spectrometry data support #18

cmdoret · 2024-10-07T14:04:48Z

This is the first part of adding support for proteomics/metabolomics metadata into the schema.
This PR focuses on adding required fields into the metadata schema.
In a second part, we will implement parsers in modos-api to auto-populate the zarr metadata by extracting it from metabolomics data files.

Important changes are in src/modos_schema/schema/modos_schema.yaml

Context

We want to support mass spectrometry results from proteomics / metabolomics while keeping the schema as simple as possible.
The starting point is the mzTab format (see the mzTab specification), a tabular format consisting of one metadata section followed by several tables.

The data is essentially a table quantifying the molecules present in a collection of samples.

Here is an example of fields extracted from a metadata section (MTD):

MTD assay[0]
MTD assay[0]-ms_run_ref
MTD assay[0]-sample_ref
MTD custom[0]
MTD cv[0]-full_name
MTD cv[0]-label
MTD cv[0]-uri
MTD cv[0]-version
MTD database[0]
MTD database[0]-prefix
MTD database[0]-uri
MTD database[0]-version
MTD id_confidence_measure[0]
MTD instrument[0]-name
MTD instrument[0]-source
MTD ms_run[0]-format
MTD ms_run[0]-id_format
MTD ms_run[0]-location
MTD ms_run[0]-scan_polarity[0]
MTD mzTab-ID
MTD mzTab-version
MTD quantification_method
MTD sample[0]
MTD sample[0]-description
MTD small_molecule_feature-quantification_unit
MTD small_molecule-identification_reliability
MTD small_molecule-quantification_unit
MTD software[0]
MTD study_variable[0]
MTD study_variable[0]_refs
MTD title
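
For reference, here is a minimal sketch of how the MTD section could be read, assuming the usual tab-separated mzTab layout (`MTD`, key, value). The function name `parse_mtd` and all example values are hypothetical. This is not the modos-api parser, which is planned for the second part.

```python
def parse_mtd(lines):
    """Collect MTD key/value pairs from mzTab lines (assumed tab-separated)."""
    metadata = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip anything that is not a metadata line (e.g. table headers and rows).
        if len(parts) < 2 or parts[0] != "MTD":
            continue
        key = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        metadata[key] = value
    return metadata


# Illustrative input only; the values are made up.
example = [
    "MTD\tmzTab-version\t2.0.0-M",
    "MTD\tmzTab-ID\tEXAMPLE-1",
    "MTD\ttitle\tExample study",
    "MTD\tsample[1]\tsample one",
    "MTD\tsample[1]-description\tHypothetical sample",
    "SMH\tSML_ID",
]
mtd = parse_mtd(example)
```

Keeping the raw keys (including the `[n]` index) makes it straightforward to group them by assay or sample later.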

We can already represent the following fields:

DataEntity:
- id (mzTab-ID)
- name (title)
Assay:
- id (assay[n])
- has_sample (assay[n]-sample_ref)
Sample:
- id (sample[n])
- description (sample[n]-description)
- taxon_id (sample[n]-species)
- cell_type (sample[n]-cell_type)
- source_material (sample[n]-tissue)
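
The mapping above could be sketched as follows. This is only an illustration: `map_mtd`, the lookup tables, and the example values are hypothetical names, the output is plain dictionaries rather than the generated schema classes, and the optional trailing index on keys such as `sample[1]-tissue[1]` is an assumption about how these keys appear in real files.

```python
import re

# MTD key -> schema field, following the list above.
TOP_LEVEL = {"mzTab-ID": "id", "title": "name"}
INDEXED = {
    "assay": {None: "id", "sample_ref": "has_sample"},
    "sample": {
        None: "id",
        "description": "description",
        "species": "taxon_id",
        "cell_type": "cell_type",
        "tissue": "source_material",
    },
}
# Matches assay[n] / sample[n], optionally followed by -suffix or -suffix[m].
KEY_PATTERN = re.compile(r"^(assay|sample)\[(\d+)\](?:-([a-z_]+)(?:\[\d+\])?)?$")


def map_mtd(mtd):
    """Group MTD key/value pairs under DataEntity, Assay and Sample."""
    result = {"DataEntity": {}, "Assay": {}, "Sample": {}}
    for key, value in mtd.items():
        if key in TOP_LEVEL:
            result["DataEntity"][TOP_LEVEL[key]] = value
            continue
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # field not represented in the schema (e.g. mzTab-version)
        group, index, suffix = match.group(1), int(match.group(2)), match.group(3)
        field = INDEXED[group].get(suffix)
        if field is None:
            continue
        result[group.capitalize()].setdefault(index, {})[field] = value
    return result


# Illustrative input only; the values are made up.
mapped = map_mtd({
    "mzTab-ID": "EXAMPLE-1",
    "title": "Example study",
    "mzTab-version": "2.0.0-M",
    "assay[1]": "assay one",
    "assay[1]-sample_ref": "sample[1]",
    "sample[1]": "sample one",
    "sample[1]-tissue[1]": "liver",
})
```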

Note: even if we don't include some fields in the metadata schema (e.g. mzTab-version), they would still be retrievable from the file itself through the API.

Changes

Based on @htmonkey's suggestions (sdsc-ordes/modos-api#91 (comment)), we need at least these changes:

- Adds MassSpectrometryResults, a subclass of DataEntity for mass spectrometry quantification results.
- Adds has_sample_processing on Assay (MTD sample_processing[n])

Challenges / questions

1. In MODOS, we traditionally have MODOS -(has)-> Assay -(has)-> DataEntity, and samples can be attached to Assay and/or DataEntity.

A single mzTab file can contain hundreds of samples and assays, each of which is a single line in the table.
I am not sure if this is an issue, but this will bloat the metadata with a lot of redundancy.

2. We will likely need to add properties to the schema to represent mzTab files in a meaningful way.

It is sometimes unclear which class a property should be added to. Here, the MTD sample_processing[n] property seems tied to the entire mzTab file rather than to an individual sample. Does that mean that all assays and samples in the file must have the same sample processing? @htmonkey
An alternative would be to attach sample_processing directly to MassSpectrometryResults to reduce redundancy.
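
To make the redundancy concern concrete, here is a rough sketch comparing the two placements. The dataclasses are simplified stand-ins rather than the generated schema classes. The `assays` field and `has_sample_processing` on `MassSpectrometryResults` are hypothetical, and the step names and assay count are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    id: str
    has_sample_processing: List[str] = field(default_factory=list)


@dataclass
class MassSpectrometryResults:
    id: str
    # Simplification for this sketch: assays are nested under the results.
    assays: List[Assay] = field(default_factory=list)
    # Hypothetical slot: only the alternative discussed above, not part of this PR.
    has_sample_processing: List[str] = field(default_factory=list)


def stored_steps(results):
    """Count how many sample-processing entries the metadata would hold."""
    per_assay = sum(len(a.has_sample_processing) for a in results.assays)
    return per_assay + len(results.has_sample_processing)


steps = ["hypothetical step A", "hypothetical step B"]
n_assays = 200

# Option 1: every assay repeats the file-level steps.
on_assays = MassSpectrometryResults(
    id="example",
    assays=[
        Assay(id=f"assay[{i}]", has_sample_processing=list(steps))
        for i in range(1, n_assays + 1)
    ],
)

# Option 2: the steps are stored once on the results entity.
on_results = MassSpectrometryResults(
    id="example",
    assays=[Assay(id=f"assay[{i}]") for i in range(1, n_assays + 1)],
    has_sample_processing=list(steps),
)

counts = (stored_steps(on_assays), stored_steps(on_results))
```

With these made-up numbers, the first option stores the same two steps on each of the 200 assays, while the second stores them once.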


github-actions · 2024-11-08T08:58:17Z

PR Preview Action v1.4.8
🚀 Deployed preview to https://sdsc-ordes.github.io/modos-schema/pr-preview/pr-18/
on branch `gh-pages` at 2024-11-08 08:58 UTC

cmdoret requested a review from rmfranken October 7, 2024 14:04

cmdoret self-assigned this Oct 7, 2024

rmfranken reviewed Oct 10, 2024


cmdoret added 6 commits November 8, 2024 09:56

feat: add data entity for mass spectrometry

7f9b859

chore: regen

10a5967

fix: typo in description

46fcc46

refactor: consistent slot naming

f052f9b

chore: regen

4a41e86

chore: regen

7bd1e50

cmdoret force-pushed the feat/metabolomics branch from 26fb3e8 to 7bd1e50 Compare November 8, 2024 08:57

cmdoret merged commit b231fae into main Nov 28, 2024
3 checks passed
