feat: mass spectrometry data support #18

Merged
merged 6 commits into from
Nov 28, 2024

cmdoret (Member) commented on Oct 7, 2024

This is the first part of adding support for proteomics/metabolomics metadata to the schema.
This PR focuses on adding the required fields to the metadata schema.
In a second part, we will implement parsers in modos-api that auto-populate the zarr metadata by extracting it from metabolomics data files.

The important changes are in src/modos_schema/schema/modos_schema.yaml.

Context

We want to support mass spectrometry results from proteomics / metabolomics while keeping the schema as simple as possible.
The starting point is the mzTab format (specs here), a tabular format consisting of one metadata section followed by several tables.

In essence, the data is a table quantifying the molecules present in a collection of samples.

Here is an example of fields extracted from a metadata section (MTD):

MTD assay[0]
MTD assay[0]-ms_run_ref
MTD assay[0]-sample_ref
MTD custom[0]
MTD cv[0]-full_name
MTD cv[0]-label
MTD cv[0]-uri
MTD cv[0]-version
MTD database[0]
MTD database[0]-prefix
MTD database[0]-uri
MTD database[0]-version
MTD id_confidence_measure[0]
MTD instrument[0]-name
MTD instrument[0]-source
MTD ms_run[0]-format
MTD ms_run[0]-id_format
MTD ms_run[0]-location
MTD ms_run[0]-scan_polarity[0]
MTD mzTab-ID
MTD mzTab-version
MTD quantification_method
MTD sample[0]
MTD sample[0]-description
MTD small_molecule_feature-quantification_unit
MTD small_molecule-identification_reliability
MTD small_molecule-quantification_unit
MTD software[0]
MTD study_variable[0]
MTD study_variable[0]_refs
MTD title
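As groundwork for the modos-api parsers mentioned above, the sketch below groups MTD lines into non-indexed fields (e.g. mzTab-ID, title) and indexed elements (e.g. assay[n], sample[n]). It assumes tab-separated lines of the form MTD<TAB>key<TAB>value; the function name parse_mtd, the "value" key for bare elements, and the example lines and indices are all hypothetical and not part of modos-api.

```python
import re

# Hypothetical pattern for indexed keys such as "assay[1]", "sample[1]-description"
# or "ms_run[1]-scan_polarity[1]" (everything after the first "-" is the property).
INDEXED_KEY = re.compile(r"^(?P<element>[A-Za-z_]+)\[(?P<index>\d+)\](?:-(?P<prop>.+))?$")


def parse_mtd(lines):
    """Split mzTab MTD lines into flat fields and indexed elements.

    Assumes tab-separated "MTD<TAB>key<TAB>value" lines; lines from other
    sections (e.g. the small molecule tables) are ignored.
    """
    flat = {}
    indexed = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3 or parts[0] != "MTD":
            continue
        key, value = parts[1], parts[2]
        match = INDEXED_KEY.match(key)
        if match is None:
            # Non-indexed fields, e.g. "mzTab-ID" or "title".
            flat[key] = value
            continue
        # The bare element itself (e.g. "sample[1]") is stored under "value".
        prop = match.group("prop") or "value"
        element = indexed.setdefault(match.group("element"), {})
        element.setdefault(int(match.group("index")), {})[prop] = value
    return flat, indexed


# Hypothetical example lines; the last one belongs to a table header and is skipped.
example = [
    "MTD\tmzTab-ID\texample_id",
    "MTD\ttitle\tExample study",
    "MTD\tsample[1]\tsample_1",
    "MTD\tsample[1]-description\tLiver tissue",
    "MTD\tassay[1]\tassay_1",
    "MTD\tassay[1]-sample_ref\tsample[1]",
    "SMH\tSML_ID\tdatabase_identifier",
]
flat, indexed = parse_mtd(example)
print(flat)
print(indexed)
```

Keeping indexed elements grouped by index makes it straightforward to turn each sample[n] or assay[n] into one metadata object later.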

With the current schema, we can already represent the following fields (mzTab key in parentheses):

  • DataEntity
    • id (mzTab-ID)
    • name (title)
  • Assay
    • id (assay[n])
    • has_sample (assay[n]-sample_ref)
  • Sample
    • id (sample[n])
    • description (sample[n]-description)
    • taxon_id (sample[n]-species)
    • cell_type (sample[n]-cell_type)
    • source_material (sample[n]-tissue)
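The correspondence above can be expressed as a lookup table. The sketch below is a hypothetical illustration (the table name MZTAB_TO_MODOS, the function map_mtd and the "file" instance key are not modos-api code) that maps MTD key/value pairs onto MODOS class and slot names; it assumes simplified string values, whereas real mzTab fields such as species are CV parameters, and it tolerates an optional trailing index on the property (e.g. sample[1]-species[1]).

```python
import re

# Hypothetical mapping based on the list above:
# (mzTab element, property) -> (MODOS class, MODOS slot).
# A property of None means the value of the bare element, e.g. "sample[1]".
MZTAB_TO_MODOS = {
    ("mzTab-ID", None): ("DataEntity", "id"),
    ("title", None): ("DataEntity", "name"),
    ("assay", None): ("Assay", "id"),
    ("assay", "sample_ref"): ("Assay", "has_sample"),
    ("sample", None): ("Sample", "id"),
    ("sample", "description"): ("Sample", "description"),
    ("sample", "species"): ("Sample", "taxon_id"),
    ("sample", "cell_type"): ("Sample", "cell_type"),
    ("sample", "tissue"): ("Sample", "source_material"),
}

# Indexed keys; a trailing "[n]" on the property (e.g. "species[1]") is ignored.
INDEXED_KEY = re.compile(r"^([A-Za-z_]+)\[(\d+)\](?:-([A-Za-z_]+)(?:\[\d+\])?)?$")


def map_mtd(mtd):
    """Map MTD key/value pairs to {MODOS class: {instance: {slot: value}}}."""
    records = {}
    for key, value in mtd.items():
        match = INDEXED_KEY.match(key)
        if match:
            element, index, prop = match.groups()
            instance = f"{element}[{index}]"
        else:
            # File-level field such as "mzTab-ID" or "title".
            element, prop, instance = key, None, "file"
        target = MZTAB_TO_MODOS.get((element, prop))
        if target is None:
            continue  # not in the schema; still retrievable from the file itself
        cls, slot = target
        records.setdefault(cls, {}).setdefault(instance, {})[slot] = value
    return records


print(map_mtd({
    "mzTab-ID": "example_id",
    "title": "Example study",
    "mzTab-version": "2.0.0-M",
    "sample[1]": "sample_1",
    "sample[1]-species[1]": "NCBITaxon:9606",
    "assay[1]": "assay_1",
    "assay[1]-sample_ref": "sample[1]",
}))
```

Fields without an entry in the table (here mzTab-version) are simply skipped, which matches the note below that they remain available from the file itself.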

Note: even if we don't include some fields in the metadata schema (e.g. mzTab-version), they would still be retrievable from the file itself through the API.

Changes

Based on @htmonkey's suggestions (sdsc-ordes/modos-api#91 (comment)), we need at least these changes:

  • Adds MassSpectrometryResults, a subclass of DataEntity for mass spectrometry quantification results.
  • Adds has_sample_processing to Assay (MTD sample_processing[n])

Challenges / questions

1. In MODOS, we traditionally have MODOS -(has)-> Assay -(has)-> DataEntity, and samples can be attached to Assay and/or DataEntity.

A single mzTab file can contain hundreds of samples and assays, each of which is a single line in the table.
I am not sure whether this is a problem, but it will bloat the metadata with a lot of redundancy.

2. We will likely need to add properties to the schema to represent mzTab files in a meaningful way.

It is sometimes unclear which class a property should be added to. For example, the MTD sample_processing[n] property seems tied to the entire mzTab file rather than to an individual sample. Does that mean that all assays and samples in the file must share the same sample processing? @htmonkey
An alternative would be to attach sample_processing directly to MassSpectrometryResults to reduce redundancy.
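To make the redundancy trade-off concrete, here is a minimal sketch using simplified, hypothetical dataclass stand-ins for Assay and MassSpectrometryResults (not the generated schema classes). It counts how many sample-processing values are stored when the file-level values are copied onto every assay versus attached once to the results entity; the processing labels, the one-sample-per-assay layout and the assay count are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, heavily simplified stand-ins for the schema classes.
@dataclass
class Assay:
    id: str
    has_sample: List[str] = field(default_factory=list)
    # Option 1: sample processing stored on each Assay.
    has_sample_processing: List[str] = field(default_factory=list)


@dataclass
class MassSpectrometryResults:
    id: str
    # Option 2: sample processing stored once, on the file-level DataEntity.
    has_sample_processing: List[str] = field(default_factory=list)


def on_assays(n_assays, processing):
    """Option 1: copy the file-level sample_processing[n] values onto every assay."""
    return [
        Assay(id=f"assay[{i}]", has_sample=[f"sample[{i}]"], has_sample_processing=list(processing))
        for i in range(1, n_assays + 1)
    ]


def on_results(file_id, processing):
    """Option 2: attach the same values once to the mzTab results entity."""
    return MassSpectrometryResults(id=file_id, has_sample_processing=list(processing))


processing = ["extraction", "derivatization"]  # hypothetical labels
assays = on_assays(300, processing)
results = on_results("example_id", processing)
# Number of stored sample-processing values under each option.
print(sum(len(a.has_sample_processing) for a in assays))  # 600
print(len(results.has_sample_processing))  # 2
```

Option 1 grows linearly with the number of assays, while option 2 stays constant, at the cost of implying that every assay in the file shares the same processing.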

cmdoret requested a review from rmfranken on Oct 7, 2024 14:04
cmdoret self-assigned this on Oct 7, 2024

github-actions bot commented Nov 8, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://sdsc-ordes.github.io/modos-schema/pr-preview/pr-18/
on branch gh-pages at 2024-11-08 08:58 UTC

cmdoret merged commit b231fae into main on Nov 28, 2024
3 checks passed