Missing processed AIRR sequences in MiAIRR Data Elements table #803

Closed
ustervbo opened this issue Oct 6, 2024 · 1 comment

ustervbo commented Oct 6, 2024

In the section 'MiAIRR Data Elements', we write:

The AIRR Community has agreed to six high-level data sets that will guide the publication, curation, and sharing of AIRR-Seq data and metadata: Study and subject, sample collection, sample processing and sequencing, raw sequences, processing of sequence data, and processed AIRR sequences.

However, the table shows only five sets. Its last element is labeled 5 / data (processed sequence), which suggests the numbering is off. The V(D)J germline reference database arguably belongs to 5 / process (computational) instead.

What is missing from the table is the set of processed AIRR sequences. The reason seems to be that Rearrangement is absent from:

tables = ['Study', 'Subject', 'Diagnosis', 'Sample', 'CellProcessing', 'NucleicAcidProcessing',
          'PCRTarget', 'SequencingRun', 'SequencingData', 'DataProcessing']

in conf.py. Judging by the schema, adding Rearrangement would add v_call, d_call, j_call, c_call, junction, junction_aa, duplicate_count, and cell_id to the table under set/section 6 / data (processed sequence).
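
A minimal sketch of the proposed change, assuming the table is built only from the schema definitions named in `tables`. The toy `schema` dict and the `miairr_rows` helper are hypothetical stand-ins for illustration, not the actual conf.py logic or the full AIRR schema. The `x-airr` layout (`miairr`, `set`, `subset`) and its values here are assumptions as well.

```python
# Proposed list: the existing conf.py entries plus 'Rearrangement'.
tables = ['Study', 'Subject', 'Diagnosis', 'Sample', 'CellProcessing', 'NucleicAcidProcessing',
          'PCRTarget', 'SequencingRun', 'SequencingData', 'DataProcessing',
          'Rearrangement']

# Hypothetical, heavily trimmed stand-in for the schema (illustration only;
# the annotation values are placeholders).
schema = {
    'Rearrangement': {'properties': {
        'v_call': {'x-airr': {'miairr': 'important', 'set': 6,
                              'subset': 'data (processed sequence)'}},
        'hypothetical_unannotated_field': {},  # carries no MiAIRR annotation
    }},
}


def miairr_rows(schema, tables):
    """Collect (set/subset label, definition, field) for MiAIRR-annotated fields."""
    rows = []
    for table in tables:
        props = schema.get(table, {}).get('properties', {})
        for field, spec in props.items():
            xairr = spec.get('x-airr', {})
            if 'miairr' in xairr:
                rows.append((f"{xairr['set']} / {xairr['subset']}", table, field))
    return rows


rows = miairr_rows(schema, tables)
```

With `'Rearrangement'` left out of `tables`, the same helper returns no set 6 rows, which matches the gap described above.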

Additionally, germline_database in the schema should be modified so that subset: data (processed sequence) becomes subset: process (computational).
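
The effect of that relabeling can be sketched as follows. The `toy_schema` dict and the `with_subset` helper are hypothetical and only mimic the assumed `x-airr` layout. The actual change would be a one-line edit to the schema file itself.

```python
import copy


def with_subset(schema, definition, field, subset):
    """Return a copy of `schema` with one field's x-airr subset replaced."""
    updated = copy.deepcopy(schema)
    updated[definition]['properties'][field]['x-airr']['subset'] = subset
    return updated


# Hypothetical, trimmed stand-in for the current schema entry.
toy_schema = {
    'DataProcessing': {'properties': {
        'germline_database': {'x-airr': {'set': 5,
                                         'subset': 'data (processed sequence)'}},
    }},
}

fixed = with_subset(toy_schema, 'DataProcessing', 'germline_database',
                    'process (computational)')
entry = fixed['DataProcessing']['properties']['germline_database']['x-airr']
label = f"{entry['set']} / {entry['subset']}"
```

Here `label` reads `5 / process (computational)`, so the field would group with the other computational processing entries rather than appear as a separate last section.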

The opening paragraph could be rewritten to follow the style of MiAIRR-to-NCBI Implementation:

The AIRR Community has agreed to six high-level data sets that will guide the publication, curation, and sharing of AIRR-Seq data and metadata. The levels are:

1. Study, subject, and diagnosis and intervention
2. Sample collection
3. Sample processing and sequencing
4. Raw sequencing data
5. Data processing
6. Processed sequences with basic analysis results

ustervbo commented

Duplicate of #795

ustervbo marked this as a duplicate of #795 on Oct 27, 2024