File Metadata, Specimen, Assay, Research Study module pages #53

Merged: 5 commits, Sep 25, 2024
2 changes: 2 additions & 0 deletions input/pagecontent/module_assays.md
@@ -1 +1,3 @@
### Assays

The assay model is currently under revision and does not yet have a structure that is ready for release. However, we recognize that a subset of assay metadata must be supported even before a full model is finalized. For that reason, assay information that is relevant to the use of specific files can be found in the file metadata module.
30 changes: 30 additions & 0 deletions input/pagecontent/module_filemetadata.md
@@ -0,0 +1,30 @@
### File Metadata

File metadata describes information about the experimental or investigative process and input material that is relevant to the creation, discovery, and reuse of the file.

#### Scope and Usage

As a general rule, file metadata should be populated for any files that are produced by experimental processes or that are considered "study data".

Specifically scoped file metadata profiles based on file type and input material are available to further clarify which fields within the file metadata module are relevant to particular data types.

##### Released File Profiles

* BAM or CRAM file profile
* Gene fusion or gene expression file profile
* FASTQ file profile
* MAF (Somatic Mutation) file profile
* VCF or gVCF file profile

##### In Progress File Profiles or Focus Areas

* Proteomics
* Radiological Imaging (Image Studies)
* Pathology Imaging (Slide Images)
* Metabolomics

#### Background

The file metadata model is intended as a mechanism for providing useful and relevant information related to the creation and use of data files. File metadata was created to provide a "home" for important file-related assay information while the assay model is still under revision. Technical metadata such as file size, file location, and hash are not included in file metadata but can be found directly on the file resource.

The file metadata model is intended to be modular and flexible in response to the varying descriptive requirements of different data types. While the overall model contains all fields that may be relevant to any file data type, the model should be implemented via profiles, which subset the larger list of file metadata fields into just those that are relevant to the file type in question. This allows for standardization of shared fields across data types while still providing flexibility for data-type-specific fields when needed.
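The profile approach described above can be sketched as set arithmetic: the overall model is one field list, and each profile is a named subset of it. This is a minimal illustration only. The field names and the `fastq`/`vcf` profile contents below are hypothetical placeholders, not the released profile definitions.

```python
# Hypothetical field names for illustration only; the real file metadata
# model and its released profiles define their own field lists.
FILE_METADATA_FIELDS = {
    "data_category", "experimental_strategy", "platform",
    "library_strategy", "reference_genome", "variant_caller", "workflow",
}

# Each profile is modeled as a named subset of the overall field list.
# Shared fields (e.g. "experimental_strategy") appear in several profiles.
PROFILES = {
    "fastq": {"data_category", "experimental_strategy", "platform", "library_strategy"},
    "vcf": {"data_category", "experimental_strategy", "reference_genome",
            "variant_caller", "workflow"},
}


def check_profiles_are_subsets() -> None:
    """Raise if any profile uses a field the overall model does not define."""
    for name, fields in PROFILES.items():
        undefined = fields - FILE_METADATA_FIELDS
        if undefined:
            raise ValueError(f"profile {name!r} uses undefined fields: {sorted(undefined)}")


def fields_outside_profile(record: dict, profile: str) -> set:
    """Return the keys of a file metadata record that the profile does not include."""
    return set(record) - PROFILES[profile]


check_profiles_are_subsets()
fastq_record = {"platform": "Illumina", "variant_caller": "GATK"}
print(fields_outside_profile(fastq_record, "fastq"))  # {'variant_caller'}
```

Keeping profiles as subsets of a single field list is what keeps shared fields standardized: a field such as `experimental_strategy` has one definition no matter which profile uses it.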
10 changes: 10 additions & 0 deletions input/pagecontent/module_files.md
@@ -1 +1,11 @@
### Files

The file module is used to describe files used for or created by investigative processes.

#### Scope and Usage

The file entity is used primarily to provide technical information relevant to the management and administration of a file. Metadata about what is contained in the file or how the content was generated should be described with other entities such as data dictionaries, summaries, or specific file metadata profiles.

The file entity contains basic technical metadata about file location, access, and contents. Files are typically associated with one or more participants, but files may also be study documents associated with the research study in general.

Files may contain multiple file location references (such as a DRS link and an S3 bucket URL), though the access approach for each location should be reasonably apparent from the Access Policy for the file content. For example, if a data file is ONLY accessible through DRS, the underlying bucket location should not be included, because no user would be able to access it directly. However, if multiple Access Policies provide routes to the data through different URIs, those URIs may be included. For example, if a file is accessible both via a controlled-access release through DRS and via a consortium access model permitting direct bucket access, both the DRS URI and the bucket URI should be stated here, so that the File can be referenced consistently irrespective of the access mechanism.
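The inclusion rule above amounts to a filter: publish a location only if at least one Access Policy makes it reachable. The sketch below simplifies that idea by assuming each Access Policy can be summarized by the URI schemes it grants access through. `AccessPolicy`, `FileRecord`, `publishable_locations`, and the example URIs are all hypothetical and are not part of the NCPI model.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    name: str
    # Assumption: a policy is reduced to the URI schemes it grants access through.
    schemes: set[str]


@dataclass
class FileRecord:
    file_id: str
    locations: list[str] = field(default_factory=list)


def publishable_locations(file: FileRecord, policies: list[AccessPolicy]) -> list[str]:
    """Keep only the locations that at least one Access Policy makes reachable."""
    allowed = set().union(*(p.schemes for p in policies))
    return [uri for uri in file.locations if uri.split("://", 1)[0] in allowed]


cram = FileRecord("file-123", ["drs://drs.example.org/file-123",
                               "s3://example-bucket/file-123.cram"])
controlled = AccessPolicy("controlled-access release", {"drs"})
consortium = AccessPolicy("consortium access", {"s3"})

print(publishable_locations(cram, [controlled]))              # DRS link only
print(publishable_locations(cram, [controlled, consortium]))  # DRS link and bucket URI
```

With only the DRS policy in place, the bucket URI is dropped. Once a consortium policy permits direct bucket access, both URIs are kept on the same File.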
45 changes: 44 additions & 1 deletion input/pagecontent/module_research_study.md
@@ -1,2 +1,45 @@
### Research Study
TBD

The NCPI Research Study is based on the core HL7 FHIR ResearchStudy resource (R4) and acts as the umbrella for grouping and describing all other study resources. For the purposes of interoperability, this guide includes recommended practices for the shared data elements required for submission. Please see the research study documentation for in-depth mappings to the R4 version and for the extensions needed to ensure interoperability.

#### Added Profile Restrictions

To ensure consistency across all NCPI research studies represented in FHIR, there are some additional requirements which must be enforced. These requirements are derived from the Differential Table section of this document.

##### The following requirements are true for all NCPI Research Studies

* each study should have its accession ID added as an identifier. This is an identifier provided by dbGaP or another organization, and it represents a common identifier recognized by similar research groups.
* each study should have its study name as the title.
* for those studies which exist as part of a larger study, the parent study should be referenced in the study’s partOf property.
* enrollment must contain one reference of type Study Group.
* category must contain the Coding from NCPI StudyCohort.
* principalInvestigator must be of type Practitioner if present. (Note: we are using practitioner to maintain consistency with existing FHIR structures.)
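The requirements listed above can be sketched as checks on a FHIR R4 ResearchStudy serialized as JSON (here, a Python dict). This is a minimal illustration and does not replace validation against the profile. Several details are assumptions. `NCPI_STUDY_COHORT_SYSTEM` is a hypothetical placeholder, because the real StudyCohort code system URL is not given here. The enrollment rule is read as "at least one Group reference". "Should" items are reported the same way as "must" items. The `partOf` rule is not checked, because it depends on knowing whether a parent study exists.

```python
# Hypothetical system URL standing in for the NCPI StudyCohort code system.
NCPI_STUDY_COHORT_SYSTEM = "https://example.org/hypothetical/ncpi-study-cohort"


def check_ncpi_research_study(study: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []
    if not study.get("identifier"):
        problems.append("identifier: accession ID should be present")
    if not study.get("title"):
        problems.append("title: study name should be present")
    # Interpreted here as: at least one enrollment reference points to a Group.
    if not any(ref.get("reference", "").startswith("Group/")
               for ref in study.get("enrollment", [])):
        problems.append("enrollment: must reference a Study Group")
    codings = [c for cat in study.get("category", []) for c in cat.get("coding", [])]
    if not any(c.get("system") == NCPI_STUDY_COHORT_SYSTEM for c in codings):
        problems.append("category: must contain the NCPI StudyCohort coding")
    # R4 also allows PractitionerRole here; this profile narrows it to Practitioner.
    pi = study.get("principalInvestigator")
    if pi is not None and not pi.get("reference", "").startswith("Practitioner/"):
        problems.append("principalInvestigator: must reference a Practitioner")
    return problems


example_study = {
    "resourceType": "ResearchStudy",
    "status": "completed",
    "identifier": [{"value": "phs000000"}],  # placeholder accession, not a real study
    "title": "Example Study",
    "enrollment": [{"reference": "Group/example-study-group"}],
    "category": [{"coding": [{"system": NCPI_STUDY_COHORT_SYSTEM, "code": "cohort"}]}],
    "principalInvestigator": {"reference": "PractitionerRole/pi-1"},
}
print(check_ncpi_research_study(example_study))
# ['principalInvestigator: must reference a Practitioner']
```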

##### Recommended Practices

To ensure consistency across all NCPI research studies represented in FHIR, there are some additional elements that should be included if they apply to your study. A recommended element is important and will likely be valuable to anyone trying to understand the study’s purpose and usefulness, but it is not essential for validation against the profile. Elements labeled as optional are not central to a fundamental understanding of the study’s content but may play a key role in making a study findable.

[ table related to NCPI shared data elements goes here ]

For a more detailed view of these elements as well as the recommended FHIR mappings please see the research study documentation.

##### Population Details

Each NCPI Research Study must have one Study Group that, at the very least, indicates the total number of patients enrolled at the time the data was loaded into FHIR.

Additional Study Groups may be included to describe various aspects of the study’s population.

###### Usage

* Derived from this Resource Profile: Research Study Subject
* Refer to this Resource Profile: Study Summary and Study Variable Summary
* Examples for this Resource Profile: ResearchStudy/cmg-research-study-bhcmg and ResearchStudy/ncpi-research-study-01

Notes:
As mentioned in the “Added Profile Restrictions” section above, each NCPI Research Study must have one NCPI Study Group that, at the very least, indicates the total number of patients enrolled at the time the data was loaded into FHIR.

##### Practices for Summary Only Resources

For studies loaded into Summary Only FHIR servers, the study’s Study Group resources must include the quantity element. This promotes findability by letting researchers who do not currently have access to the study’s row-level data see basic study details, including the different subject counts.

For studies that exist alongside row-level data, the Study’s Study Group resources should have each corresponding Patient referenced in the Group’s members array.
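The two cases above can be sketched as a helper that builds a FHIR R4 Group. It always sets `quantity` and adds `member` entries only when row-level Patient resources exist. The helper name and IDs are hypothetical. `type`, `actual`, `quantity`, and `member.entity` are standard R4 Group elements, and any NCPI-specific extensions are omitted.

```python
from typing import Optional


def build_study_group(group_id: str, quantity: int,
                      patient_ids: Optional[list[str]] = None) -> dict:
    """Build a FHIR R4 Group describing a study population.

    quantity is always set, so summary-only servers can expose subject counts;
    members are added only when row-level Patient resources exist.
    """
    group = {
        "resourceType": "Group",
        "id": group_id,
        "type": "person",
        "actual": True,
        "quantity": quantity,
    }
    if patient_ids is not None:
        group["member"] = [{"entity": {"reference": f"Patient/{pid}"}}
                           for pid in patient_ids]
    return group


summary_only = build_study_group("study-group-1", 250)
row_level = build_study_group("study-group-1", 2, ["p1", "p2"])
print("member" in summary_only)   # False
print(len(row_level["member"]))   # 2
```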
29 changes: 29 additions & 0 deletions input/pagecontent/module_specimen.md
@@ -1 +1,30 @@
### Specimen

The specimen model is intended to represent both historical biospecimen and sample information related to derived data files, and physical biospecimens and samples that may be available for request by researchers.

#### Scope and Usage

The specimen model is intended to represent three meaningful concepts:

* Sample Origins (Collection)
* Sample Processing and Lineage (Sample)
* Sample Storage (Aliquot/Container/Tube)

While most LIMS or specimen management systems track inventory at the aliquot/container level, many researcher questions and use cases exist at the sample or collection level. For example, if a DNA sample is portioned into three tubes (containers), a researcher may not be concerned with which tube was used, but may want to know that the same DNA (Sample) was used in a methylation analysis and that it was derived from a tissue sample (Collection). When samples are made available for request, it is also important to represent not just the individual containers themselves, but also the grouping of containers (Sample) that represents the total requestable amount of a specimen.
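The three levels and the DNA example above can be sketched as plain data classes, where a tube tracked in a LIMS maps back to its Sample and Collection. This illustrates the concepts only. The class names, fields, and IDs are hypothetical and are not the model's actual element names.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    collection_id: str
    source: str                      # e.g. "tissue"


@dataclass
class Aliquot:
    container_id: str                # tube barcode as tracked in a LIMS


@dataclass
class Sample:
    sample_id: str
    sample_type: str                 # e.g. "DNA"
    collection: Collection
    aliquots: list[Aliquot] = field(default_factory=list)


def sample_for_container(samples: list[Sample], container_id: str) -> Sample:
    """Map a tracked tube back to the biologically equivalent Sample."""
    for sample in samples:
        if any(a.container_id == container_id for a in sample.aliquots):
            return sample
    raise KeyError(container_id)


tissue = Collection("col-1", "tissue")
dna = Sample("smp-1", "DNA", tissue,
             [Aliquot("tube-A"), Aliquot("tube-B"), Aliquot("tube-C")])

# Whichever tube fed the methylation analysis, the answer is the same DNA Sample.
used = sample_for_container([dna], "tube-B")
print(used.sample_id, used.collection.source)  # smp-1 tissue
```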

Through these three aspects, this model attempts to represent how specimens are conceptualized and used in real-world biomedical research.

#### Collection

The collection entity describes the collection procedure that generated the biospecimen. This can contain minimal metadata (for example, just the age at collection), no metadata (if none is known), or robust information including procedure, location, and laterality. Collection details are not required, but when possible, it should be indicated if a sample was the original, collected sample.

#### Sample

The Sample entity is intended to describe groupings of "biologically equivalent" biospecimens or containers. In many cases, this may be an abstraction of real-world concepts, but it provides a helpful tool for simplifying complex data recorded in LIMS. For example, a particular collection of blood may be stored in several EDTA tubes. While these tubes will be tracked separately in a LIMS, a primary researcher may care only that a blood sample exists and whether the total volume is adequate for their research purposes.

Samples can be derived from other samples or from a collection event. Using the parent-child relationships within Specimen, it is possible to describe a detailed processing lineage: for example, a collection of Whole Blood processed into a sample of White Blood Cells, which is in turn processed into a sample of DNA. Ultimately, however, only the collected biospecimen (if known) and any biospecimens directly related to analysis outputs MUST be recorded.
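The lineage in the example above can be traced by following `parent` references on FHIR R4 Specimen resources. This is a minimal sketch. It assumes the Specimens are already loaded into a dict keyed by ID, and it follows only the first `parent` when there are several (R4 allows more than one). If the White Blood Cells sample were not recorded, the DNA Specimen could point directly at the Whole Blood collection, and the walk would still end at the collected biospecimen.

```python
def lineage(specimens: dict[str, dict], specimen_id: str) -> list[str]:
    """Return specimen IDs from the collected biospecimen down to specimen_id."""
    chain: list[str] = []
    seen: set[str] = set()
    current = specimen_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent references at {current}")
        seen.add(current)
        chain.append(current)
        parents = specimens[current].get("parent", [])
        # Simplification: follow only the first parent reference ("Specimen/<id>").
        current = parents[0]["reference"].split("/", 1)[1] if parents else None
    return list(reversed(chain))


specimens = {
    "whole-blood": {"resourceType": "Specimen", "id": "whole-blood"},
    "wbc": {"resourceType": "Specimen", "id": "wbc",
            "parent": [{"reference": "Specimen/whole-blood"}]},
    "dna": {"resourceType": "Specimen", "id": "dna",
            "parent": [{"reference": "Specimen/wbc"}]},
}
print(lineage(specimens, "dna"))  # ['whole-blood', 'wbc', 'dna']
```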

#### Aliquot

The Aliquot entity, also referred to as 'container', is intended to describe specific portions of a sample stored in coded and tracked containers, such as tubes. This entity can be used to represent the exact containers present in the real-world sample, or the aliquot entity can be used simply to indicate the total amount of sample available.
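When aliquots are recorded as individual containers, the total requestable amount of a sample is the sum of its container quantities. The sketch below assumes that the containers are carried as FHIR R4 `Specimen.container` entries with a `specimenQuantity`. That mapping is an assumption for illustration, and the tube IDs and volumes are hypothetical. Quantities are summed per unit rather than converted, and containers without a recorded amount are skipped.

```python
def total_available(containers: list[dict]) -> dict[str, float]:
    """Sum container specimenQuantity values, grouped by unit."""
    totals: dict[str, float] = {}
    for container in containers:
        quantity = container.get("specimenQuantity")
        if quantity is None:
            continue  # container is tracked, but its amount was not recorded
        unit = quantity.get("unit", "")
        totals[unit] = totals.get(unit, 0) + quantity["value"]
    return totals


# Hypothetical EDTA tubes holding portions of one blood sample.
edta_tubes = [
    {"identifier": [{"value": "tube-1"}], "specimenQuantity": {"value": 4, "unit": "mL"}},
    {"identifier": [{"value": "tube-2"}], "specimenQuantity": {"value": 4, "unit": "mL"}},
    {"identifier": [{"value": "tube-3"}], "specimenQuantity": {"value": 4, "unit": "mL"}},
]
print(total_available(edta_tubes))  # {'mL': 12}
```

If only the total amount is known, the same result comes from a single container entry holding the full quantity, which matches the simpler use of the aliquot entity described above.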
