Commit

Update INS-466 feature: update about page text to be included in the global search (#54) (#55)
David-YuWei authored Nov 17, 2022
1 parent e84ae2f commit 6749fe4
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions dataloader/model-desc/aboutPagesContent.yaml
@@ -1,6 +1,13 @@
- page: '/about'
title: "About"
content:
- paragraph: "GATHERING RESEARCH OUTPUTS: A CONTINUOUS CHALLENGE"
- paragraph: "The National Cancer Institute (NCI) is excited to release the initial phase of the Index of NCI Studies (INS) to tackle the challenge of assembling grant information from various publicly available resources into one place. Building connections between NCI programs, grants, and outputs is a complex task usually addressed through manual curation by portfolio analysis experts. As we work to automate the process and share this resource with the public, we would like to inform you of the known limitations in the data gathering process and in the information displayed by the INS. We are working on improving the data gathering process and will periodically release updates, which allow us to iteratively improve the INS site."
- paragraph: "Funding Methods: This initial phase of the INS only gathers outputs from extramural grants. Other funding sources, such as intramural grants and contracts, will be addressed in the future."
- paragraph: "Initial Program Scope: This initial phase of the INS only gathers outputs from grants funded under two programs: the Cancer Moonshot℠ and the Childhood Cancer Data Initiative (CCDI). Though grouping projects under “programs” is useful for organizational purposes, the program-grant link is not always clearly defined or publicly available. A list of projects associated with the Cancer Moonshot was obtained from the U.S. Department of Health and Human Services (HHS) Tracking Accountability in Government Grants System, while a list of projects associated with CCDI was provided by the NCI’s Office of Data Sharing. These lists were used as input to our automated data gathering process outlined below."
- paragraph: "Data Organization: The Moonshot program is not currently subdivided into Moonshot initiatives within the INS. Projects are currently organized within the INS by full grant number (e.g., “1U24CA224319-01”), which means that a single core project ID (e.g., “U24CA224319”) may be listed several times, once for each funding year. However, because researchers usually cite core project IDs (rather than full grant numbers) in their published outputs, the INS groups data outputs by core project ID. This leads to some known false positive results, where an output associated with a core project ID is shown under a full grant number to which it does not belong. For instance, a full grant number for a particular year could show a false association with a publication that appeared before the grant was awarded, because of a true association between that publication and an earlier grant with the same core project ID. This discrepancy is being evaluated for improvement in a future release."
- paragraph: "Data Sourcing: The INS relies on several public resources to connect and enrich our data, which brings interoperability challenges and an inherent risk of propagating source errors. Details on projects (using full grant numbers) are obtained from the NIH RePORTER resource. The information linking projects to a Division/Office/Center (DOC) is obtained from administrative databases, which may contain inconsistencies because the links between Program Officers, DOCs, and projects may change over the years. Publications, Clinical Trials, and Patents are obtained by automated searching of core project IDs against public resources: PubMed, ClinicalTrials.gov, and the US Patent and Trademark Office, respectively. Publication information is further enriched with metrics obtained from NIH iCite. Dataset information is obtained by indirectly linking datasets to core project IDs through the publications citing them. The initial phase of the INS is only gathering datasets from three repositories: the database of Genotypes and Phenotypes (dbGaP), Gene Expression Omnibus (GEO), and Sequence Read Archive (SRA)."
- paragraph: "Supplemental Grants: Supplemental grants (specifically, P30 supplements) and their outputs are not included in the initial phase of the INS. Supplements pose a particular challenge to the INS, as researchers rarely cite supplements in published outputs. It is very difficult to differentiate outputs associated with a supplement from those associated with a parent grant, even with expert manual curation. Often, researchers themselves may not be able to distinguish the outputs of supplemental grants from those of parent grants. We intend to include a curated subset of supplements and their outputs where possible in future releases."
- paragraph: "The Index of NCI Studies (INS) gathers and displays information about research artifacts (publications, data, clinical trials, and patents) generated by NCI-supported grants in a single site to facilitate research portfolio analysis. The pilot phase focuses on extramural grants from the Cancer Moonshot program and the Childhood Cancer Data Initiative (CCDI). The INS site provides detailed information for each program and project, as well as the research outputs generated by each project, which can be filtered according to the user’s specific interests."
- paragraph: "The goal of the INS is to enable access to research outputs generated by NCI-supported grants at a single location. The INS focuses on providing research outputs such as publications, datasets, and patents from extramural grant funding. The INS obtains award information from key grant source systems such as NIH RePORTER to create the universe of known NCI-supported studies. The INS site displays information extracted from resources that hold publications (PubMed), data (specifically, the database of Genotypes and Phenotypes [dbGaP], Gene Expression Omnibus [GEO], and Sequence Read Archive [SRA]), clinical trials, and patents (US Patent and Trademark Office [USPTO])."
- paragraph: "The INS is piloting its data gathering process with two NCI-funded programs: the Cancer Moonshot program and the Childhood Cancer Data Initiative (CCDI). Each program consists of a list of awards. General award information (such as project title, principal investigators, award amount, and award start and end dates) is obtained from NIH RePORTER. Research outputs from these projects are then obtained by automated data gathering from public websites. The INS data gathering process first queries PubMed with the list of project IDs, as authors typically list their funding sources in PubMed's Grant Support section. The list of publications associated with those project IDs is collected and stored along with metadata on each publication. The data gathering process also collects datasets or clinical trials listed in PubMed's Related Information section. The INS data gathering process queries the ClinicalTrials.gov and USPTO websites independently with the list of project IDs. All research outputs are thus linked to the project ID(s) that produced them, resulting in a coherent data model that links programs to projects to outputs."
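
The "Data Organization" paragraph above describes outputs being grouped by core project ID while projects are listed by full grant number. The following is a minimal Python sketch of that relationship, not the INS implementation. It assumes the standard NIH grant number layout (application type, activity code, institute code, serial number, support year, optional suffix). The function names, the second grant number, the publication ID, and all dates are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict
from datetime import date

# Assumed NIH layout: type (1 digit), activity code (3 chars), institute code
# (2 letters), serial (6 digits), "-", support year (2 digits), optional suffix.
GRANT_RE = re.compile(
    r"^(?P<type>\d)(?P<activity>[A-Z][A-Z0-9]{2})(?P<ic>[A-Z]{2})"
    r"(?P<serial>\d{6})-(?P<year>\d{2})(?P<suffix>[A-Z0-9]*)$"
)


def core_project_id(full_grant_number: str) -> str:
    """Strip the type, support year, and suffix from a full grant number."""
    match = GRANT_RE.match(full_grant_number.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized grant number: {full_grant_number!r}")
    return match["activity"] + match["ic"] + match["serial"]


def link_outputs(grants, publications):
    """Attach publications to every grant year sharing their core project ID.

    grants: iterable of (full_grant_number, award_start_date)
    publications: iterable of (publication_id, cited_core_id, publication_date)
    Returns {full_grant_number: [(publication_id, predates_award), ...]}.
    A True flag marks a likely false positive: the publication appeared
    before that particular grant year was awarded.
    """
    by_core = defaultdict(list)
    for pub_id, core_id, pub_date in publications:
        by_core[core_id].append((pub_id, pub_date))

    links = {}
    for full_number, start in grants:
        core = core_project_id(full_number)
        links[full_number] = [
            (pub_id, pub_date < start) for pub_id, pub_date in by_core[core]
        ]
    return links


# Hypothetical data: the second grant number and all dates are made up.
grants = [
    ("1U24CA224319-01", date(2017, 9, 1)),
    ("5U24CA224319-02", date(2018, 9, 1)),
]
publications = [("PUB-A", "U24CA224319", date(2018, 3, 1))]
links = link_outputs(grants, publications)
```

In this sketch the single publication is attached to both grant years, and only the later year is flagged, which mirrors the false association the paragraph describes. Comparing dates is one possible heuristic for surfacing such cases. The source does not say how the INS plans to address them.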
