Open Health Data at Carolina

karafecho edited this page Nov 4, 2024 · 2 revisions

URL: to be added


Open Health Data @ Carolina provides access to counts and frequencies (i.e., EHR prevalence) of conditions, procedures, drug exposures, and patient demographics, as well as the co-occurrence frequencies between them. The count and frequency data were derived from UNC Health's OMOP database for a five-year cohort (~6M patients, 2018 through 2022) of all UNC Health patients, including their inpatient and outpatient visit data. Counts represent the number of unique patients associated with a given concept, e.g., patients diagnosed with a condition, exposed to a drug, or who underwent a procedure. Frequencies are the number of unique patients associated with a concept divided by the total number of patients in the dataset, i.e., prevalence in the electronic health records. To protect patient privacy, all concepts and concept pairs with a count <= 10 were excluded, and the counts were randomized using the Poisson distribution.
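
The prevalence definition and the privacy rules above can be sketched as follows. This is a minimal illustration, not the pipeline UNC Health used: the helper names (prevalence, poisson_sample, privacy_protect) are hypothetical, and the sketch assumes that counts are first filtered at the <= 10 threshold and that each surviving count is then replaced by a Poisson draw whose mean is the true count. The page does not state the order of the two steps or the exact randomization scheme.

```python
import math
import random

PRIVACY_THRESHOLD = 10  # counts <= 10 are suppressed


def prevalence(count: int, total_patients: int) -> float:
    """EHR prevalence: patients with the concept / all patients in the dataset."""
    return count / total_patients


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplicative method.

    lam is processed in chunks of at most 500 because exp(-lam) underflows
    for large lam; a sum of independent Poisson variates is itself Poisson.
    Runtime grows linearly with lam, so this favors clarity over speed.
    """
    total = 0
    remaining = float(lam)
    while remaining > 0:
        step = min(remaining, 500.0)
        remaining -= step
        limit = math.exp(-step)
        k = 0
        product = rng.random()
        while product > limit:
            k += 1
            product *= rng.random()
        total += k
    return total


def privacy_protect(counts: dict, rng: random.Random) -> dict:
    """Drop counts <= PRIVACY_THRESHOLD, then randomize the rest.

    Keys can be single concept IDs or (concept_id, concept_id) pairs.
    Assumption: each surviving count is replaced by a Poisson draw whose
    mean is the true count.
    """
    protected = {}
    for key, count in counts.items():
        if count <= PRIVACY_THRESHOLD:
            continue  # suppressed for privacy
        protected[key] = poisson_sample(count, rng)
    return protected
```

Passing an explicit random.Random instance keeps the sketch reproducible; for example, privacy_protect({1177480: 1000, 19019050: 7}, random.Random(42)) drops the second concept and returns a randomized count for the first. Any prevalence computed from the released counts inherits this noise.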

The count for each concept includes the patients from all of its descendant concepts. For example, the count for ibuprofen (ID 1177480) includes patients with Ibuprofen 600 MG Oral Tablet (ID 19019073), Ibuprofen 400 MG Oral Tablet (ID 19019072), Ibuprofen 20 MG/ML Oral Suspension (ID 19019050), and so on. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept IDs in the OMOP Common Data Model.
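
The roll-up rule can be illustrated with a toy traversal. The children mapping and the patient sets below are hypothetical inputs, not UNC Health data, and the ibuprofen hierarchy is flattened to a single level for brevity. The point the sketch makes is that a rolled-up count is the number of unique patients across the concept and its descendants, so a patient recorded under two descendants is counted once rather than twice.

```python
from collections import deque


def rolled_up_count(concept_id, children, patients_by_concept):
    """Count unique patients recorded under a concept or any of its descendants.

    children: hypothetical mapping of parent concept ID -> child concept IDs.
    patients_by_concept: hypothetical mapping of concept ID -> set of patient
    IDs whose records are coded directly to that concept.
    """
    patients = set()
    visited = set()
    queue = deque([concept_id])
    while queue:
        current = queue.popleft()
        if current in visited:
            continue  # concept hierarchies are DAGs, so a node can be reached twice
        visited.add(current)
        patients |= patients_by_concept.get(current, set())
        queue.extend(children.get(current, ()))
    return len(patients)


# Toy data: patient "p2" appears under two ibuprofen products.
children = {1177480: [19019073, 19019072, 19019050]}
patients_by_concept = {
    19019073: {"p1", "p2"},  # Ibuprofen 600 MG Oral Tablet
    19019072: {"p2", "p3"},  # Ibuprofen 400 MG Oral Tablet
    19019050: {"p4"},        # Ibuprofen 20 MG/ML Oral Suspension
}
print(rolled_up_count(1177480, children, patients_by_concept))
```

This prints 4, whereas summing the per-product counts would give 5. In an OMOP database the set of descendants would usually be read from the CONCEPT_ANCESTOR table rather than from a hand-built mapping.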

Example edge (interpretation): to be added

Data source(s): Open Health Data @ Carolina exposes data from UNC Health's OMOP database for a five-year cohort (~6M patients, 2018 through 2022) of all UNC Health patients, including their inpatient and outpatient visit data.

Key methodologic metrics: Open Health Data @ Carolina provides the following key metrics and statistical measures of association, captured in biolink:StudyResult structures:

  • Raw counts of each concept and concept pair co-occurrence - biolink:ConceptCountAnalysisResult
  • Chi-squared statistic (Bonferroni-adjusted p-value) - biolink:ChiSquaredAnalysisResult
  • Relative frequency (99% confidence interval) - biolink:RelativeFrequencyAnalysisResult
  • Observed-expected frequency ratio (99% confidence interval) - biolink:ObservedExpectedFrequencyAnalysisResult
  • Odds ratio and log odds ratio (95% confidence interval)
  • Total sample size
  • Scores: The score is derived from the value of the attribute with 'attribute_type_id': 'biolink:ln_ratio_confidence_interval'. This attribute's value is a list of two numbers (the lower and upper bounds of the confidence interval). For the COHD API, we use the bound of the confidence interval closer to 0: the lower bound if the values are positive, or the upper bound if the values are negative. For example, if the values are [5.861265273152199, 8.386993917460455], the score is 5.861265273152199.
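
Several of the metrics above, and the score rule, can be sketched for a single concept pair from four counts: patients with concept A (n_a), patients with concept B (n_b), patients with both (n_ab), and the total sample size. This is a minimal illustration under stated assumptions, not the service's implementation. The function names are hypothetical, the odds ratio interval uses Woolf's standard error, the chi-squared test is the uncorrected Pearson test with one degree of freedom, and the Bonferroni factor num_tests is a caller-supplied placeholder. The score function follows the rule described for the COHD API; the page does not say what happens when the interval contains 0, so returning 0.0 in that case is an assumption.

```python
import math


def two_by_two(n_a, n_b, n_ab, total):
    """2x2 table cells: (both, A only, B only, neither)."""
    return n_ab, n_a - n_ab, n_b - n_ab, total - n_a - n_b + n_ab


def ln_observed_expected_ratio(n_a, n_b, n_ab, total):
    expected = n_a * n_b / total  # expected co-occurrence if A and B were independent
    return math.log(n_ab / expected)


def log_odds_ratio_ci(n_a, n_b, n_ab, total, z=1.96):
    """Log odds ratio with a Woolf (normal-approximation) 95% interval."""
    a, b, c, d = two_by_two(n_a, n_b, n_ab, total)
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, (log_or - z * se, log_or + z * se)


def chi_squared_bonferroni(n_a, n_b, n_ab, total, num_tests):
    """Pearson chi-squared (1 df, no continuity correction) and Bonferroni p-value."""
    a, b, c, d = two_by_two(n_a, n_b, n_ab, total)
    observed = (a, b, c, d)
    rows = (a + b, c + d)  # A present, A absent
    cols = (a + c, b + d)  # B present, B absent
    expected = (
        rows[0] * cols[0] / total,
        rows[0] * cols[1] / total,
        rows[1] * cols[0] / total,
        rows[1] * cols[1] / total,
    )
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, min(1.0, p * num_tests)


def score_from_ln_ratio_ci(ci):
    """Pick the confidence-interval bound closer to 0, as described above."""
    lower, upper = ci
    if lower > 0:
        return lower  # both bounds positive
    if upper < 0:
        return upper  # both bounds negative
    return 0.0  # assumption: interval contains 0


print(score_from_ln_ratio_ci([5.861265273152199, 8.386993917460455]))
```

The last line reproduces the worked example and prints 5.861265273152199. Because no continuity correction is applied, a zero cell in the 2x2 table raises ZeroDivisionError in this sketch.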

Regulatory requirement(s) and/or licensing restriction(s): The service is compliant with all federal and institutional regulations.

Additional resources:

  • OMOP Common Data Model
  • Athena (OMOP vocabularies, search, concept relationships, concept hierarchy)
  • Atlas (OMOP vocabularies, search, concept relationships, concept hierarchy, concept sets)