Skip to content

Latest commit

 

History

History
151 lines (127 loc) · 16.3 KB

README.md

File metadata and controls

151 lines (127 loc) · 16.3 KB

MIMIC-III Concepts

This folder contains scripts to generate useful abstractions of raw MIMIC-III data ("concepts"). The scripts are intended to be run against the BigQuery instantiation of MIMIC-III, and are written in the BigQuery Standard SQL dialect. Concepts are categorized into folders if possible, otherwise they remain in the top-level directory. A table of contents is provided below: List of concepts.

You can read about cloud access to MIMIC-III, including via Google BigQuery, on the cloud page.

The rest of this README describes:

Generating the concepts in BigQuery

You do not need to generate the concepts if you are using BigQuery! They have already been generated for you. If you have access to MIMIC-III on BigQuery, look under physionet-data.mimic_derived. If you would like to generate the concepts again, for example on your own dataset, you must modify the TARGET_DATASET variable within the make-concepts.sh script. The script assumes you have installed and configured the Google Cloud SDK.

Generating the concepts in PostgreSQL

Quickstart

Go to the concepts_postgres folder, run the postgres-functions.sql and postgres-make-concepts.sql scripts, in that order.

In more detail

While the SQL scripts here are written in BigQuery's Standard SQL syntax, there are many BigQuery specific functions which do not carry over to PostgreSQL. Nevertheless, with only a few changes, the scripts can be made compatible. In order to generate the concepts on a PostgreSQL database, one must:

  • create postgres functions which emulate BigQuery functions
  • modify SQL scripts for incompatible syntax
  • run the modified SQL scripts and direct the output into tables in the PostgreSQL database

The bash script convert_mimiciii_concepts_bq_to_psql.sh has done most of this for you. To generate concepts in PostgreSQL, simply go to the concepts_postgres folder and run:

\i postgres-functions.sql
\i postgres-make-concepts.sql

You can also read more about building the data within PostgreSQL in the buildmimic/postgres folder.

List of concepts

Folder Table Description
. echo_data Text extracted from echocardiography reports using regular expressions.
. code_status Whether the patient has restrictions on life saving resuscitation.
comorbidity elixhauser_ahrq_v37 Comorbidities in categories proposed by Elixhauser et al. AHRQ produced the mapping.
comorbidity elixhauser_ahrq_v37_no_drg As above, but DRG codes are not used to exclude primary conditions.
comorbidity elixhauser_quan Comorbidities in categories proposed by Elixhauser et al. using an algorithm by Quan et al.
comorbidity elixhauser_score_ahrq An integer score relating comorbid burden to mortality (AHRQ comorbidities).
comorbidity elixhauser_score_quan An integer score relating comorbid burden to mortality (Quan et al. comorbidities).
demographics Summary of patient/admission level information such as age, height, weight, etc.
demographics heightweight Patient height (cm) and weight (kg).
demographics icustay_detail Information relating to each patient ICU stay.
demographics icustay_hours A table with one row per hour a patient is in the ICU.
demographics icustay_times A table with start/stop times for a patient's ICU stay based on the time of their first and last documented heart rate.
diagnosis
diagnosis ccs_diagnosis_table_psql Load ICD-9 to CCS mapping (PostgreSQL only).
diagnosis ccs_dx Load ICD-9 to CCS mapping.
durations Start and stop times for administration of various treatments or durations of various phenomena.
durations adenosine_durations Start and stop times for administration of adenosine.
durations arterial_line_durations Start and stop times for presence of an arterial line.
durations central_line_durations Start and stop times for presence of an central line
durations crrt_durations Start and stop times for continuous renal replacement therapy (CRRT).
durations dobutamine_durations Start and stop times for administration of dobutamine.
durations dopamine_durations Start and stop times for administration of dopamine.
durations epinephrine_durations Start and stop times for administration of epinephrine.
durations isuprel_durations Start and stop times for administration of isuprel.
durations milrinone_durations Start and stop times for administration of milrinone.
durations norepinephrine_durations Start and stop times for administration of norepinephrine.
durations phenylephrine_durations Start and stop times for administration of phenylephrine.
durations vasopressin_durations Start and stop times for administration of vasopressin.
durations vasopressor_durations Start and stop times for administration of vasopressor.
durations dobutamine_dose Dose administered with start/stop times for dobutamine.
durations dopamine_dose Dose administered with start/stop times for dopamine.
durations epinephrine_dose Dose administered with start/stop times for epinephrine.
durations neuroblock_dose Dose administered with start/stop times for neuro blocking agents.
durations norepinephrine_dose Dose administered with start/stop times for norepinephrine.
durations phenylephrine_dose Dose administered with start/stop times for phenylephrine.
durations vasopressin_dose Dose administered with start/stop times for vasopressin.
durations ventilation_classification Classifies patient settings as implying mechanical ventilation.
durations ventilation_durations Start and stop times for mechanical ventilation.
durations weight_durations Start and stop times for daily weight measurements.
firstday The first day subfolder contains scripts to summarizes a patient's health on their first ICU day.
firstday blood_gas_first_day Highest and lowest blood gas values in the first 24 hours of a patient's ICU stay.
firstday blood_gas_first_day_arterial As above, but arterial blood gases only.
firstday gcs_first_day Highest and lowest Glasgow Coma Scale in the first 24 hours of a patient's ICU stay.
firstday height_first_day Median height recorded for the patient in the first 24 hours of a patient's ICU stay.
firstday labs_first_day Highest and lowest laboratory values in the first 24 hours of a patient's ICU stay.
firstday rrt_first_day Presence of renal replacement therapy in the first 24 hours of a patient's ICU stay.
firstday urine_output_first_day Total urine output over the first 24 hours of a patient's ICU stay.
firstday ventilation_first_day Whether the patient was mechanically ventilated in the first 24 hours.
firstday vitals_first_day Highest and lowest vital signs in the first 24 hours of a patient's ICU stay.
firstday weight_first_day Highest and lowest weight measurements in the first 24 hours of a patient's ICU stay.
fluid_balance Tables which track fluid input and output for the patient.
fluid_balance colloid_bolus Times at which a patient received a bolus of colloidal fluid.
fluid_balance crystalloid_bolus Times at which a patient received a bolus of crystalloid fluid.
fluid_balance urine_output Urine output for a patient with the time of documentation.
organfailure Summarizations of the degree of organ failure for single organ systems.
organfailure kdigo_creatinine Creatinine values with baseline creatinine as defined by KDIGO.
organfailure kdigo_uo Urine output over 6, 12, and 24 hour periods.
organfailure kdigo_stages Stages of acute kidney failure (AKI) as defined by KDIGO.
organfailure kdigo_stages_48hr Stages of AKI for the first 48 hours of a patient's ICU stay.
organfailure kdigo_stages_7day Stages of AKI for the first 7 days of a patient's ICU stay.
organfailure meld The MELD score, often used to assess health of liver transplant candidates.
pivot Pivoted views contain the patient icustay_id, the charttime, and a number of variables. They are useful to acquiring a time series of values for patient stays.
pivot pivoted_bg Blood gas measurements.
pivot pivoted_fio2 Fraction of inspired oxygen.
pivot pivoted_gcs Glasgow Coma Scale.
pivot pivoted_height Height.
pivot pivoted_icp Intracranial pressure.
pivot pivoted_invasive_lines Invasive lines.
pivot pivoted_lab Laboratory values.
pivot pivoted_oasis The Oxford Acute Severity of Illness Score (OASIS).
pivot pivoted_rrt Renal replacement therapy.
pivot pivoted_sofa The Sequential Organ Failure Assessment (SOFA) scale.
pivot pivoted_uo Urine output.
pivot pivoted_vent_setting Ventilator settings (tidal volume, PEEP, etc).
pivot pivoted_vital Vital signs.
sepsis Definitions of sepsis, a common cause of mortality for intensive care unit patients.
sepsis angus Sepsis defined using billing codes validated by Angus et al.
sepsis explicit Explicitly coded sepsis (i.e. a list of patients with ICD-9 codes which refer to sepsis).
sepsis martin Sepsis defined using billing codes validated by Martin et al. (now considered "septicemia").
severityscores Severity of illness scores are defined using the highest/lowest values during the first 24 hours of a patient's stay.
severityscores apsiii Acute Physiology Score III.
severityscores lods Logistic Organ Dysfunction Score.
severityscores mlods Modified Logistic Organ Dysfunction Score.
severityscores oasis The Oxford Acute Severity of Illness Score (OASIS).
severityscores qsofa The quick Sequential Organ Failure Assessment (qSOFA) scale.
severityscores saps The Simplified Acute Physiology Score (SAPS).
severityscores sapsii SAPS II.
severityscores sirs The Systemic Inflammation Response Score (SIRS).
severityscores sofa The Sequential Organ Failure Assessment (SOFA) scale.
treatment Tables associated with treatment of a patient.
treatment abx_prescriptions_list A list of antibiotics mentioned in the prescriptions table.
treatment suspicion_of_infection Suspicion of infection as defined by antibiotic use near the ordering of blood cultures.

Other scripts present

cookbook

This is an asortment of scripts intended to give the user more familiarity with the MIMIC-III database. None of these scripts generate materialized views.

functions

Useful snippets of SQL implementing common functions. For example, the auroc.sql file calculates the area under the receiver operator characteristic curve (AUROC) for a set of predictions, PRED, given a set of targets, TAR. The AUROC is a useful measure of the discrimination of a set of predictions.

other-languages

Scripts in flavours of SQL which are not compatible with BigQuery/PostgreSQL.