This folder contains scripts to generate useful abstractions of raw MIMIC-III data ("concepts"). The scripts are intended to be run against the BigQuery instantiation of MIMIC-III, and are written in the BigQuery Standard SQL dialect. Concepts are categorized into folders if possible, otherwise they remain in the top-level directory. A table of contents is provided below: List of concepts.
You can read about cloud access to MIMIC-III, including via Google BigQuery, on the cloud page.
The rest of this README describes:
You do not need to generate the concepts if you are using BigQuery! They have already been generated for you. If you have access to MIMIC-III on BigQuery, look under physionet-data.mimic_derived
. If you would like to generate the concepts again, for example on your own dataset, you must modify the TARGET_DATASET
variable within the make-concepts.sh script. The script assumes you have installed and configured the Google Cloud SDK.
Go to the concepts_postgres folder, run the postgres-functions.sql and postgres-make-concepts.sql scripts, in that order.
While the SQL scripts here are written in BigQuery's Standard SQL syntax, there are many BigQuery specific functions which do not carry over to PostgreSQL. Nevertheless, with only a few changes, the scripts can be made compatible. In order to generate the concepts on a PostgreSQL database, one must:
- create postgres functions which emulate BigQuery functions
- modify SQL scripts for incompatible syntax
- run the modified SQL scripts and direct the output into tables in the PostgreSQL database
The bash script convert_mimiciii_concepts_bq_to_psql.sh has done most of this for you. To generate concepts in PostgreSQL, simply go to the concepts_postgres folder and run:
\i postgres-functions.sql
\i postgres-make-concepts.sql
You can also read more about building the data within PostgreSQL in the buildmimic/postgres folder.
Folder | Table | Description |
---|---|---|
. | echo_data | Text extracted from echocardiography reports using regular expressions. |
. | code_status | Whether the patient has restrictions on life saving resuscitation. |
comorbidity | elixhauser_ahrq_v37 | Comorbidities in categories proposed by Elixhauser et al. AHRQ produced the mapping. |
comorbidity | elixhauser_ahrq_v37_no_drg | As above, but DRG codes are not used to exclude primary conditions. |
comorbidity | elixhauser_quan | Comorbidities in categories proposed by Elixhauser et al. using an algorithm by Quan et al. |
comorbidity | elixhauser_score_ahrq | An integer score relating comorbid burden to mortality (AHRQ comorbidities). |
comorbidity | elixhauser_score_quan | An integer score relating comorbid burden to mortality (Quan et al. comorbidities). |
demographics | Summary of patient/admission level information such as age, height, weight, etc. | |
demographics | heightweight | Patient height (cm) and weight (kg). |
demographics | icustay_detail | Information relating to each patient ICU stay. |
demographics | icustay_hours | A table with one row per hour a patient is in the ICU. |
demographics | icustay_times | A table with start/stop times for a patient's ICU stay based on the time of their first and last documented heart rate. |
diagnosis | ||
diagnosis | ccs_diagnosis_table_psql | Load ICD-9 to CCS mapping (PostgreSQL only). |
diagnosis | ccs_dx | Load ICD-9 to CCS mapping. |
durations | Start and stop times for administration of various treatments or durations of various phenomena. | |
durations | adenosine_durations | Start and stop times for administration of adenosine. |
durations | arterial_line_durations | Start and stop times for presence of an arterial line. |
durations | central_line_durations | Start and stop times for presence of an central line |
durations | crrt_durations | Start and stop times for continuous renal replacement therapy (CRRT). |
durations | dobutamine_durations | Start and stop times for administration of dobutamine. |
durations | dopamine_durations | Start and stop times for administration of dopamine. |
durations | epinephrine_durations | Start and stop times for administration of epinephrine. |
durations | isuprel_durations | Start and stop times for administration of isuprel. |
durations | milrinone_durations | Start and stop times for administration of milrinone. |
durations | norepinephrine_durations | Start and stop times for administration of norepinephrine. |
durations | phenylephrine_durations | Start and stop times for administration of phenylephrine. |
durations | vasopressin_durations | Start and stop times for administration of vasopressin. |
durations | vasopressor_durations | Start and stop times for administration of vasopressor. |
durations | dobutamine_dose | Dose administered with start/stop times for dobutamine. |
durations | dopamine_dose | Dose administered with start/stop times for dopamine. |
durations | epinephrine_dose | Dose administered with start/stop times for epinephrine. |
durations | neuroblock_dose | Dose administered with start/stop times for neuro blocking agents. |
durations | norepinephrine_dose | Dose administered with start/stop times for norepinephrine. |
durations | phenylephrine_dose | Dose administered with start/stop times for phenylephrine. |
durations | vasopressin_dose | Dose administered with start/stop times for vasopressin. |
durations | ventilation_classification | Classifies patient settings as implying mechanical ventilation. |
durations | ventilation_durations | Start and stop times for mechanical ventilation. |
durations | weight_durations | Start and stop times for daily weight measurements. |
firstday | The first day subfolder contains scripts to summarizes a patient's health on their first ICU day. | |
firstday | blood_gas_first_day | Highest and lowest blood gas values in the first 24 hours of a patient's ICU stay. |
firstday | blood_gas_first_day_arterial | As above, but arterial blood gases only. |
firstday | gcs_first_day | Highest and lowest Glasgow Coma Scale in the first 24 hours of a patient's ICU stay. |
firstday | height_first_day | Median height recorded for the patient in the first 24 hours of a patient's ICU stay. |
firstday | labs_first_day | Highest and lowest laboratory values in the first 24 hours of a patient's ICU stay. |
firstday | rrt_first_day | Presence of renal replacement therapy in the first 24 hours of a patient's ICU stay. |
firstday | urine_output_first_day | Total urine output over the first 24 hours of a patient's ICU stay. |
firstday | ventilation_first_day | Whether the patient was mechanically ventilated in the first 24 hours. |
firstday | vitals_first_day | Highest and lowest vital signs in the first 24 hours of a patient's ICU stay. |
firstday | weight_first_day | Highest and lowest weight measurements in the first 24 hours of a patient's ICU stay. |
fluid_balance | Tables which track fluid input and output for the patient. | |
fluid_balance | colloid_bolus | Times at which a patient received a bolus of colloidal fluid. |
fluid_balance | crystalloid_bolus | Times at which a patient received a bolus of crystalloid fluid. |
fluid_balance | urine_output | Urine output for a patient with the time of documentation. |
organfailure | Summarizations of the degree of organ failure for single organ systems. | |
organfailure | kdigo_creatinine | Creatinine values with baseline creatinine as defined by KDIGO. |
organfailure | kdigo_uo | Urine output over 6, 12, and 24 hour periods. |
organfailure | kdigo_stages | Stages of acute kidney failure (AKI) as defined by KDIGO. |
organfailure | kdigo_stages_48hr | Stages of AKI for the first 48 hours of a patient's ICU stay. |
organfailure | kdigo_stages_7day | Stages of AKI for the first 7 days of a patient's ICU stay. |
organfailure | meld | The MELD score, often used to assess health of liver transplant candidates. |
pivot | Pivoted views contain the patient icustay_id , the charttime , and a number of variables. They are useful to acquiring a time series of values for patient stays. |
|
pivot | pivoted_bg | Blood gas measurements. |
pivot | pivoted_fio2 | Fraction of inspired oxygen. |
pivot | pivoted_gcs | Glasgow Coma Scale. |
pivot | pivoted_height | Height. |
pivot | pivoted_icp | Intracranial pressure. |
pivot | pivoted_invasive_lines | Invasive lines. |
pivot | pivoted_lab | Laboratory values. |
pivot | pivoted_oasis | The Oxford Acute Severity of Illness Score (OASIS). |
pivot | pivoted_rrt | Renal replacement therapy. |
pivot | pivoted_sofa | The Sequential Organ Failure Assessment (SOFA) scale. |
pivot | pivoted_uo | Urine output. |
pivot | pivoted_vent_setting | Ventilator settings (tidal volume, PEEP, etc). |
pivot | pivoted_vital | Vital signs. |
sepsis | Definitions of sepsis, a common cause of mortality for intensive care unit patients. | |
sepsis | angus | Sepsis defined using billing codes validated by Angus et al. |
sepsis | explicit | Explicitly coded sepsis (i.e. a list of patients with ICD-9 codes which refer to sepsis). |
sepsis | martin | Sepsis defined using billing codes validated by Martin et al. (now considered "septicemia"). |
severityscores | Severity of illness scores are defined using the highest/lowest values during the first 24 hours of a patient's stay. | |
severityscores | apsiii | Acute Physiology Score III. |
severityscores | lods | Logistic Organ Dysfunction Score. |
severityscores | mlods | Modified Logistic Organ Dysfunction Score. |
severityscores | oasis | The Oxford Acute Severity of Illness Score (OASIS). |
severityscores | qsofa | The quick Sequential Organ Failure Assessment (qSOFA) scale. |
severityscores | saps | The Simplified Acute Physiology Score (SAPS). |
severityscores | sapsii | SAPS II. |
severityscores | sirs | The Systemic Inflammation Response Score (SIRS). |
severityscores | sofa | The Sequential Organ Failure Assessment (SOFA) scale. |
treatment | Tables associated with treatment of a patient. | |
treatment | abx_prescriptions_list | A list of antibiotics mentioned in the prescriptions table. |
treatment | suspicion_of_infection | Suspicion of infection as defined by antibiotic use near the ordering of blood cultures. |
This is an asortment of scripts intended to give the user more familiarity with the MIMIC-III database. None of these scripts generate materialized views.
Useful snippets of SQL implementing common functions. For example, the auroc.sql
file calculates the area under the receiver operator characteristic curve (AUROC) for a set of predictions, PRED
, given a set of targets, TAR
. The AUROC is a useful measure of the discrimination of a set of predictions.
Scripts in flavours of SQL which are not compatible with BigQuery/PostgreSQL.