The code included in this repository was used to conduct the simulation study described in the presentation at the Epidemiology Congress of the Americas 2016 (Miami, Florida, USA). The simulation study compared matching weights (Li et al 2013; Li et al in Pan & Bai 2015), three-way matching (Rassen et al 2013), and inverse probability of treatment weighting (Robins et al 2000) in a three-level categorical point exposure setting. The corresponding online-first manuscript is available at Epidemiology. Please email me at [email protected] if you have difficulty obtaining the paper. A tutorial for using matching weights in an empirical study is available on RPubs. Franklin et al is another simulation study on propensity score methods that includes matching weights. A recent example of an application of matching weights can be found in Sauer et al.
*.R
: Main R script files for generating simulation data, analyzing data, and reporting results. Executing each file will generate a plain text report file named *.R.txt. Only 03.Report.R.txt is kept in this repository.
*_Lsf.sh
: Example parallelization shell scripts for the Linux LSF batch job system. These are designed for Harvard Medical School’s Orchestra cluster specifically, and are not expected to work without modification elsewhere.
*_Slurm.sh
: Example parallelization shell scripts for the Linux SLURM batch job system. These are designed for Harvard University’s Odyssey cluster specifically, and are not expected to work without modification elsewhere.
data/
: Folder for simulation data. Due to file size issues, only the analysis result files sufficient for running 03.Report.R are kept.
figures/
: Folder for figure PDFs generated by reporting scripts.
function_definitions/
: Folder for R function definitions used by the main R scripts.
rassen_toolbox/
: Folder for Rassen et al’s Pharmacoepidemiology Toolbox. rassen_toolbox/java/pharmacoepi.jar is required for three-way matching.
The scripts were written on a macOS system and, for the most part, executed on Linux high-performance cluster systems. Execution is computationally intensive, so parallelized execution on a computer cluster is required, particularly for the bootstrapping part.
The following code should install packages that are required by the simulation study.
Rscript ./00.InstallDependencies.R
The simulated dataset must be generated first. The following generates 48 scenario data files (e.g., Scenario001_R1000.RData) having 1000 iterations each in the data subfolder.
Rscript ./01.DataGenerator.R
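Before moving on to the analysis, it can be useful to confirm that all 48 files were written. The following is a minimal sketch, not part of the repository: the function names count_scenario_files and check_scenario_files are hypothetical, and the file name pattern Scenario*_R1000.RData is an assumption generalized from the single example name above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): count scenario data files.
# ASSUMPTION: every scenario file matches Scenario*_R1000.RData, generalized
# from the one example file name given in the text.
count_scenario_files() {
    local dir="${1:-./data}"
    # find prints one line per matching file; wc -l counts the lines.
    # tr strips the padding that some wc implementations add.
    find "$dir" -maxdepth 1 -type f -name 'Scenario*_R1000.RData' | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only if the expected number of files is present.
check_scenario_files() {
    local dir="${1:-./data}" expected="${2:-48}" found
    found="$(count_scenario_files "$dir")"
    if [ "$found" -eq "$expected" ]; then
        echo "OK: found $found scenario files in $dir"
    else
        echo "Found $found scenario files in $dir, expected $expected" >&2
        return 1
    fi
}
```

After sourcing these definitions, check_scenario_files ./data 48 reports whether the expected number of files is present.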
Analyses based on matching weights, three-way matching, and inverse probability of treatment weights are conducted by specifying the scenario data file. The analysis must be invoked on one file at a time. For example, to run it on the first scenario file, use the following command.
Rscript ./02.RunSimulation.R ./data/Scenario001_R1000.RData
This process can be parallelized by distributing the simulation job for each file to a node in a large computer cluster. Example batch files for the LSF and SLURM job dispatch systems are included. These scripts must be modified for your local cluster system before they are of any use.
./02.RunSimulation_Lsf.sh ./data/Scenario*
./02.RunSimulation_Slurm.sh ./data/Scenario*
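For a small test without a batch system, the one-file-at-a-time invocation can be wrapped in a plain sequential loop. This is a sketch under stated assumptions and does not reproduce the contents of the included batch scripts: the function name run_simulation_on_files and the DRY_RUN variable are hypothetical. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): invoke the analysis
# script once per scenario file, sequentially.
# DRY_RUN=1 (the default in this sketch) prints each command instead of
# executing it; set DRY_RUN=0 to actually call Rscript.
run_simulation_on_files() {
    local file
    for file in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "Rscript ./02.RunSimulation.R $file"
        else
            # Stop at the first file whose analysis fails.
            Rscript ./02.RunSimulation.R "$file" || return 1
        fi
    done
}
```

With the function sourced, DRY_RUN=0 run_simulation_on_files ./data/Scenario* would run every scenario in turn. Given the computational cost, this is likely practical only for checking that the setup works.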
After conducting analyses on all scenarios, the following script can be used to generate the report. The figures are generated in the figures folder.
Rscript ./03.Report.R
Bootstrapping within a simulation study is a highly computationally intensive task. Thus, this part is kept separate from the rest of the simulation. The script is designed to work on one tenth of each scenario at a time. For example, for the first part of the first scenario, execute the first line. For the last part of the first scenario, execute the second line.
Rscript ./04.Bootstrap.R ./data/Scenario001_R1000.RData 1
Rscript ./04.Bootstrap.R ./data/Scenario001_R1000.RData 10
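The two commands above are the first and last of ten invocations per scenario. A minimal sketch that lists all ten for a given file follows. The function name bootstrap_commands is hypothetical, and the sketch assumes the part argument takes each integer from 1 to 10, which is inferred from the "one tenth" description rather than confirmed from the script itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the ten bootstrap
# invocations for one scenario file, one per line.
# ASSUMPTION: valid part numbers are the integers 1 through 10.
bootstrap_commands() {
    local file="$1" part
    for part in 1 2 3 4 5 6 7 8 9 10; do
        echo "Rscript ./04.Bootstrap.R $file $part"
    done
}
```

For example, bootstrap_commands ./data/Scenario001_R1000.RData prints the ten commands so they can be reviewed before being run.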
This process can also be parallelized. Each one tenth of each scenario is dispatched to a separate node using the following scripts. Again, these shell scripts must be modified for your local cluster system before they are of any use.
./04.Bootstrap_Lsf.sh ./data/Scenario*
./04.Bootstrap_Slurm.sh ./data/Scenario*
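To illustrate the scale of the dispatch, 48 scenarios times 10 parts gives 480 separate jobs. The sketch below prints one SLURM submission line per scenario and part without submitting anything. It is not the content of 04.Bootstrap_Slurm.sh: the function name print_bootstrap_submissions and the job naming scheme are hypothetical, and resource options such as partition, time limit, and memory are omitted because they are cluster specific.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print, but do not run,
# one sbatch submission line per scenario file and bootstrap part.
# ASSUMPTIONS: parts are numbered 1 through 10; the job names are invented
# for illustration; cluster-specific resource options are left out.
print_bootstrap_submissions() {
    local file part name
    for file in "$@"; do
        # Strip the directory and the .RData suffix to build a job name.
        name="$(basename "$file" .RData)"
        for part in $(seq 1 10); do
            echo "sbatch --job-name=${name}_part${part} --wrap=\"Rscript ./04.Bootstrap.R ${file} ${part}\""
        done
    done
}
```

Calling print_bootstrap_submissions ./data/Scenario* prints ten lines per scenario file, which can be inspected and adapted before anything is submitted to a scheduler.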
After conducting bootstrapping on all scenarios, the following script can be used to generate the report. The figure is generated in the figures folder.
Rscript ./05.BootstrapReport.R
- 2017-02-06: Add online-first article link and additional reading links.
- 2016-07-26: Add manuscript status and tutorial link
- 2016-07-16: Initial upload