Skip to content
This repository has been archived by the owner on Nov 16, 2024. It is now read-only.

Materials for the workshop "Targeted Learning: Bridging Machine Learning with Causal and Statistical Inference" as part of the Harvard Catalyst program in November 2024

License

Notifications You must be signed in to change notification settings

tlverse/catalyst2024-workshop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Instructors

Schedule

  • 8:30-9:00am: Registration and introductions
  • 9:00am-12:15pm: Introductory topics, with coffee break at ~10:30am
  • 12:15-1:00pm: Lunch break with open discussion
  • 1:00-4:00pm: Advanced topics, with coffee break at ~2:30pm
  • 4:00-4:30pm: Concluding remarks and closing discussion

Course description

In fields ranging from public health and medicine to political science and economics, great care is required to disentangle intricate causal relationships using real-world data and inform decision-making efforts. Causal inference has emerged as a methodological framework for translating substantive questions into well-defined causal estimands, expressing identification assumptions necessary for these to be learned from data, and estimating the resultant quantities via standardization (i.e., outcome regression) and inverse probability weighting. However, such progress has failed to keep pace with developments in machine learning; thus, the practice of causal inference is often marred by over-reliance on restrictive modeling practices. The Targeted Learning (TL) paradigm presents a solution to this problem by unifying aspects of semi-parametric statistical theory, machine learning, and causal inference. The result is a methodological toolbox for evaluating causal effects via state-of-the-art estimators that are both robust (to model misspecification) and efficient (minimal variance, i.e., narrowest possible confidence intervals). This short course introduces the TL paradigm, beginning with the guiding philosophy and underlying scientific motivations and going on to discuss estimation algorithms and their practical implementation through open-source software tools (e.g., the TLverse: https://github.com/tlverse), addressing basic theoretical underpinnings along the way. Specific topics to be covered include targeted maximum likelihood estimation (TMLE) and collaborative TMLE (C-TMLE) for confounder selection (and, time permitting, adaptive TMLE (A-TMLE) for hybrid designs that combine experimental and external data); TMLE algorithms to estimate the causal effects of interventions on binary and continuous exposures; complications for addressing time-varying confounding and/or censoring; and incorporating machine learning via the super learner and highly adaptive lasso algorithms. This short course incorporates a mix of case studies, discussion, and hands-on programming exercises to allow participants to build familiarity with techniques and tools that will translate to improvements in real-world data analytic practice.

Recommended background

This course is primarily intended for biostatisticians, epidemiologists, and applied quantitative scientists. Participants are expected to have had prior training in statistical inference, including such concepts as conditional expectation, confidence intervals, hypothesis testing, regression modeling, and confounding. While some prior knowledge of mathematical statistics may be useful, it is not necessary for success. Prior experience with programming, and with the R language for statistical computing and graphics, will be essential.

Materials

This GitHub repository contains the source materials for a full-day workshop on Targeted Learning, with some applications demonstrated using the tlverse software ecosystem. Some of the teaching materials are adapted from a draft of the forthcoming book Targeted Learning in R: Causal Data Science with the tlverse Software Ecosystem, by Mark van der Laan, Jeremy Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, and Alan Hubbard; the unabridged book is freely available.

About

Materials for the workshop "Targeted Learning: Bridging Machine Learning with Causal and Statistical Inference" as part of the Harvard Catalyst program in November 2024

Resources

License

Stars

Watchers

Forks