Targeted Maximum Likelihood Estimation in Safety Analysis

Sam Lendle (1), Bruce Fireman (2), Mark van der Laan (1)
(1) UC Berkeley, (2) Kaiser Permanente
ISPE Advanced Topics Session, Barcelona, August

Outline
1. Introduction
2. Super learning
3. TMLE and collaborative TMLE
4. Kaiser Permanente data example
5. Simulations based on KP data

Traditional approach in epidemiology and clinical medicine
- Fit several parametric logistic regression models and select a favorite one.
- Report the point estimate of the treatment coefficient, confidence intervals, and p-value as if this parametric model had been specified a priori.
Problems
- The parametric model is misspecified, but parameter estimates are interpreted as if the model were correct.
- Variance estimates do not account for model selection, so confidence intervals and p-values are wrong even if the final model happens to be correct.

The statistical estimation problem
- Observed data: realizations of random variables with a probability distribution.
- Statistical model: the set of possible distributions for the data-generating distribution, defined by actual knowledge about the data. For example, in an RCT we know each subject's probability of receiving treatment.
- Statistical target parameter: a function of the data-generating distribution that we wish to learn from the data.
- Estimator: an a priori-specified algorithm that takes the observed data and returns an estimate of the target parameter. It is benchmarked by a dissimilarity measure (e.g., MSE) with respect to the target parameter.

Causal inference
- Non-testable assumptions are made in addition to the assumptions defining the statistical model (e.g., the no-unmeasured-confounders assumption).
- These assumptions allow a causal interpretation of statistical parameter estimates.
- Even if we don't believe the non-testable causal assumptions, the statistical estimation problem is still the same, and estimates still have valid statistical interpretations.

Targeted learning
- Define true statistical models and interesting target parameters.
- Avoid reliance on human art and unrealistic parametric models.
- Target the fit of the data-generating distribution to the parameter of interest.
- Statistical inference.
- Has been applied to: static or dynamic treatments, direct and indirect effects, parameters of MSMs, variable importance analysis, longitudinal/repeated measures data with time-dependent confounding, censoring/missingness, case-control studies, and RCTs.

Two-stage estimation methodology
Super learning (SL) (van der Laan et al. 2007)
- Uses a library of candidate estimators (e.g., multiple parametric models and machine learning algorithms such as neural networks and random forests).
- Builds a data-adaptive weighted combination of the estimators using cross-validation.
Targeted maximum likelihood estimation (TMLE) (van der Laan and Rubin 2006)
- Updates an initial estimate, often a Super Learner, to remove bias for the parameter of interest.
- Calculates the final parameter estimate from the updated fit of the data-generating distribution.
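
The cross-validated weighting idea behind super learning can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the two candidate learners (`fit_mean`, `fit_line`), the toy data, the interleaved fold assignment, and the grid search over a single convex weight are all assumptions made for the example. A real library would hold many more candidates and fit the weights with a proper optimizer.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1 (hypothetical): ignore x and predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2 (hypothetical): simple least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cv_predictions(xs, ys, learners, n_folds=5):
    """Out-of-fold predictions: row i holds each learner's prediction for x_i."""
    n = len(xs)
    preds = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        fits = [fit([xs[i] for i in train], [ys[i] for i in train])
                for fit in learners]
        for i in held_out:
            preds[i] = [f(xs[i]) for f in fits]
    return preds

def super_learner_weight(xs, ys, learners, grid=21):
    """Choose the convex weight on learner 0 that minimises cross-validated MSE."""
    preds = cv_predictions(xs, ys, learners)

    def cv_mse(w):
        return sum((y - (w * p[0] + (1 - w) * p[1])) ** 2
                   for y, p in zip(ys, preds)) / len(ys)

    weights = [k / (grid - 1) for k in range(grid)]
    return min(weights, key=cv_mse), cv_mse

# Toy data in which the linear candidate is the right one.
random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(200)]
ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]
w_mean, cv_mse = super_learner_weight(xs, ys, [fit_mean, fit_line])
print("weight on the mean-only candidate:", w_mean)
```

Because the weights 0 and 1 are both on the grid, the selected combination can never have a worse cross-validated risk than either candidate alone, which is the finite-grid analogue of the oracle property discussed later in the talk.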

Super learning
- No need to choose a priori a particular parametric model or machine learning algorithm for a particular problem.
- Allows one to combine many data-adaptive estimators into one improved estimator.
- Grounded in oracle results for loss-function-based cross-validation (Van Der Laan and Dudoit 2003). The loss function needs to be bounded.
- Performs asymptotically as well as the best (oracle) weighted combination, or achieves the parametric rate of convergence.

Super learning
[Figure: relative cross-validated mean squared error, compared with main-terms least squares regression]

Targeted MLE
1. Identify the least favorable parametric model for fluctuating the initial estimate $\hat{P}$: a small fluctuation gives the maximum change in the target.
2. Identify the optimal amount of fluctuation by MLE.
3. Apply the optimal fluctuation to $\hat{P}$, giving the first-step targeted maximum likelihood estimator.
4. Repeat until the incremental fluctuation is zero. In some important cases, one step suffices for convergence.
5. The final probability distribution solves the efficient score equation for the target parameter.
TMLE is a double robust and locally efficient plug-in estimator.
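
The steps above can be sketched for one familiar case: the average treatment effect with a binary outcome, where a logistic fluctuation along the so-called clever covariate converges in one step. Everything concrete here is an assumption for illustration and not taken from the talk: the data-generating process in `simulate` (true effect 0.1), the single binary covariate, the deliberately misspecified initial outcome fit that ignores W, and the use of Newton's method for the one-dimensional MLE.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def simulate(n, seed=1):
    """Hypothetical data: binary W confounds A and Y; the true ATE is 0.1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.6 * w)
        y = int(rng.random() < 0.2 + 0.1 * a + 0.4 * w)
        data.append((w, a, y))
    return data

def tmle_ate(data):
    n = len(data)
    # Initial outcome fit: deliberately misspecified, it ignores W.
    q0 = {a: sum(y for _, ai, y in data if ai == a)
             / sum(1 for _, ai, _ in data if ai == a) for a in (0, 1)}
    # Treatment mechanism: empirical P(A = 1 | W = w).
    g = {w: sum(a for wi, a, _ in data if wi == w)
            / sum(1 for wi, _, _ in data if wi == w) for w in (0, 1)}

    def clever(a, w):
        # Direction of the least favorable fluctuation for the ATE.
        return a / g[w] - (1 - a) / (1.0 - g[w])

    # Amount of fluctuation: MLE of epsilon in
    # logit Q*(A, W) = logit Q0(A) + epsilon * H(A, W), via Newton's method.
    eps = 0.0
    for _ in range(50):
        grad = hess = 0.0
        for w, a, y in data:
            h = clever(a, w)
            p = expit(logit(q0[a]) + eps * h)
            grad += h * (y - p)
            hess += h * h * p * (1.0 - p)
        step = grad / hess
        eps += step
        if abs(step) < 1e-12:
            break

    def q_star(a, w):
        return expit(logit(q0[a]) + eps * clever(a, w))

    # The updated fit solves the efficient score equation (up to rounding).
    score = sum(clever(a, w) * (y - q_star(a, w)) for w, a, y in data) / n
    # Plug-in estimate from the updated fit.
    estimate = sum(q_star(1, w) - q_star(0, w) for w, _, _ in data) / n
    return estimate, score, q0[1] - q0[0]

data = simulate(20000)
estimate, score, unadjusted = tmle_ate(data)
print("unadjusted:", round(unadjusted, 3), "TMLE:", round(estimate, 3))
```

In this sketch the initial fit is confounded, yet the targeted update recovers an estimate close to the simulated truth because the treatment mechanism is estimated well, which is one side of double robustness.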

Collaborative TMLE (CTMLE) algorithm
- Like TMLE, but chooses an estimate $\hat{g}$ of the treatment mechanism (propensity score) based on how well it helps estimate $\Psi(Q_0)$ rather than how well it estimates the true $g_0$.
- Builds the estimate of $g_0$ in a stepwise fashion.
- Strongest confounders are adjusted for first.
- Instrumental variables and weak confounders tend to be excluded.
- The order in which terms are added to $\hat{g}$ is chosen via a penalized log-likelihood, and the number of terms is chosen via cross-validation.

Kang and Schafer (2007) simulations
- Outcome Y is continuous and subject to missingness, with 4 covariates $W_1, W_2, W_3, W_4$.
- The true population mean (target parameter) is 210; the mean among the non-missing is 200.
- Positivity violations: $g_0(\Delta = 1 \mid W)$, the probability that Y is observed given W, is as small as 0.01.
- Modification 1: stronger positivity violations, with $g_0(\Delta = 1 \mid W)$ as small as [value missing in the source].
- Modification 2: same as Modification 1, but one covariate no longer affects Y, so it is an instrumental variable.
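
For readers who want to reproduce the flavor of this setup, here is a rough sketch of the original (unmodified) design. The coefficients are quoted from memory of Kang and Schafer (2007) and should be checked against the paper before use; the function name `kang_schafer` is hypothetical, the transformed covariates that the paper gives to the analyst are omitted, and neither modification from the slide is implemented because its details are not given here.

```python
import math
import random

def kang_schafer(n, seed=0):
    """Draw (outcome, observation probability, observed flag) triples.

    Coefficients are recalled from Kang and Schafer (2007), not from the
    slide; treat them as an assumption to verify.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(4)]
        y = 210 + 27.4 * z[0] + 13.7 * (z[1] + z[2] + z[3]) + rng.gauss(0, 1)
        lin = -z[0] + 0.5 * z[1] - 0.25 * z[2] - 0.1 * z[3]
        g = 1.0 / (1.0 + math.exp(-lin))      # probability that Y is observed
        observed = rng.random() < g
        rows.append((y, g, observed))
    return rows

rows = kang_schafer(20000)
mean_all = sum(y for y, _, _ in rows) / len(rows)
seen = [y for y, _, d in rows if d]
mean_observed = sum(seen) / len(seen)
min_g = min(g for _, g, _ in rows)
print("population mean:", round(mean_all, 1))
print("mean among the non-missing:", round(mean_observed, 1))
print("smallest observation probability:", round(min_g, 3))
```

The gap between the two means is the selection bias that every estimator on the following slides has to remove.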

Kang and Schafer (2007) simulations
[Figure: Kang and Schafer simulation. Estimators compared: OLS, WLS, A-IPCW, TMLE, C-TMLE]

Kang and Schafer (2007) simulations
[Figure: Modification 1 to the Kang and Schafer simulation. Estimators compared: OLS, WLS, A-IPCW, TMLE, C-TMLE]

Kang and Schafer (2007) simulations
[Figure: Modification 2 to the Kang and Schafer simulation. Estimators compared: OLS, WLS, A-IPCW, TMLE, C-TMLE]

Description of dataset
- A subset of data from Kaiser Permanente, part of which is used in the FDA's Mini-Sentinel drug safety surveillance.
- Population: diabetic patients without prior cardiovascular disease who are new users of pioglitazone or a sulfonylurea (two anti-diabetic drugs) and who are followed up for at least 6 months without also starting the other drug. [1]
- Treatment arm (in this example): pioglitazone (treatment variable A = 1).
- Comparator: sulfonylurea (A = 0).
- Outcome (Y): acute myocardial infarction (AMI) in the first 6 months of new anti-diabetic drug use.
- Baseline covariates (W): fifty covariates including demographics, comorbidities, and other drug use.
[1] We found that adjusting for missing outcomes had no effect on the results in this case, so we suppress those results and ignore missingness in this example.

Causal model, counterfactual outcomes, and parameter of interest
- Non-parametric structural equation model: each variable is an unknown deterministic function of the past and an error.
    $W = f_W(U_W)$
    $A = f_A(W, U_A)$
    $Y = f_Y(A, W, U_Y)$
- Counterfactual outcomes: substitute a fixed treatment $a$ for $A$ in $f_Y$: $Y_a = f_Y(a, W, U_Y)$ for $a \in \{0, 1\}$.
- Causal parameter of interest: the average treatment effect (ATE), $E(Y_1 - Y_0)$.
- Statistical parameter of interest: $\Psi(P_0) = E[E(Y \mid A = 1, W) - E(Y \mid A = 0, W)]$, which equals $E(Y_1 - Y_0)$ under the randomization assumption ("no unmeasured confounders") and the positivity assumption.
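
The claim that the statistical parameter equals the causal one can be written out in three lines. This is the standard identification argument rather than something shown on the slide. It assumes consistency ($Y = Y_a$ whenever $A = a$, which holds by construction in the structural equation model), randomization ($Y_a \perp A \mid W$), and positivity ($0 < P_0(A = 1 \mid W) < 1$).

```latex
\begin{align*}
E(Y_a) &= E\big[\,E(Y_a \mid W)\,\big]
        && \text{(tower property)} \\
       &= E\big[\,E(Y_a \mid A = a, W)\,\big]
        && \text{(randomization; positivity makes the conditioning well defined)} \\
       &= E\big[\,E(Y \mid A = a, W)\,\big]
        && \text{(consistency)}
\end{align*}
```

Taking the difference between $a = 1$ and $a = 0$ gives $E(Y_1 - Y_0) = E[E(Y \mid A = 1, W) - E(Y \mid A = 0, W)] = \Psi(P_0)$.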

Analysis results
Summary of outcome by treatment:
         Treatment     Comparator      Total
  AMI    5 (0.233%)    86 (0.3437%)    91 (0.335%)
  (The row of total counts did not survive transcription.)
Estimates: an estimate and p-value were reported for the Unadjusted, G-comp, PS matching, IPTW, AIPTW, and TMLE estimators (values did not survive transcription).
Though the sample size is large, there are so few AMIs in this subset of data from Kaiser Permanente that it is hard to tell whether adjustment for potential confounders is important.

Strategy
- Simulate datasets based on real study data, where the true effect is known, to highlight properties of estimators.
- Start with the KP data set, including additional new users of three other anti-diabetic drugs.
- Sample W with replacement from the empirical distribution of baseline covariates.
- Simulate treatment assignments A based on a known function of baseline covariates.
- Simulate outcome Y based on a function of W, adjusted so that Y is not too rare.
- Because Y is simulated based on a function of only baseline covariates and not the treatment, the true average treatment effect is known to be zero.
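
The strategy above can be sketched as follows. The covariate rows, the coefficients, and the helper names (`simulate_null`, `risk`) are all hypothetical stand-ins: the real simulations used the KP covariates and functions that the slides do not spell out. The point of the sketch is only the structure: resampled covariates, a known treatment mechanism, and an outcome that never looks at A.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_null(covariates, n, seed=0):
    """Resample real covariate rows, then simulate A and Y; Y ignores A."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        w = rng.choice(covariates)                         # W drawn with replacement
        a = int(rng.random() < expit(-0.5 + 1.0 * w[1]))   # known treatment mechanism
        y = int(rng.random() < expit(-2.0 + 1.5 * w[1]))   # depends on W only
        sample.append((w, a, y))
    return sample

def risk(sample, a, flag=None):
    """Outcome risk in arm a, optionally within one stratum of the risk flag."""
    ys = [y for w, ai, y in sample
          if ai == a and (flag is None or w[1] == flag)]
    return sum(ys) / len(ys)

# Hypothetical "observed" baseline rows: (age, prior-risk flag).
covariates = [(55, 0), (61, 1), (47, 0), (70, 1)]
sample = simulate_null(covariates, 20000)

unadjusted = risk(sample, 1) - risk(sample, 0)
share = {f: sum(1 for w, _, _ in sample if w[1] == f) / len(sample)
         for f in (0, 1)}
adjusted = sum(share[f] * (risk(sample, 1, f) - risk(sample, 0, f))
               for f in (0, 1))
print("unadjusted:", round(unadjusted, 3), "adjusted:", round(adjusted, 3))
```

Since the true effect is zero by construction, any estimate far from zero (such as the unadjusted one here) is pure confounding bias, which makes estimator performance easy to read off.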

Simulation 1
- Treatment mechanism is a function of 12 covariates strongly predictive of the outcome.
- Correctly specified case: the outcome and propensity score models are known and can be correctly specified.
- Misspecified case: the outcome and propensity score models are misspecified by leaving out half of the important confounders.
- Results demonstrate the double robustness of TMLE and AIPTW: when either the model for the outcome regression or the model for the PS is specified correctly, the parameter estimate is consistent, which is not the case for the G-computation estimator or IPTW.
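
Double robustness is easy to see in a toy version of this experiment. The sketch below is an assumption-laden illustration, not the simulation from the talk: it uses one binary confounder instead of 12 covariates, a made-up data-generating process with a true effect of 0.1, and "misspecification" that simply drops W from the outcome fit or from the propensity fit.

```python
import random

def simulate(n, seed=2):
    """Hypothetical data: binary W confounds A and Y; the true ATE is 0.1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.6 * w)
        y = int(rng.random() < 0.2 + 0.1 * a + 0.4 * w)
        data.append((w, a, y))
    return data

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def aiptw(data, q_uses_w=True, g_uses_w=True):
    """AIPTW estimate of the ATE; dropping W misspecifies that nuisance fit."""
    q_hat = {(a, w): mean(y for wi, ai, y in data
                          if ai == a and (wi == w or not q_uses_w))
             for a in (0, 1) for w in (0, 1)}
    g_hat = {w: mean(ai for wi, ai, _ in data if wi == w or not g_uses_w)
             for w in (0, 1)}
    total = 0.0
    for w, a, y in data:
        h = a / g_hat[w] - (1 - a) / (1.0 - g_hat[w])
        # Weighted residual plus the plug-in contrast.
        total += h * (y - q_hat[(a, w)]) + q_hat[(1, w)] - q_hat[(0, w)]
    return total / len(data)

data = simulate(20000)
both_right = aiptw(data)
outcome_wrong = aiptw(data, q_uses_w=False)
ps_wrong = aiptw(data, g_uses_w=False)
both_wrong = aiptw(data, q_uses_w=False, g_uses_w=False)
print("both correct:", round(both_right, 3))
print("outcome misspecified:", round(outcome_wrong, 3))
print("PS misspecified:", round(ps_wrong, 3))
print("both misspecified:", round(both_wrong, 3))
```

With either nuisance fit correct the estimate stays near the simulated truth of 0.1; only when both drop W does it collapse to the confounded unadjusted difference.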

Simulation 1
[Table: bias and MSE at n = 1000 and n = 5000 for each estimator; values did not survive transcription.]
Estimators: Unadjusted; G-comp; PSM; IPTW; AIPTW; TMLE; G-comp, misspecified; PSM, misspecified; IPTW, misspecified; AIPTW, outcome misspecified; AIPTW, PS misspecified; TMLE, outcome misspecified; TMLE, PS misspecified.

Simulation 2
- Treatment mechanism now depends on a covariate that is very predictive of treatment, resulting in positivity violations, but that is not a confounder.
- Results illustrate that IPTW has much higher variance than the other estimators, particularly in small samples, and that CTMLE is very robust to violations of the positivity assumption, particularly in small samples.

Simulation 2
[Table: bias and MSE at n = 100 and n = 500 for each estimator; values did not survive transcription.]
Estimators: Unadjusted; G-comp; PSM; IPTW; AIPTW; TMLE; CTMLE.
Some estimates are out of the parameter space (> 1) due to very large weights, resulting in the high variance.
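
A tiny deterministic example shows how a single large weight can push an inverse-probability-weighted estimate outside the parameter space. The rows below are invented for illustration, and the slide does not say which form of IPTW was used in the simulation; the sketch simply contrasts the unnormalised (Horvitz-Thompson) form with a normalised one for the treated-arm mean.

```python
def ht_mean_treated(rows):
    """Unnormalised IPTW estimate of E[Y_1]: average of A * Y / g."""
    return sum(a * y / g for a, y, g in rows) / len(rows)

def normalised_mean_treated(rows):
    """Normalised IPTW estimate: the weights are rescaled to sum to one."""
    numerator = sum(a * y / g for a, y, g in rows)
    denominator = sum(a / g for a, y, g in rows)
    return numerator / denominator

# Hypothetical rows: (treatment A, binary outcome Y, propensity score g(W)).
# One treated subject had only a 2% chance of treatment, so its weight is 50.
rows = ([(1, 1, 0.02)]
        + [(1, 0, 0.5)] * 2
        + [(0, 0, 0.5)] * 2
        + [(0, 1, 0.02)] * 5)

ht = ht_mean_treated(rows)
normalised = normalised_mean_treated(rows)
print("unnormalised estimate of a probability:", ht)
print("normalised estimate:", round(normalised, 3))
```

The unnormalised estimate of a quantity that must lie in [0, 1] comes out at 5, driven entirely by one observation. Plug-in estimators such as TMLE and CTMLE respect the parameter space by construction.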

Simulation 3
- Treatment mechanism depends on interactions between binary covariates.
- Main-terms logistic regression for the PS is not sufficient to account for all confounding.
- Results demonstrate that data-adaptive super learning is necessary to estimate the PS well enough to adjust for confounding.

Simulation 3
[Table: bias and MSE at n = 1000 and n = 5000 for each estimator; values did not survive transcription.]
Estimators: Unadjusted; PSM, IPTW, AIPTW, and TMLE with PS from main terms only; PSM, IPTW, AIPTW, and TMLE with PS from SuperLearner.
Here the outcome regression in TMLE and AIPTW is unadjusted to emphasize the benefits of super learning for the PS.

Further materials
Targeted Learning book, Springer Series in Statistics, van der Laan & Rose
targetedlearningbook.com

References
- J. Kang and J. Schafer. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4), 2007.
- M. van der Laan and S. Dudoit. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: finite sample oracle inequalities and examples. UC Berkeley Division of Biostatistics Working Paper Series, page 130, 2003.
- M. J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York.
- M. J. van der Laan and D. Rubin. Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1), 2006.
- M. J. van der Laan, E. C. Polley, and A. E. Hubbard. Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1), 2007.

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 260 Collaborative Targeted Maximum Likelihood For Time To Event Data Ori M. Stitelman Mark

More information

Targeted Learning for High-Dimensional Variable Importance

Targeted Learning for High-Dimensional Variable Importance Targeted Learning for High-Dimensional Variable Importance Alan Hubbard, Nima Hejazi, Wilson Cai, Anna Decker Division of Biostatistics University of California, Berkeley July 27, 2016 for Centre de Recherches

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 259 Targeted Maximum Likelihood Based Causal Inference Mark J. van der Laan University of

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2011 Paper 282 Super Learner Based Conditional Density Estimation with Application to Marginal Structural

More information

Collaborative Targeted Maximum Likelihood Estimation. Susan Gruber

Collaborative Targeted Maximum Likelihood Estimation. Susan Gruber Collaborative Targeted Maximum Likelihood Estimation by Susan Gruber A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Biostatistics in the

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2011 Paper 290 Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome Mark

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2011 Paper 288 Targeted Maximum Likelihood Estimation of Natural Direct Effect Wenjing Zheng Mark J.

More information

Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial

Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial Mark J. van der Laan 1 University of California, Berkeley School of Public Health laan@berkeley.edu

More information

Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths

Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths for New Developments in Nonparametric and Semiparametric Statistics, Joint Statistical Meetings; Vancouver, BC,

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2016 Paper 352 Scalable Collaborative Targeted Learning for High-dimensional Data Cheng Ju Susan Gruber

More information

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Progress, Updates, Problems William Jen Hoe Koh May 9, 2013 Overview Marginal vs Conditional What is TMLE? Key Estimation

More information

Modern Statistical Learning Methods for Observational Data and Applications to Comparative Effectiveness Research

Modern Statistical Learning Methods for Observational Data and Applications to Comparative Effectiveness Research Modern Statistical Learning Methods for Observational Data and Applications to Comparative Effectiveness Research Chapter 4: Efficient, doubly-robust estimation of an average treatment effect David Benkeser

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2014 Paper 327 Entering the Era of Data Science: Targeted Learning and the Integration of Statistics

More information

Targeted Group Sequential Adaptive Designs

Targeted Group Sequential Adaptive Designs Targeted Group Sequential Adaptive Designs Mark van der Laan Department of Biostatistics, University of California, Berkeley School of Public Health Liver Forum, May 10, 2017 Targeted Group Sequential

More information

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 248 Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Kelly

More information

Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequential Randomized Controlled Trials

Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequential Randomized Controlled Trials From the SelectedWorks of Paul H. Chaffee June 22, 2012 Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequential Randomized Controlled Trials Paul Chaffee Mark J. van der Laan

More information

Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials

Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials Antoine Chambaz (MAP5, Université Paris Descartes) joint work with Mark van der Laan Atelier INSERM

More information

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect

More information

DATA-ADAPTIVE VARIABLE SELECTION FOR

DATA-ADAPTIVE VARIABLE SELECTION FOR DATA-ADAPTIVE VARIABLE SELECTION FOR CAUSAL INFERENCE Group Health Research Institute Department of Biostatistics, University of Washington shortreed.s@ghc.org joint work with Ashkan Ertefaie Department

More information

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 2, Issue 1 2006 Article 2 Statistical Inference for Variable Importance Mark J. van der Laan, Division of Biostatistics, School of Public Health, University

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure arxiv:1706.02675v2 [stat.me] 2 Apr 2018 Laura B. Balzer, Wenjing Zheng,

More information

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007) Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 341 The Statistics of Sensitivity Analyses Alexander R. Luedtke Ivan Diaz Mark J. van der

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

Distributed analysis in multi-center studies

Distributed analysis in multi-center studies Distributed analysis in multi-center studies Sharing of individual-level data across health plans or healthcare delivery systems continues to be challenging due to concerns about loss of patient privacy,

More information

Adaptive Trial Designs

Adaptive Trial Designs Adaptive Trial Designs Wenjing Zheng, Ph.D. Methods Core Seminar Center for AIDS Prevention Studies University of California, San Francisco Nov. 17 th, 2015 Trial Design! Ethical:!eg.! Safety!! Efficacy!

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules

Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules University of California, Berkeley From the SelectedWorks of Maya Petersen March, 2007 Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules Mark J van der Laan, University

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 269 Diagnosing and Responding to Violations in the Positivity Assumption Maya L. Petersen

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

This is the submitted version of the following book chapter: stat08068: Double robustness, which will be

This is the submitted version of the following book chapter: stat08068: Double robustness, which will be This is the submitted version of the following book chapter: stat08068: Double robustness, which will be published in its final form in Wiley StatsRef: Statistics Reference Online (http://onlinelibrary.wiley.com/book/10.1002/9781118445112)

More information

Targeted Minimum Loss Based Estimation for Longitudinal Data. Paul H. Chaffee. A dissertation submitted in partial satisfaction of the

Targeted Minimum Loss Based Estimation for Longitudinal Data. Paul H. Chaffee. A dissertation submitted in partial satisfaction of the Targeted Minimum Loss Based Estimation for Longitudinal Data by Paul H. Chaffee A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Biostatistics

More information

Deductive Derivation and Computerization of Semiparametric Efficient Estimation

Deductive Derivation and Computerization of Semiparametric Efficient Estimation Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Robust Semiparametric Regression Estimation Using Targeted Maximum Likelihood with Application to Biomarker Discovery and Epidemiology

Robust Semiparametric Regression Estimation Using Targeted Maximum Likelihood with Application to Biomarker Discovery and Epidemiology Robust Semiparametric Regression Estimation Using Targeted Maximum Likelihood with Application to Biomarker Discovery and Epidemiology by Catherine Ann Tuglus A dissertation submitted in partial satisfaction

More information

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation

G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS 776 1 14 G-Estimation ( G-Estimation of Structural Nested Models 14) Outline 14.1 The causal question revisited 14.2 Exchangeability revisited

More information

Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model

Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model Johns Hopkins Bloomberg School of Public Health From the SelectedWorks of Michael Rosenblum 2010 Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model Michael Rosenblum,

More information

On the Use of the Bross Formula for Prioritizing Covariates in the High-Dimensional Propensity Score Algorithm

On the Use of the Bross Formula for Prioritizing Covariates in the High-Dimensional Propensity Score Algorithm On the Use of the Bross Formula for Prioritizing Covariates in the High-Dimensional Propensity Score Algorithm Richard Wyss 1, Bruce Fireman 2, Jeremy A. Rassen 3, Sebastian Schneeweiss 1 Author Affiliations:

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2014 Paper 330 Online Targeted Learning Mark J. van der Laan Samuel D. Lendle Division of Biostatistics,

More information

Statistical Inference for Data Adaptive Target Parameters

Statistical Inference for Data Adaptive Target Parameters Statistical Inference for Data Adaptive Target Parameters Mark van der Laan, Alan Hubbard Division of Biostatistics, UC Berkeley December 13, 2013 Mark van der Laan, Alan Hubbard ( Division of Biostatistics,

More information
