Penalized Spline of Propensity Methods for Missing Data and Causal Inference. Roderick Little

1 Penalized Spline of Propensity Methods for Missing Data and Causal Inference Roderick Little

2 A Tail of Two Statisticians (but who's tailing who?)

              Taylor                                Little
Cambridge U   1978 BA Math (1st class)              1971 BA Math (2nd class)
              Jesus College (Lord Harold Wilson)    Gonville & Caius College (Sir Ronald Fisher)
              1979 Diploma Stat (distinction)
              1982 MA                               1975 MA (not)
Ph.D. Stat    1983 UC Berkeley (Jerzy Neyman)       1974 Imperial College (Sir David Cox)
UCLA          1983 Adjunct Prof                     1983 Assoc Prof
              Prof in Residence                     Prof
U Michigan    1998-present                          1993-present

Taylor conference 2017

3 We both like sports. Jeremy regularly thrashed me at tennis. He even beat me up at golf, with his ancient Green Shield trading stamp golf clubs. We also hiked.

4 Mt Whitney, June 1988. Jeremy and Liza. Jeremy camping at 2 ft (clearly affected by the altitude).

5 Mt Whitney, June 1988. Whee, going down is easy! Following in my footsteps.

6 Mt Whitney, June 1988. The TOP! (LIKE MY HAT)

7 Banff 2016. Several years and waist sizes later.

8 Research as well
T for Too (Long Tails): Lange K, RJA Little and JMG Taylor: Robust statistical modeling using the multivariate T distribution. J Am Stat Assoc 84:881-896, 1989. ( + citations)
Mutual research interests in missing data, causal inference, Bayesian modeling.

9 We both like our models
Little likes his models Simple (Simple Simon).
Taylor likes his models Not too hot, Not too cold, But just right! (The Three Bears)

10 I googled Jeremy's picture, and all I could find was Jeremy Taylor (1613-1667), "The Shakespeare of Clerics":
"Love is friendship set on fire."
"Can any thing in this world be more foolish than to think that all this rare fabric of heaven and earth can come by chance, when all the skill of art is not able to make an oyster?"

11 The Talk
Multiple imputation (MI): an all-purpose prediction tool.
MI methods for three examples:
  Missing data: penalized spline of propensity prediction (PSPP).
  Two forms of causal inference: penalized spline of propensity for treatment comparisons (PENCOMP) (Zhou, Elliott and Little 2017).
Bayes and robustness are not mutually exclusive.

12 Statistics as prediction
Statistics is basically about predicting the stuff you don't observe, with appropriate measures of uncertainty.
See e.g. "The need for more emphasis on prediction: a nondenominational model-based approach" by David Harville (2014).
"both Bayesian and frequentist thinking are just models in a larger sense that help toward our fundamental goal. Here, I agree whole heartily with the sentiment of the article. Perhaps I would like to go further. Our fundamental goal is always, ultimately, prediction." (Rob McCullough, discussion of Harville paper)

13 An all-purpose tool for prediction: multiple imputation (MI)
Imputes missing data as draws, not means, from the predictive distribution of the missing values under a model.
Creates $D > 1$ filled-in data sets with different values imputed.
Simple MI combining rules (Rubin, 1987) yield valid inferences under well-specified models. Basic form is $V = W + (1 + 1/D)B$, where $W$ = within-imputation variance and $B$ = between-imputation variance.
Standard errors reflect imputation uncertainty, and averaging of estimates over MI data sets corrects the loss of efficiency from imputing draws.
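The combining rule on this slide is simple enough to write out directly. The following is a minimal sketch for a scalar estimand, assuming the D point estimates and their complete-data variances have already been computed; the function name `combine_mi` and the toy numbers are illustrative, not from the talk.

```python
import numpy as np

def combine_mi(estimates, variances):
    """Pool D point estimates and their complete-data variances (Rubin's rules)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    D = estimates.size
    if D < 2:
        raise ValueError("need D > 1 imputed data sets")
    q_bar = estimates.mean()        # pooled point estimate
    W = variances.mean()            # within-imputation variance
    B = estimates.var(ddof=1)       # between-imputation variance
    V = W + (1.0 + 1.0 / D) * B     # total variance
    return q_bar, V

# Toy inputs (made up): three imputed-data estimates with equal variances.
est, V = combine_mi([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, V)
```

The square root of V is the standard error that reflects imputation uncertainty; interval estimates would additionally need the reference t distribution, which this sketch leaves out.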

14 Examples of MI for missing data
Bayes for parametric models, e.g. multivariate normal, general location model (PROC MI).
Sequential regression/chained equations MI (IVEware, MICE, STAN).
Hot deck multiple imputation (predictive mean matching).
PSPP: a more robust regression-based method.

15 Example 1: univariate missing data
Data layout: one row per unit i = 1, 2, ..., r, r+1, r+2, ..., n, with columns X, R, Y.
X: fully observed variables (vector)
R: response indicator
Y: variable with missing values
Prediction problem: predict the missing values (?)
Here assume missing at random (MAR, Rubin 1976): $Y \perp R \mid X$
Miami University 2017

16 Penalized Spline of Propensity Prediction (PSPP) (Little & An 2004, Zhang & Little 2009)
Estimate the propensity to respond given covariates.
Impute draws from a regression model that includes:
  a penalized spline of the estimated propensity to respond;
  parametric terms on other covariates predictive of Y.
Exploits the key balancing property of the propensity score (Rosenbaum and Rubin, 1983): conditional on the propensity score and assuming missing at random, all covariates have the same distribution for respondents and nonrespondents.
Hence misspecifying the regression on other covariates does not lead to bias.

17 PSPP method
Estimate: $Y^* = \mathrm{logit}(\Pr(R = 1 \mid X_1, \ldots, X_p))$
Impute using the regression model:
$(Y \mid Y^*, X_1, \ldots, X_p; \theta, \beta) \sim N\big(s(Y^* \mid \theta) + g(Y^*, X_2, \ldots, X_p; \beta),\ \sigma^2\big)$
Nonparametric part $s(\cdot)$: needs to be correctly specified; we choose a penalized spline.
Parametric part $g(\cdot)$: misspecification does not lead to bias; increases precision; $X_1$ excluded to prevent multicollinearity.
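The two PSPP steps can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: it uses a truncated linear spline basis with knots at quantiles, a fixed smoothing penalty `lam` (in practice the penalty would typically be estimated, for example through a mixed-model formulation), a linear parametric part, and plug-in parameter estimates for a single imputation. All function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Newton-Raphson ML fit of logit Pr(r = 1 | X); X includes an intercept column."""
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        gamma = gamma + np.linalg.solve(hess, X.T @ (r - p))
    return gamma

def pspp_impute(X, y, r, n_knots=10, lam=1.0, rng=None):
    """One imputed copy of y (assumed sketch: fixed penalty, plug-in estimates)."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    y_star = design @ fit_logistic(design, r)      # logit of estimated propensity
    knots = np.quantile(y_star, np.linspace(0, 1, n_knots + 2)[1:-1])
    trunc = np.maximum(y_star[:, None] - knots[None, :], 0.0)
    # Spline part: 1, Y*, (Y* - k)_+ ; parametric part: X_2..X_p (X_1 excluded).
    C = np.column_stack([np.ones(n), y_star, trunc, X[:, 1:]])
    penalty = np.zeros(C.shape[1])
    penalty[2:2 + n_knots] = lam                   # penalize only the knot terms
    obs = r == 1
    Co, yo = C[obs], y[obs]
    coef = np.linalg.solve(Co.T @ Co + np.diag(penalty), Co.T @ yo)
    resid = yo - Co @ coef
    sigma = np.sqrt(resid @ resid / (obs.sum() - C.shape[1]))
    y_imp = y.copy()
    miss = ~obs
    y_imp[miss] = C[miss] @ coef + sigma * rng.standard_normal(miss.sum())
    return y_imp

# Simulated MAR data (made up): response depends on X_1, true mean of Y is 1.
rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal((n, 2))
r = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0])))).astype(int)
y = 1.0 + X[:, 0] + X[:, 1] + rng.standard_normal(n)
y[r == 0] = np.nan
y_imp = pspp_impute(X, y, r, rng=rng)
print(np.nanmean(y), y_imp.mean())   # respondent mean vs. mean after imputation
```

Repeating the imputation D times (ideally with parameter uncertainty propagated, for example by bootstrapping or posterior draws) and pooling with the MI combining rules would give the full PSPP MI analysis.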

18 Double Robustness Property
The PSPP method yields a consistent estimator for the marginal mean of Y if:
(a) the mean of Y given X is correctly specified, or
(b1) the propensity is correctly specified, and
(b2) $E(Y \mid Y^*, \theta) = s(Y^* \mid \theta)$
Key idea: the parametric regression $g(\cdot)$ on the other covariates does not have to be correctly specified.

19 Bayesian PSPP
Alternative to PSPP MI is PSPP Bayes: add a prior distribution for the parameters, and simulate the posterior distribution.
Draws of missing values from MCMC can be used to make MI proper.
Both methods (PSPP MI or PSPP Bayes) compare well with alternatives (inverse probability weighting, simple or augmented) in simulation studies (Zhang & Little 2011, Chen et al. 2017).

20 Example 2: Basic causal inference
Data layout: one row per unit i, with columns X, Z, Y.
X: baseline covariates/confounders
Z: treatment indicator (0, 1)
Y: outcome
Treatment effect: difference in means
Assume ignorable treatment assignment mechanism (no unmeasured confounders): $Z \perp Y \mid X$
Regression approach: regress Y on Z, X; the causal effect is the coefficient of Z.
Where's the prediction here?

21 Example 2: Basic causal inference: Rubin/Neyman causal model (Rubin, 1974)
Data layout: one row per unit i, with columns X, Z, Y(0), Y(1).
X: baseline covariates/confounders
Z: treatment indicator (0, 1)
Y(0): outcome if given Z = 0
Y(1): outcome if given Z = 1
Prediction problem: predict outcomes (?) for the treatment not assigned.
With no unmeasured confounders: regress Y on Z, X.
Multiply impute missing Y(0), Y(1) with predictions for X.
Apply MI combining rules for inference about the treatment effect.
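The potential-outcomes layout can be made concrete with a toy table. The sketch below uses made-up numbers and illustrative variable names (none of them come from the talk); it only shows which cells are observed and which are the targets of prediction.

```python
import numpy as np

# Toy data (made-up values): treatment indicator and observed outcome.
z = np.array([0, 0, 1, 1, 1])
y = np.array([3.1, 2.7, 5.0, 4.4, 4.9])

# Each unit reveals only the potential outcome for the treatment it received;
# the other one is missing (NaN) and has to be predicted.
y0 = np.where(z == 0, y, np.nan)
y1 = np.where(z == 1, y, np.nan)

for i, (a, b) in enumerate(zip(y0, y1), start=1):
    print(i, a, b)
```

Seen this way, causal inference has the same structure as Example 1: half of the Y(0), Y(1) table is missing, and imputing it turns the treatment effect into a simple function of completed data.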

22 Robust MI for Example 2
Penalized Spline of Propensity for Treatment Comparisons (PENCOMP):
(a) Estimate the propensity to be assigned each treatment given covariates.
(b) Apply the PSPP model to multiply impute the potential outcomes for the treatments not assigned to subjects.
(c) Apply MI combining rules for inference about the average treatment effect.
This approach has a double robustness property analogous to that of PSPP for missing data.

23 PENCOMP MI for Example 2
More specifically:
(a) For $d = 1, \ldots, D$, generate a bootstrap sample from the original data S by sampling units with replacement, stratified on treatment group. Then carry out steps (b)-(d) for each sample d:
(b) Estimate a logistic regression for Z given X, with regression parameters $\gamma$. The estimated propensity of assignment to $Z = z$ is
$\hat{P}_z(X) = \mathrm{logit}\,\Pr(Z = z \mid X, \hat{\gamma}^{(d)})$,
where $\hat{\gamma}^{(d)}$ is the ML estimate of $\gamma$.

24 PENCOMP MI for Example 2
(c) For $z = 0, 1$, using the cases assigned to treatment group z, estimate a normal linear regression of Y on X, with mean
$E(Y \mid X, Z = z, \theta_z, \beta_z) = s(\hat{P}_z(X) \mid \theta_z) + g(\hat{P}_z(X), X_2, \ldots, X_p; \beta_z)$,
where $s(\hat{P}_z(X) \mid \theta_z)$ = penalized spline and $g(\cdot)$ = parametric function of predictors.
(d) For $z = 0, 1$, impute the values of $Y(z)$ for subjects in treatment group $1 - z$ in the original data set with draws from the predictive distribution of Y given X from the regression in (c), with ML estimates substituted for the parameters.
(e) Use MI combining rules for inference about the average treatment effect.
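Steps (a)-(e) can be strung together in a compact NumPy sketch. It is an illustration under several assumptions that are not in the talk: a truncated linear spline basis with quantile knots, a fixed penalty `lam`, a linear parametric part, and a simple complete-data variance (sample variance of the unit-level differences divided by n). The function names and simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Newton-Raphson ML fit of logit Pr(z = 1 | X); X includes an intercept."""
    g = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ g))
        g = g + np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (z - p))
    return g

def basis(p_star, knots, X):
    """Spline terms in the logit propensity plus linear terms in X_2..X_p."""
    trunc = np.maximum(p_star[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(p_star), p_star, trunc, X[:, 1:]])

def pencomp(X, z, y, D=20, n_knots=8, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    est, var = [], []
    for _ in range(D):
        # (a) bootstrap sample, stratified on treatment group
        idx = np.concatenate([rng.choice(np.flatnonzero(z == k), (z == k).sum())
                              for k in (0, 1)])
        # (b) propensity model fitted to the bootstrap sample
        gamma = fit_logistic(design[idx], z[idx])
        y_pot = np.empty((n, 2))
        for k in (0, 1):
            sign = 2 * k - 1                       # logit Pr(Z=0|X) = -logit Pr(Z=1|X)
            p_boot, p_orig = sign * (design[idx] @ gamma), sign * (design @ gamma)
            knots = np.quantile(p_boot, np.linspace(0, 1, n_knots + 2)[1:-1])
            # (c) penalized regression on bootstrap cases assigned to treatment k
            grp = z[idx] == k
            C, yk = basis(p_boot[grp], knots, X[idx][grp]), y[idx][grp]
            pen = np.zeros(C.shape[1])
            pen[2:2 + n_knots] = lam
            coef = np.linalg.solve(C.T @ C + np.diag(pen), C.T @ yk)
            res = yk - C @ coef
            sigma = np.sqrt(res @ res / (grp.sum() - C.shape[1]))
            # (d) impute Y(k) for original-data units that received 1 - k
            draw = basis(p_orig, knots, X) @ coef + sigma * rng.standard_normal(n)
            y_pot[:, k] = np.where(z == k, y, draw)
        diff = y_pot[:, 1] - y_pot[:, 0]
        est.append(diff.mean())
        var.append(diff.var(ddof=1) / n)
    # (e) MI combining rules
    est, var = np.array(est), np.array(var)
    return est.mean(), var.mean() + (1 + 1 / D) * est.var(ddof=1)

# Simulated confounded data (made up): true average treatment effect is 2.
rng = np.random.default_rng(1)
n = 1000
X = rng.standard_normal((n, 2))
z = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * X[:, 0]))).astype(int)
y = 2.0 * z + X[:, 0] + X[:, 1] + rng.standard_normal(n)
ate, V = pencomp(X, z, y)
print(ate, np.sqrt(V))
```

Fitting the models on a bootstrap sample and imputing in the original data is what propagates parameter uncertainty across the D imputations, which is why plug-in ML estimates can be used inside each replicate.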

25 Example 3: Longitudinal causal inference: confounding by indication
Data layout: one row per unit i, with columns $X_1$, $Z_1$, $X_2$, $Z_2$, Y.
$X_1$: baseline covariates/confounders
$Z_1$: time 1 treatment indicator (0 or 1)
$X_2$: intermediate outcome
$Z_2$: time 2 treatment indicator (0 or 1)
Y: outcome
Assume ignorable assignment mechanisms.
3 treatment effects: $\Delta_{jk} = \mathrm{mean}\,Y(Z_1 = j, Z_2 = k) - \mathrm{mean}\,Y(Z_1 = Z_2 = 0)$, $jk = 01, 10, 11$
Regression doesn't work now: $X_2$ is an outcome for $Z_1$ and a confounder for $Z_2$, so we can't just condition on $X_2$.

26 Example 3. Confounding by indication: Rubin causal model solution (Frangakis and Rubin 2002)
Data layout: one row per unit i, with columns $X_1$, $Z_1$, $X_2(0)$, $X_2(1)$, $Z_2$, Y(00), Y(01), Y(10), Y(11).
Multiply impute the missing data (?'s).

27 PENCOMP MI for Example 3
(a) Take a bootstrap sample of the original data. For each bootstrap sample:
(b) Missing values of the intermediate treatment outcomes $X_2(0)$ and $X_2(1)$ are imputed using the method described for Example 2.
(c) Conditional on the values of $X_1$, $Z_1$ and the observed or imputed values of $X_2$, the propensity that $Z_2 = 1$ given $X_1$, $Z_1$ and $X_2$ is estimated based on a logistic regression of $Z_2$ on $X_1$, $Z_1$ and $X_2$.
(d) Missing values of Y(jk) are then imputed as draws based on the regression of Y(jk) on $X_1$, $Z_1$ and $X_2$ for a model that includes a spline on the propensity from (c); a distinct regression model is fitted for each outcome Y(jk).
(e) Apply MI combining rules for inference about average treatment effects.

28 Alternative methods: IPTW and AIPTW
Inverse Probability of Treatment Weighting (IPTW): weight subjects by the inverse of the estimate of $\Pr(Z \mid X)$; in effect this creates a pseudo-population that is free of treatment confounders. Consistent if the treatment assignment mechanism is correctly specified. But weights can be highly variable, leading to poor efficiency.
Augmented Inverse Probability of Treatment Weighting (AIPTW): doubly robust, i.e. consistent if the treatment assignment mechanism is correctly specified, or the prediction model is correctly specified.
Tingting Zhou (Univ. of Michigan), March 24
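For comparison, the two weighting estimators can be written in a few lines. The sketch below uses common textbook forms (a normalized IPTW estimator and the usual AIPTW estimator for a difference in means); it takes the propensities `e` and the outcome-model predictions `m1`, `m0` as given, and in the toy run they are set to their true simulated values purely for illustration. Function names and data are assumptions, not from the talk.

```python
import numpy as np

def iptw_ate(y, z, e):
    """Normalized IPTW estimate of E[Y(1)] - E[Y(0)]; e = Pr(Z = 1 | X)."""
    w1, w0 = z / e, (1 - z) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def aiptw_ate(y, z, e, m1, m0):
    """AIPTW: outcome-model predictions plus inverse-weighted residual corrections."""
    mu1 = np.mean(m1 + z * (y - m1) / e)
    mu0 = np.mean(m0 + (1 - z) * (y - m0) / (1 - e))
    return mu1 - mu0

# Simulated confounded data (made up): true average treatment effect is 2.
rng = np.random.default_rng(2)
n = 5000
x = rng.standard_normal(n)
e = 1.0 / (1.0 + np.exp(-x))                 # true propensity, used for illustration
z = (rng.random(n) < e).astype(int)
y = 2.0 * z + x + rng.standard_normal(n)

naive = y[z == 1].mean() - y[z == 0].mean()  # confounded comparison
iptw = iptw_ate(y, z, e)
aiptw = aiptw_ate(y, z, e, m1=2.0 + x, m0=x)
print(naive, iptw, aiptw)
```

Both estimators divide by the estimated propensity, so units with propensities near 0 or 1 receive very large weights; PENCOMP instead enters the propensity as a covariate in the prediction model.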

29 Application
We applied our method to the Multicenter AIDS Cohort Study (MACS) (Kaslow et al., 1987) to analyze the effect of antiretroviral treatment on CD4 counts for HIV+ subjects.
CD4 count is an intermediate outcome of past treatments and confounds the next treatment.
We restrict our analyses to the period between visit 6 and 2, when zidovudine was approved and available for use and before the advent of highly active antiretroviral therapy (HAART).
We estimate the short-term ( year) effects of using antiretroviral treatment for HIV+ subjects during this period, for each of the three-visit moving windows,, 4.

30 Application

31 Summary of Method Comparisons in Simulations
When the confounding is low or moderate, the weights are more stable; PENCOMP and AIPTW perform similarly, and both are superior to IPTW.
PENCOMP has slightly larger (but still negligible) bias than AIPTW when the prediction model is misspecified and weights are variable. But PENCOMP tends to outperform AIPTW in RMSE, coverage probability and efficiency.

32 Conclusion
PSPP, PENCOMP: regression models that include a spline of the propensity as a predictor.
Conceptually simple.
Propensity treated as a covariate rather than a weight.
Avoids having to address problems with highly variable weights.
Tends to produce more stable estimates.
Tends to produce good confidence coverage in small samples (Bayes can be particularly useful here).
Many Happy Returns Jeremy!

33 References
Chen, Q. et al. (2017). Approaches to Improving Survey-Weighted Estimates. To appear in Statistical Science.
Elliott, M. R. and Little, R. J. A. (2015). Discussion of "On Bayesian Estimation of Marginal Structural Models." Biometrics 71(2).
Frangakis, C. E. and Rubin, D. B. (2002). Principal Stratification in Causal Inference. Biometrics 58.
Harville, D. (2014). The Need for More Emphasis on Prediction: A "Nondenominational" Model-Based Approach. Am. Statist. 68(2), 71-83.
Kaslow, R. A., Ostrow, D. G., Detels, R., Phair, J. P., Polk, B. F., and Rinaldo, C. R. Jr. (1987). The Multicenter AIDS Cohort Study: Rationale, Organization, and Selected Characteristics of the Participants. American Journal of Epidemiology 126.
Little, R. J. A. and An, H. (2004). Robust Likelihood-Based Analysis of Multivariate Data with Missing Values. Statistica Sinica 14.
Little, R. J. A. and Yau, L. (1996). Intent-to-Treat Analysis in Longitudinal Studies with Drop-Outs. Biometrics 52.

34 References
Ngo, L. and Wand, M. P. (2004). Smoothing with Mixed Model Software. Journal of Statistical Software 9, 1-54.
Rosenbaum, P. R. and Rubin, D. B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70.
Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 66(5).
Rubin, D. B. (1976). Inference and Missing Data. Biometrika 63.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Zhang, G. and Little, R. J. (2009). Extensions of the Penalized Spline of Propensity Prediction Method of Imputation. Biometrics 65(3).
Zhang, G. and Little, R. J. (2011). A Comparative Study of Doubly-Robust Estimators of the Mean with Missing Data. Journal of Statistical Computation and Simulation 81(12).
Zhou, T., Elliott, M. R. and Little, R. J. (2017). Penalized Spline of Propensity Methods for Treatment Comparisons. Under revision for publication.


More information

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Stephen Senn (c) Stephen Senn 1 Acknowledgements This work is partly supported by the European Union s 7th Framework

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group A comparison of weighted estimators for the population mean Ye Yang Weighting in surveys group Motivation Survey sample in which auxiliary variables are known for the population and an outcome variable

More information

Accounting for Complex Sample Designs via Mixture Models

Accounting for Complex Sample Designs via Mixture Models Accounting for Complex Sample Designs via Finite Normal Mixture Models 1 1 University of Michigan School of Public Health August 2009 Talk Outline 1 2 Accommodating Sampling Weights in Mixture Models 3

More information

A comparison of fully Bayesian and two-stage imputation strategies for missing covariate data

A comparison of fully Bayesian and two-stage imputation strategies for missing covariate data A comparison of fully Bayesian and two-stage imputation strategies for missing covariate data Alexina Mason, Sylvia Richardson and Nicky Best Department of Epidemiology and Biostatistics, Imperial College

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Harvard University. Harvard University Biostatistics Working Paper Series

Harvard University. Harvard University Biostatistics Working Paper Series Harvard University Harvard University Biostatistics Working Paper Series Year 2010 Paper 117 Estimating Causal Effects in Trials Involving Multi-treatment Arms Subject to Non-compliance: A Bayesian Frame-work

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer

More information

STATISTICAL ANALYSIS WITH MISSING DATA

STATISTICAL ANALYSIS WITH MISSING DATA STATISTICAL ANALYSIS WITH MISSING DATA SECOND EDITION Roderick J.A. Little & Donald B. Rubin WILEY SERIES IN PROBABILITY AND STATISTICS Statistical Analysis with Missing Data Second Edition WILEY SERIES

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies

An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies Paper 177-2015 An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies Yan Wang, Seang-Hwane Joo, Patricia Rodríguez de Gil, Jeffrey D. Kromrey, Rheta

More information

Propensity Score Analysis with Hierarchical Data

Propensity Score Analysis with Hierarchical Data Propensity Score Analysis with Hierarchical Data Fan Li Alan Zaslavsky Mary Beth Landrum Department of Health Care Policy Harvard Medical School May 19, 2008 Introduction Population-based observational

More information

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian

Estimation of Optimal Treatment Regimes Via Machine Learning. Marie Davidian Estimation of Optimal Treatment Regimes Via Machine Learning Marie Davidian Department of Statistics North Carolina State University Triangle Machine Learning Day April 3, 2018 1/28 Optimal DTRs Via ML

More information

Interactions and Squares: Don t Transform, Just Impute!

Interactions and Squares: Don t Transform, Just Impute! Interactions and Squares: Don t Transform, Just Impute! Philipp Gaffert Volker Bosch Florian Meinfelder Abstract Multiple imputation [Rubin, 1987] is difficult to conduct if the analysis model includes

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Advising on Research Methods: A consultant's companion. Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand

Advising on Research Methods: A consultant's companion. Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand Advising on Research Methods: A consultant's companion Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand Contents Preface 13 I Preliminaries 19 1 Giving advice on research methods

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Modeling Mediation: Causes, Markers, and Mechanisms

Modeling Mediation: Causes, Markers, and Mechanisms Modeling Mediation: Causes, Markers, and Mechanisms Stephen W. Raudenbush University of Chicago Address at the Society for Resesarch on Educational Effectiveness,Washington, DC, March 3, 2011. Many thanks

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. FACULTY OF PSYCHOLOGY AND EDUCATIONAL SCIENCES Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. Modern Modeling Methods (M 3 ) Conference Beatrijs Moerkerke

More information

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data Comparison of multiple imputation methods for systematically and sporadically missing multilevel data V. Audigier, I. White, S. Jolani, T. Debray, M. Quartagno, J. Carpenter, S. van Buuren, M. Resche-Rigon

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Pooling multiple imputations when the sample happens to be the population.

Pooling multiple imputations when the sample happens to be the population. Pooling multiple imputations when the sample happens to be the population. Gerko Vink 1,2, and Stef van Buuren 1,3 arxiv:1409.8542v1 [math.st] 30 Sep 2014 1 Department of Methodology and Statistics, Utrecht

More information

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nanhua Zhang Division of Biostatistics & Epidemiology Cincinnati Children s Hospital Medical Center (Joint work

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

This is the submitted version of the following book chapter: stat08068: Double robustness, which will be

This is the submitted version of the following book chapter: stat08068: Double robustness, which will be This is the submitted version of the following book chapter: stat08068: Double robustness, which will be published in its final form in Wiley StatsRef: Statistics Reference Online (http://onlinelibrary.wiley.com/book/10.1002/9781118445112)

More information

High Dimensional Propensity Score Estimation via Covariate Balancing

High Dimensional Propensity Score Estimation via Covariate Balancing High Dimensional Propensity Score Estimation via Covariate Balancing Kosuke Imai Princeton University Talk at Columbia University May 13, 2017 Joint work with Yang Ning and Sida Peng Kosuke Imai (Princeton)

More information

Inferences on missing information under multiple imputation and two-stage multiple imputation

Inferences on missing information under multiple imputation and two-stage multiple imputation p. 1/4 Inferences on missing information under multiple imputation and two-stage multiple imputation Ofer Harel Department of Statistics University of Connecticut Prepared for the Missing Data Approaches

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,

More information

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia

More information

Modeling Log Data from an Intelligent Tutor Experiment

Modeling Log Data from an Intelligent Tutor Experiment Modeling Log Data from an Intelligent Tutor Experiment Adam Sales 1 joint work with John Pane & Asa Wilks College of Education University of Texas, Austin RAND Corporation Pittsburgh, PA & Santa Monica,

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis. Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description

More information

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions in Multivariable Analysis: Current Issues and Future Directions Frank E Harrell Jr Department of Biostatistics Vanderbilt University School of Medicine STRATOS Banff Alberta 2016-07-04 Fractional polynomials,

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Propensity Score Adjustment for Unmeasured Confounding in Observational Studies

Propensity Score Adjustment for Unmeasured Confounding in Observational Studies Propensity Score Adjustment for Unmeasured Confounding in Observational Studies Lawrence C. McCandless Sylvia Richardson Nicky G. Best Department of Epidemiology and Public Health, Imperial College London,

More information

A Discussion of the Bayesian Approach

A Discussion of the Bayesian Approach A Discussion of the Bayesian Approach Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974 and Sujit Ghosh s lecture notes David Madigan Statistics The subject of statistics concerns

More information

Gov 2002: 5. Matching

Gov 2002: 5. Matching Gov 2002: 5. Matching Matthew Blackwell October 1, 2015 Where are we? Where are we going? Discussed randomized experiments, started talking about observational data. Last week: no unmeasured confounders

More information

Propensity Score Matching

Propensity Score Matching Methods James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 Methods 1 Introduction 2 3 4 Introduction Why Match? 5 Definition Methods and In

More information

Introduction to Survey Data Analysis

Introduction to Survey Data Analysis Introduction to Survey Data Analysis JULY 2011 Afsaneh Yazdani Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing

More information