Penalized Spline of Propensity Methods for Missing Data and Causal Inference. Roderick Little


1 Penalized Spline of Propensity Methods for Missing Data and Causal Inference Roderick Little

2 A Tail of Two Statisticians (but who's tailing who?)
Cambridge U: Taylor, 1978 BA Math (1st class), Jesus College (Lord Harold Wilson); Little, 1971 BA Math (2nd class), Gonville & Caius College (Sir Ronald Fisher)
Taylor: 1979 Diploma Stat (distinction), 1982 MA; Little: 1975 MA (not)
Ph.D. Stat: Taylor, 1983 UC Berkeley (Jerzy Neyman); Little, 1974 Imperial College (Sir David Cox)
UCLA: 1983 Adjunct Prof, 1983 Assoc Prof, Prof in Residence, Prof
U Michigan: Taylor, 1998-present; Little, 1993-present
Taylor conference 2017

3 Jeremy regularly thrashed me at tennis
We both like sports
He even beat me up at golf, with his ancient Green Shield trading stamp golf clubs
We also hiked

4 Mt Whitney June 1988
Jeremy and Liza
Jeremy camping at 2 ft (clearly affected by the altitude)

5 Mt Whitney June 1988
Whee going down is easy!
Following in my footsteps

6 Mt Whitney June 1988
The TOP! (LIKE MY HAT)

7 Banff 26
Several years and waist sizes later

8 Research as well
T for Too (Long Tails): Lange K, RJA Little and JMG Taylor: Robust statistical modeling using the multivariate T distribution. J Am Stat Assoc 84:881-896, 1989. ( + citations)
Mutual research interests in missing data, causal inference, Bayesian modeling

9 We both like our models
Little likes his models Simple (Simple Simon)
Taylor likes his models Not too hot, Not too cold, But just right! (The Three Bears)

10 I googled Jeremy's picture, and all I could find was
Jeremy Taylor (1613-1667), The Shakespeare of Clerics
"Love is friendship set on fire"
"Can any thing in this world be more foolish than to think that all this rare fabric of heaven and earth can come by chance, when all the skill of art is not able to make an oyster"

11 The Talk
Multiple imputation (MI) -- an all-purpose prediction tool
MI methods for three examples
  Missing data: penalized spline of propensity prediction (PSPP)
  Two forms of causal inference: penalized spline of propensity for treatment comparisons (PENCOMP) (Zhou, Elliott and Little 2017)
Bayes and robustness are not mutually exclusive

12 Statistics as prediction
Statistics is basically about predicting the stuff you don't observe, with appropriate measures of uncertainty
See e.g. "The need for more emphasis on prediction: a nondenominational model-based approach" by David Harville (2014).
"both Bayesian and frequentist thinking are just models in a larger sense that help toward our fundamental goal. Here, I agree whole heartily with the sentiment of the article. Perhaps I would like to go further. Our fundamental goal is always, ultimately, prediction." Rob McCullough, discussion of Harville paper

13 An all-purpose tool for prediction: multiple imputation (MI)
Imputes missing data as draws, not means, from the predictive distribution of the missing values under a model
Creates D > 1 filled-in data sets with different values imputed
Simple MI combining rules (Rubin, 1987) yield valid inferences under well-specified models. Basic form is:
  $V = W + (1 + 1/D)B$, W = within-imputation variance, B = between-imputation variance
Standard errors reflect imputation uncertainty, and averaging of estimates over MI data sets corrects the loss of efficiency from imputing draws
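
The combining rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `combine_mi` and the toy numbers are assumptions, and the function takes the D completed-data estimates and their variances as given.

```python
import numpy as np

def combine_mi(estimates, variances):
    """Pool D completed-data estimates with the basic MI combining rule.

    estimates: point estimate from each of the D filled-in data sets
    variances: estimated variance of that estimate in each data set
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    D = q.size
    if D < 2:
        raise ValueError("need D > 1 filled-in data sets")
    q_bar = q.mean()               # pooled point estimate
    W = u.mean()                   # within-imputation variance
    B = q.var(ddof=1)              # between-imputation variance
    V = W + (1.0 + 1.0 / D) * B    # total variance
    return q_bar, V

# Toy numbers (hypothetical): three imputed data sets
est, var = combine_mi([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

Here W = 4 and B = 1, so V = 4 + (1 + 1/3) = 16/3; the square root of V is the standard error that reflects imputation uncertainty.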

14 Examples of MI for missing data
Bayes for parametric models, e.g. multivariate normal, general location model (PROC MI)
Sequential regression/chained equations MI (IVEware, MICE, STAN)
Hot deck multiple imputation (predictive mean matching)
PSPP: a more robust regression-based method

15 Example 1: univariate missing data
Data layout (table): rows are units i = 1, 2, ..., r, r+1, r+2, ..., n; columns are X, R, Y
X: fully observed variables (vector)
R: response indicator
Y: variable with missing values
Prediction problem: predict the missing values (?)
Here assume missing at random (MAR, Rubin 1976): $Y \perp R \mid X$
Miami University 2017

16 Penalized Spline of Propensity Prediction (PSPP) (Little & An 2004, Zhang & Little 2009)
Estimate the propensity to respond given covariates
Impute draws from a regression model that includes
  Penalized spline of estimated propensity to respond
  Parametric terms on other covariates predictive of Y
Exploits the key balancing property of the propensity score (Rosenbaum and Rubin, 1983): conditional on the propensity score and assuming missing at random, all covariates have the same distribution for respondents and nonrespondents
Hence misspecifying the regression on other covariates does not lead to bias

17 PSPP method
Estimate: $Y^* = \mathrm{logit}(\Pr(R = 1 \mid X_1, \ldots, X_p))$
Impute using the regression model:
  $(Y \mid Y^*, X_1, \ldots, X_p; \theta, \beta) \sim N\left(s(Y^*, \theta) + g(Y^*, X_2, \ldots, X_p; \beta),\ \sigma^2\right)$
Nonparametric part $s(Y^*, \theta)$: needs to be correctly specified; we choose a penalized spline
Parametric part $g(Y^*, X_2, \ldots, X_p; \beta)$: misspecification does not lead to bias; increases precision; $X_1$ excluded to prevent multicollinearity
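
A rough Python sketch of one PSPP imputation follows, using only NumPy. Several choices here are assumptions made for illustration and are not taken from the talk: the helper names (`fit_logistic`, `spline_basis`, `pspp_impute`), a truncated linear spline basis with a fixed ridge penalty `lam` in place of a mixed-model fit, a purely linear g() in X_2, ..., X_p, and draws that plug in point estimates rather than drawing the parameters.

```python
import numpy as np

def fit_logistic(X, r, n_iter=50, ridge=1e-6):
    """Newton-Raphson ML fit; returns the logit propensity Y* for each unit."""
    A = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ b))
        H = A.T @ (A * (p * (1.0 - p))[:, None]) + ridge * np.eye(A.shape[1])
        b = b + np.linalg.solve(H, A.T @ (r - p))
    return A @ b

def spline_basis(ystar, knots):
    """Truncated linear basis: 1, Y*, (Y* - knot)_+ for each knot."""
    cols = [np.ones_like(ystar), ystar]
    cols += [np.clip(ystar - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def pspp_impute(X, y, r, n_knots=10, lam=1.0, rng=None):
    """Return a copy of y with missing entries (r == 0) replaced by draws."""
    rng = np.random.default_rng(rng)
    obs = r == 1
    ystar = fit_logistic(X, r)
    knots = np.quantile(ystar, np.linspace(0, 1, n_knots + 2)[1:-1])
    # s(Y*) columns followed by the parametric part on X_2..X_p (X_1 excluded)
    M = np.column_stack([spline_basis(ystar, knots), X[:, 1:]])
    pen = np.zeros(M.shape[1])
    pen[2:2 + n_knots] = lam          # penalize only the knot coefficients
    Mo = M[obs]
    coef = np.linalg.solve(Mo.T @ Mo + np.diag(pen), Mo.T @ y[obs])
    resid = y[obs] - Mo @ coef
    sigma = np.sqrt(resid @ resid / max(obs.sum() - M.shape[1], 1))
    y_imp = y.copy()
    n_mis = int((~obs).sum())
    y_imp[~obs] = M[~obs] @ coef + sigma * rng.standard_normal(n_mis)
    return y_imp
```

Calling `pspp_impute` D times with different seeds gives D filled-in data sets; a fuller version would also draw the regression parameters (or use the Bayesian form) so that the imputations are proper.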

18 Double Robustness Property
The PSPP method yields a consistent estimator for the marginal mean of Y if:
(a) the mean of Y given X is correctly specified, or
(b1) the propensity is correctly specified, and
(b2) $E(Y \mid Y^*, \theta) = s(Y^*, \theta)$
Key idea: the parametric regression g() on the other covariates does not have to be correctly specified

19 Bayesian PSPP
Alternative to PSPP MI is PSPP Bayes: add prior distribution for parameters, and simulate posterior distribution
Draws of missing values from MCMC can be used to make MI proper
Both methods (PSPP MI or PSPP Bayes) compare well with alternatives -- inverse probability weighting, simple or augmented -- in simulation studies (Zhang & Little 2011, Chen et al. 2017)

20 Example 2: Basic causal inference
Data layout (table): rows are units i, grouped by treatment; columns are X, Z, Y
X: baseline covariates/confounders
Z: treatment indicator (0, 1)
Y: outcome
Treatment effect = difference in means
Assume ignorable treatment assignment mechanism (no unmeasured confounders): $Z \perp Y \mid X$
Regression approach: regress Y on Z, X; causal effect is coefficient of Z
Where's the prediction here?

21 Example 2: Basic causal inference: Rubin/Neyman causal model (Rubin, 1974)
Data layout (table): rows are units i, grouped by treatment; columns are X, Z, Y(0), Y(1)
X: baseline covariates/confounders
Z: treatment indicator (0, 1)
Y(z): outcome if given Z = z, for z = 0, 1
Prediction problem: predict outcomes (?) for the treatment not assigned
With no unmeasured confounders: regress Y on Z, X
Multiply impute missing Y(0), Y(1) with predictions given X
Apply MI combining rules for inference about the treatment effect

22 Robust MI for Example 2
Penalized Spline of Propensity for Treatment Comparisons (PENCOMP):
(a) Estimate the propensity to be assigned treatments given covariates
(b) Apply the PSPP model to multiply impute the potential outcomes for the treatments not assigned to subjects
(c) Apply MI combining rules for inference about the average treatment effect
This approach has a double robustness property analogous to that of PSPP for missing data

23 PENCOMP MI for Example 2
More specifically:
(a) For d = 1, ..., D, generate a bootstrap sample from the original data S by sampling units with replacement, stratified on treatment group. Then carry out steps (b)-(d) for each sample d:
(b) Estimate a logistic regression for Z given X, with regression parameters $\gamma$. Estimated propensity of assignment to Z = z is
  $\hat{P}_z(X) = \mathrm{logit}\,\Pr(Z = z \mid X, \hat{\gamma}^{(d)})$, where $\hat{\gamma}^{(d)}$ is the ML estimate of $\gamma$

24 PENCOMP MI for Example 2
(c) For z = 0, 1, using the cases assigned to treatment group z, estimate a normal linear regression of Y on X, with mean
  $E(Y \mid X, Z = z, \theta_z, \beta_z) = s(\hat{P}_z(X) \mid \theta_z) + g(\hat{P}_z(X), X_2, \ldots, X_p; \beta_z)$
  where $s(\hat{P}_z(X) \mid \theta_z)$ = penalized spline, g() = parametric function of predictors
(d) For z = 0, 1, impute the values of Y(z) for subjects in treatment group 1-z in the original data set with draws from the predictive distribution of Y given X from the regression in (c), with ML estimates substituted for the parameters.
(e) Use MI combining rules for inference about average treatment effect
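
Steps (a)-(e) can be sketched as follows in Python with NumPy. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names, the truncated linear spline with a fixed penalty `lam`, the linear g() in X_2, ..., X_p, and the within-imputation variance (sample variance of the completed differences Y(1) - Y(0), divided by n) are all choices made here for the example.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, z, n_iter=50, ridge=1e-6):
    """Newton-Raphson ML estimate of gamma in logit Pr(Z = 1 | X)."""
    A = add_intercept(X)
    gamma = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ gamma))
        H = A.T @ (A * (p * (1.0 - p))[:, None]) + ridge * np.eye(A.shape[1])
        gamma = gamma + np.linalg.solve(H, A.T @ (z - p))
    return gamma

def design(pstar, X, knots):
    """Columns: 1, P*, (P* - knot)_+ per knot, then X_2..X_p."""
    cols = [np.ones_like(pstar), pstar]
    cols += [np.clip(pstar - k, 0.0, None) for k in knots]
    return np.column_stack(cols + [X[:, 1:]])

def pencomp(X, z, y, D=20, n_knots=5, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    est, wvar = [], []
    for _ in range(D):
        # (a) bootstrap sample, stratified on treatment group
        idx = np.concatenate([
            rng.choice(np.flatnonzero(z == k), size=int((z == k).sum()))
            for k in (0, 1)])
        Xb, zb, yb = X[idx], z[idx], y[idx]
        po = np.empty((n, 2))                  # completed Y(0), Y(1)
        for k in (0, 1):
            # (b) logit propensity of assignment to Z = k
            gamma = fit_logistic(Xb, (zb == k).astype(float))
            pb, pn = add_intercept(Xb) @ gamma, add_intercept(X) @ gamma
            # (c) penalized spline regression within bootstrap arm k
            arm = zb == k
            knots = np.quantile(pb[arm], np.linspace(0, 1, n_knots + 2)[1:-1])
            M = design(pb[arm], Xb[arm], knots)
            pen = np.full(M.shape[1], 1e-8)
            pen[2:2 + n_knots] = lam           # shrink knot coefficients only
            coef = np.linalg.solve(M.T @ M + np.diag(pen), M.T @ yb[arm])
            resid = yb[arm] - M @ coef
            sigma = np.sqrt(resid @ resid / max(arm.sum() - M.shape[1], 1))
            # (d) draw Y(k) for original-data subjects in the other arm
            draws = design(pn, X, knots) @ coef + sigma * rng.standard_normal(n)
            po[:, k] = np.where(z == k, y, draws)
        diff = po[:, 1] - po[:, 0]
        est.append(diff.mean())
        wvar.append(diff.var(ddof=1) / n)
    # (e) MI combining rules
    q, W, B = np.mean(est), np.mean(wvar), np.var(est, ddof=1)
    return q, W + (1.0 + 1.0 / D) * B
```

The function returns the pooled average treatment effect and its total variance. Observed outcomes are kept as they are; only the potential outcome under the treatment not assigned is imputed.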

25 Example 3: Longitudinal causal inference: confounding by indication
Data layout (table): rows are units i, grouped by treatment sequence; columns are $X_1$, $Z_1$, $X_2$, $Z_2$, Y
$X_1$: baseline covariates/confounders
$Z_1$: time 1 treatment indicator (0 or 1)
$X_2$: intermediate outcome
$Z_2$: time 2 treatment indicator (0 or 1)
Y: outcome
Assume ignorable assignment mechanisms
3 treatment effects: $\Delta_{jk} = \text{mean } Y(Z_1 = j, Z_2 = k) - \text{mean } Y(Z_1 = 0, Z_2 = 0)$, jk = 01, 10, 11
Regression doesn't work now: $X_2$ is an outcome for $Z_1$, confounder for $Z_2$
So can't just condition on $X_2$

26 Example 3. Confounding by indication: Rubin causal model solution (Frangakis and Rubin 2002)
Data layout (table): rows are units i; columns are $X_1$, $Z_1$, $X_2(0)$, $X_2(1)$, $Z_2$, Y(00), Y(01), Y(10), Y(11)
Multiply impute the missing data (?'s)

27 PENCOMP MI for Example 3
(a) Take a bootstrap sample of the original data. For each bootstrap sample:
(b) Missing values of the intermediate treatment outcomes $X_2(0)$ and $X_2(1)$ are imputed using the method described for Example 2
(c) Conditional on the values of $X_1$, $Z_1$ and the observed or imputed values of $X_2$, the propensity that $Z_2 = 1$ given $X_1$, $Z_1$ and $X_2$ is estimated based on a logistic regression of $Z_2$ on $X_1$, $Z_1$ and $X_2$
(d) Missing values of Y(jk) are then imputed as draws based on the regression of Y(jk) on $X_1$, $Z_1$ and $X_2$ for a model that includes a spline on the propensity from (c); a distinct regression model is fitted for each outcome Y(jk)
(e) Apply MI combining rules for inference about average treatment effects

28 Alternative methods: IPTW and AIPTW
Inverse Probability of Treatment Weighting (IPTW)
  Weight subjects by the inverse of the estimate of Pr(Z | X) -- in effect creates a pseudo-population that is free of treatment confounders.
  Consistent if the treatment assignment mechanism is correctly specified.
  But weights can be highly variable, leading to poor efficiency
Augmented Inverse Probability of Treatment Weighting (AIPTW)
  Doubly robust: consistent if the treatment assignment mechanism is correctly specified, or the prediction model is correctly specified.
Tingting Zhou (Univ. of Michigan), March 24
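
For comparison, the two weighting estimators can be written in a few lines of Python. The slide does not give formulas, so the forms below are the standard textbook versions and are an assumption about what was used: `ps` is the estimated Pr(Z = 1 | X), and `m1`, `m0` are predictions of the outcome under treatment and control from some prediction model. The function names and toy numbers are hypothetical.

```python
import numpy as np

def iptw_ate(y, z, ps):
    """IPTW estimate of the difference in means (unnormalized weights)."""
    y, z, ps = (np.asarray(a, dtype=float) for a in (y, z, ps))
    return np.mean(z * y / ps) - np.mean((1 - z) * y / (1 - ps))

def aiptw_ate(y, z, ps, m1, m0):
    """AIPTW: model predictions plus inverse-probability-weighted residuals."""
    y, z, ps, m1, m0 = (np.asarray(a, dtype=float) for a in (y, z, ps, m1, m0))
    treated = m1 + z * (y - m1) / ps
    control = m0 + (1 - z) * (y - m0) / (1 - ps)
    return np.mean(treated - control)

# Toy numbers (hypothetical): four subjects, propensity 0.5 for everyone
y = [1.0, 2.0, 3.0, 4.0]
z = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.5, 0.5]
ate_iptw = iptw_ate(y, z, ps)
ate_aiptw = aiptw_ate(y, z, ps, m1=[1, 1, 3, 3], m0=[2, 2, 4, 4])
```

Both estimators divide by `ps` or `1 - ps`, which is where highly variable weights enter; PSPP and PENCOMP instead use the propensity as a predictor in the imputation model.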

29 Application
We applied our method to the Multicenter AIDS Cohort Study (MACS) to analyze the effect of antiretroviral treatment on CD4 counts for HIV+ subjects (Kaslow et al., 1987).
CD4 count is an intermediate outcome of past treatments and confounds the next treatment.
We restrict our analyses to the period between visits 6 and 21, when zidovudine was approved and available for use and before the advent of highly active antiretroviral therapy (HAART).
We estimate the short-term (1 year) effects of using antiretroviral treatment for HIV+ subjects during this period, for each of the three-visit moving windows 1, ..., 14.

30 Application

31 Summary of Method Comparisons in Simulations
When the confounding is low or moderate, the weights are more stable; PENCOMP and AIPTW perform similarly, and are both superior to IPTW.
PENCOMP has slightly larger (but still negligible) bias than AIPTW when the prediction model is misspecified and weights are variable. But PENCOMP tends to outperform AIPTW in RMSE, coverage probability and efficiency.

32 Conclusion
PSPP, PENCOMP: regression models that include spline of propensity as predictor
  Conceptually simple
  Propensity treated as a covariate rather than a weight
  Avoids having to address problems with highly variable weights
  Tends to produce more stable estimates
  Tends to produce good confidence coverage in small samples (Bayes can be particularly useful here)
Many Happy Returns Jeremy!

33 References
Chen, Q. et al. (2017). Approaches to Improving Survey-Weighted Estimates. To appear in Statistical Science.
Elliott, M. R. and Little, R. J. A. (2015). Discussion of "On Bayesian Estimation of Marginal Structural Models." Biometrics 71(2).
Frangakis, C. E. and Rubin, D. B. (2002). Principal Stratification in Causal Inference. Biometrics 58.
Harville, D. (2014). The Need for More Emphasis on Prediction: A "Nondenominational" Model-Based Approach. Am. Statist. 68(2), 71-83.
Kaslow, R. A., Ostrow, D. G., Detels, R., Phair, J. P., Polk, B. F., and Rinaldo, C. R. Jr. (1987). The Multicenter AIDS Cohort Study: Rationale, Organization, and Selected Characteristics of the Participants. American Journal of Epidemiology 126.
Little, R. J. A. and An, H. (2004). Robust Likelihood-Based Analysis of Multivariate Data with Missing Values. Statistica Sinica 14.
Little, R. J. A. and Yau, L. (1996). Intent-to-Treat Analysis in Longitudinal Studies with Drop-Outs. Biometrics 52.

34 References
Ngo, L. and Wand, M. P. (2004). Smoothing with Mixed Model Software. Journal of Statistical Software 9, 1-54.
Rosenbaum, P. R. and Rubin, D. B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70.
Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 66(5).
Rubin, D. B. (1976). Inference and Missing Data. Biometrika 63.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Zhang, G. and Little, R. J. (2009). Extensions of the Penalized Spline of Propensity Prediction Method of Imputation. Biometrics 65(3).
Zhang, G. and Little, R. J. (2011). A Comparative Study of Doubly-Robust Estimators of the Mean with Missing Data. Journal of Statistical Computation and Simulation 81(12).
Zhou, T., Elliott, M. R. and Little, R. J. (2017). Penalized Spline of Propensity Methods for Treatment Comparisons. Under revision for publication.
