Bootstrapping Sensitivity Analysis

Size: px

Start display at page:

Download "Bootstrapping Sensitivity Analysis"

Jemima Hoover
5 years ago
Views:

1 Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya. Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. arxiv: R package bootsens:

2 Why sensitivity analysis? Unless we have perfectly executed randomized experiment, causal inference is based on some unverifiable assumptions. In observational studies, the most commonly used assumption is ignorability or no unmeasured confounding: A Y (0), Y (1) X. We can only say this assumption is plausible. Sensitivity analysis asks: what if this assumption does not hold? Does our qualitative conclusion still hold? This question appears in multiple settings: 1. Confounded observational studies. 2. Survey sampling with missing not at random (MNAR). 3. Longitudinal study with non-ignorable dropout. In general, this means that the target parameter (e.g. average treatment effect) is only partially identified. 1/20

3 Overview: Bootstrapping sensitivity analysis Point-identified parameter: Efron s bootstrap Bootstrap Point estimator ============ Confidence interval Partially identified parameter: An analogy Optimization Percentile Bootstrap Minimax inequality Extrema estimator ============ Confidence interval Rest of the talk Apply this idea to IPW estimators in a marginal sensitivity model. 2/20

4 Some existing sensitivity models Generally, we need to specify how unconfoundedness is violated. 1. Y models: Consider a specific difference between the conditional distribution Y (a) X, A and Y (a) X. Usually called pattern mixture models. Robins (1999, 2002); Birmingham et al. (2003); Vansteelandt et al. (2006); Daniels and Hogan (2008). 2. A models: Consider a specific difference between the conditional distribution A X, Y (a) and A X. Usually called selection models. Scharfstein et al. (1999); Birmingham et al. (2003); Vansteelandt et al. (2006); Scharfstein et al. (2014). 3. Simultaneous models: Consider a range of A models and/or Y models and report the worst case result. Cornfield et al. (1959); Rosenbaum (2002); Ding and VanderWeele (2016); Fogarty (2017). Our sensitivity model: A hybrid of 2nd and 3rd. 3/20

5 Rosenbaum s sensitivity model Imagine there is an unobserved confounder U that summarizes all confounding, so A Y (0), Y (1) X, U. Let e 0 (x, u) = P 0 (A = 1 X = x, U = u). Rosenbaum s sensitivity model { 1 } R(Γ) = e(x, u) : Γ OR(e(x, u 1), e(x, u 2 )) Γ, x X, u 1, u 2, where OR(p 1, p 2 ) := [p 1 /(1 p 1 )]/[p 2 /(1 p 2 )] is the odds ratio. Rosenbaum s question: can we reject the sharp null hypothesis Y (0) Y (1) for every e 0 (x, u) R(Γ)? Robins (2002): we don t need to assume the existence of U. Let U = Y (1) when the goal is to estimate E[Y (1)]. 4/20

6 Our sensitivity model Let e 0 (x) = P 0 (A = 1 X = x) be the propensity score. Marginal sensitivity models { M(Γ) = e(x, y) : Compare this to Rosenbaum s model: R(Γ) = { e(x, u) : 1 Γ OR(e(x, y), e 0(x)) Γ, x X, y }. 1 } Γ OR(e(x, u 1), e(x, u 2 )) Γ, x X, u 1, u 2. Tan (2006) (first?) considered this model, but he did not consider statistical inference in finite sample. Relationship between the two models: M( Γ) R(Γ) M(Γ). 1 For observational studies, we assume both P 0 (A = 1 X, Y (1)), P 0 (A = 1 X, Y (0)) M(Γ). 1 The second part needs compatibility : e(x, y) marginalizes to e0(x). 5/20

7 Parametric extension In practice, the propensity score e 0 (X ) = P 0 (A = 1 X ) is often estimated by a parametric model. Definition (Parametric marginal sensitivity models) { 1 } M β0 (Γ) = e(x, y) : Γ OR(e(x, y), e β 0 (x)) Γ, x X, y, where e β0 (x) is the best parametric approximation of e 0(x). This model covers both 1. Model misspecification, that is, e β0 (x) e 0 (x); and 2. Missing not at random, that is, e 0 (x) e 0 (x, y). 6/20

8 Logistic representations 1. Rosenbaum s sensitivity model: logit(e(x, u)) = g(x) + γu, where 0 U 1 and γ = log Γ. 2. Marginal sensitivity model: logit(e (h) (x, y)) = logit(e 0 (x)) + h(x, y), where h = sup h(x, y) γ. Due to this reason we also call it marginal L -sensitivity model. 3. Parametric marginal sensitivity model: logit(e (h) (x, y)) = logit(e β0 (x)) + h(x, y), where h = sup h(x, y) γ. 7/20

9 Confidence interval I For simplicity, consider the missing data problem where Y = Y (1) is only observed if A = 1. Observe i.i.d. samples (A i, X i, A i Y i ), i = 1,..., n. The estimand is µ 0 = E 0 [Y ], however it is only partially identified under a simultaneous sensitivity model. Goal 1 (Coverage of true parameter) Construct a data-dependent interval [L, U] such that P 0 ( µ0 [L, U] ) 1 α whenever e 0 (X, Y ) = P 0 (A = 1 X, Y ) M(Γ). 8/20

10 Confidence interval II The inverse probability weighting (IPW) identity: Define [ E 0 [Y ] = E AY e 0 (X, Y ) ] [ MAR AY ] = E. e 0 (X ) [ ] µ (h) AY = E 0 e (h) (X, Y ) Partially identified region: {µ (h) : e (h) M(Γ)}. Goal 2 (Coverage of partially identified region) Construct a data-dependent interval [L, U] such that ) P 0 ({µ (h) : e (h) M(Γ)} [L, U] 1 α. Imbens and Manski (2004) have discussed the difference between these two Goals. 9/20

11 An intuitive idea: The Union Method Suppose for any h, we have a confidence interval [L (h), U (h) ] such that lim inf n P 0(µ (h) [L (h), U (h) ]) 1 α Let L = inf h L(h) and U = sup U (h), so [L, U] is the union interval. h Theorem 1. [L, U] satisfies Goal 1 asymptotically. 2. Furthermore if the intervals are congruent : α < α such that lim sup n ( P 0 µ (h) < L (h)) α (, lim sup P 0 µ (h) > U (h)) α α. n Then [L, U] satisfies Goal 2 asymptotically. 10/20

12 Practical challenge: How to take the union? Suppose ĝ(x) is an estimate of logit(e 0 (x)). For a specific difference h, we can estimate e (h) (x, y) by ê (h) (x, y) = e h(x,y) ĝ(x,y). This leads to an (stabilized) IPW estimate of µ (h) : ˆµ (h) = [ 1 n n i=1 A i ] 1 [ 1 ê (h) (X i, Y i ) n n i=1 Using the general Z-estimation theory, n ( ˆµ (h) µ (h)) d N(0, (σ (h) ) 2 ) ] A i Y i. ê (h) (X i, Y i ) Therefore we can use [L (h), U (h) ] = ˆµ (h) z α 2 ˆσ(h) n. However, computing the union interval requires solving a complicated optimization problem. 11/20

13 Bootstrapping sensitivity analysis Point-identified parameter: Efron s bootstrap Bootstrap Point estimator ============ Confidence interval Partially identified parameter: An analogy Optimization Percentile Bootstrap Minimax inequality Extrema estimator ============ Confidence interval A simple procedure for simultaneous sensitivity analysis 1. Generate B random resamples of the data. For each resample, compute the extrema of IPW estimates under M β0 (Γ). 2. Construct the confidence interval using L = Q α/2 of the B minima and U = Q 1 α/2 of the B maxima. Theorem [L, U] achieves Goal 2 for M β0 (Γ) asymptotically. 12/20

14 Proof of the Theorem Partially identified parameter: Three ideas Optimization 1. Percentile Bootstrap 2. Minimax inequality Extrema estimator ============ Confidence interval 1. The sampling variability of ˆµ (h) can be captured by bootstrap. The percentile bootstrap CI is given by [ ) )] Q α (ˆµ (h) 2 b, Q 1 α (ˆµ (h) 2 b. 2. Generalized minimax inequality: Q α 2 ( inf h ) ˆµ (h) b inf h Q α 2 Percentile Bootstrap CI ) (ˆµ (h) b sup h Union CI Q 1 α 2 ) (ˆµ (h) b Q 1 α 2 ( sup h ) ˆµ (h) b. 13/20

15 Computation Partially identified parameter: Three ideas 3. Optimization Percentile Bootstrap Minimax inequality Extrema estimator ============ Confidence interval 3. Computing extrema of ˆµ (h) is a linear fractional programming: Let z i = e h(x i,y i ), we just need to solve n i=1 max or min A ( iy i 1 + zi e ) ĝ(x i ) n i=1 A ( i 1 + zi e ), ĝ(x i ) subject to z i [Γ 1, Γ], i = 1,..., n. This can be converted to a linear programming. Moreover, the solution z must have the same/opposite order as Y, so the time complexity can be reduced to O(n) (optimal). The role of Bootstrap Comapred to the union method, the workflow is greatly simplified: 1. No need to derive σ (h) analytically (though we could). 2. No need to optimize σ (h) (which is very challenging). 14/20

16 Comparison with Rosenbaum s sensitivity analysis Rosenbaum s paradigm New bootstrap approach Population Sample Super-population Design Matching Weighting Sensitivity model M( Γ) R(Γ) M(Γ) Inference Bounding p-value CI for ATE/ATT Effect modification Constant effect Allow for heterogeneity Extension Carefully developed for observational studies Can be applied to missing data problems 15/20

17 Example Fish consumption and blood mercury 873 controls: 1 serving of fish per month. 234 treated: 12 servings of fish per month. Covariates: gender, age, income (very imblanced), race, education, ever smoked, # cigarettes. Implementation details Rosenbaum s method: 1-1 matching, CI constructed by Hodges-Lehmann (assuming causal effect is constant). Our method (percentile Bootstrap): stabilized IPW for ATT w/wo augmentation by outcome linear regression. 16/20

18 Results Recall that M( Γ) R(Γ) M(Γ). 4 3 Causal effect 2 1 Matching SIPW (ATT) SAIPW (ATT) Γ Figure: The solid error bars are the range of point estimates and the dashed error bars (together with the solid bars) are the confidence intervals. The circles/triangles/squares are the mid-points of the solid bars. 17/20

19 References I J. Birmingham, A. Rotnitzky, and G. M. Fitzmaurice. Pattern-mixture and selection models for analysing longitudinal data with monotone missing patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1): , J. Cornfield, W. Haenszel, E. C. Hammond, A. M. Lilienfeld, M. B. Shimkin, and E. L. Wynder. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute, 22(1): , M. J. Daniels and J. W. Hogan. Missing data in longitudinal studies: Strategies for Bayesian modeling and sensitivity analysis. CRC Press, P. Ding and T. J. VanderWeele. Sensitivity analysis without assumptions. Epidemiology, 27(3): 368, C. B. Fogarty. Studentized sensitivity analysis for the sample average treatment effect in paired observational studies. arxiv preprint arxiv: , G. W. Imbens and C. F. Manski. Confidence intervals for partially identified parameters. Econometrica, 72(6): , J. M. Robins. Association, causation, and marginal structural models. Synthese, 121(1): , J. M. Robins. Comment on covariance adjustment in randomized experiments and observational studies. Statistical Science, 17(3): , P. R. Rosenbaum. Observational Studies. Springer New York, D. Scharfstein, A. McDermott, W. Olson, and F. Wiegand. Global sensitivity analysis for repeated measures studies with informative dropout: A fully parametric approach. Statistics in Biopharmaceutical Research, 6(4): , D. O. Scharfstein, A. Rotnitzky, and J. M. Robins. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448): , /20

20 References II Z. Tan. A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101(476): , S. Vansteelandt, E. Goetghebeur, M. G. Kenward, and G. Molenberghs. Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica, 16(3): , /20

21 Possible open problems for discussion We didn t require e (h) (x, y) to be compatible (marginalizes to e 0 (x)). Can we use this property to reduce the length of CI? Not a problem for Rosenbaum s analysis because he conditions on one treated in every pair. Compatibility between A model and Y model? Can we consider different sensitivity models? Example: require h(x, y) to be Lipschitz continuous. Considered in the paper, still a LFP. Can we do average case sensitivity analysis, e.g. L1 instead of L? How can we extend this framework to other semiparametric estimation problems (e.g. Longitudinal data with non-ignorable dropout)? 20/20

arxiv: v2 [stat.me] 8 Oct 2018

arxiv: v2 [stat.me] 8 Oct 2018 SENSITIVITY ANALYSIS FOR INVERSE PROBABILITY WEIGHTING ESTIMATORS VIA THE PERCENTILE BOOTSTRAP QINGYUAN ZHAO, DYLAN S. SMALL AND BHASWAR B. BHATTACHARYA arxiv:1711.1186v [stat.me] 8 Oct 018 Department