A Distributional Approach Using Propensity Scores


1 A Distributional Approach Using Propensity Scores
Zhiqiang Tan
Department of Biostatistics, Johns Hopkins School of Public Health
ztan
June 20, 2005

2 Outline
  Introduction
  Counterfactual framework
  Illustration
  Application
  No-confounding case
  Known propensity score
  Parametric propensity score
  Confounding case

3 Introduction
Right heart catheterization (RHC) has been performed daily in hospitals since the 1970s.
The benefit of RHC had NOT been demonstrated in a successful randomized clinical trial.
Connors et al.'s (1996) observational study raised the concern that RHC might not benefit critically ill patients and might in fact cause harm.
Data were collected on 5735 critically ill patients admitted to the ICUs of five medical centers:
  Treatment: No-RHC or RHC
  Outcome: 30-day survival
  Covariates: 75 covariates
HOW to evaluate the effect of RHC on survival?

4 Counterfactual framework
  $X$: covariates measured
  $T$: treatment variable taking value 0 or 1 if a patient actually receives No-RHC or RHC
  $(Y_0, Y_1)$: potential outcomes that would be observed if a patient received No-RHC or RHC
  $Y = (1 - T)\,Y_0 + T\,Y_1$: observed outcome
We are interested in the average causal effect
$$E(Y_1 - Y_0) = E(Y_1) - E(Y_0),$$
or $P(\{Y_1\})$ versus $P(\{Y_0\})$.
Assignment mechanism:
  No-confounding: $T \perp (Y_0, Y_1) \mid X$
  Confounding: $T \not\perp (Y_0, Y_1) \mid X$
Propensity score: $\pi(x) = P(T = 1 \mid X = x)$

5 [Figure: Thirty-day survival curves (RHC, raw; No RHC, raw); x-axis: Day; y-axis: Proportion surviving. Panels: raw histograms of aps, meanbp, and pafi for RHC and No RHC.]

6 Illustration
Each cell shows (survived, died) followed by the cell total.

            RHC = 1          RHC = 0
  BP = 1    (52, 28)   80    (11, 9)    20
  BP = 0    (30, 10)   40    (37, 23)   60
  Total     (82, 38)  120    (48, 32)   80

Patients get RHC at random:
$$P(\text{survival} \mid RHC = 1) \approx 82/120 = 68.3\%, \qquad P(\text{survival} \mid RHC = 0) \approx 48/80 = 60.0\%.$$
Patients get RHC at random given blood pressure:
Weight each patient such that
$$80\,w_1(1) = 1/2, \quad 40\,w_1(0) = 1/2, \quad 20\,w_0(1) = 1/2, \quad 60\,w_0(0) = 1/2.$$
Compare the weighted probabilities
$$52\,w_1(1) + 30\,w_1(0) = 70.0\%, \qquad 11\,w_0(1) + 37\,w_0(0) = 58.3\%.$$
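
The weighting arithmetic in this illustration can be checked with a short Python sketch. The dictionary layout and the function names (`cells`, `raw_survival`, `weighted_survival`) are illustrative assumptions and not part of the slides; only the counts come from the table.

```python
# Counts from the illustration table: (survived, died) per (BP, RHC) cell.
cells = {
    (1, 1): (52, 28),  # BP = 1, RHC = 1
    (1, 0): (11, 9),   # BP = 1, RHC = 0
    (0, 1): (30, 10),  # BP = 0, RHC = 1
    (0, 0): (37, 23),  # BP = 0, RHC = 0
}

def raw_survival(rhc):
    """Unweighted survival proportion in one treatment arm."""
    survived = sum(s for (bp, t), (s, d) in cells.items() if t == rhc)
    total = sum(s + d for (bp, t), (s, d) in cells.items() if t == rhc)
    return survived / total

def weighted_survival(rhc):
    """Survival proportion after giving each BP level total weight 1/2."""
    prob = 0.0
    for (bp, t), (s, d) in cells.items():
        if t == rhc:
            w = 0.5 / (s + d)  # e.g. 80 * w_1(1) = 1/2
            prob += s * w
    return prob

print(round(raw_survival(1), 3), round(raw_survival(0), 3))
print(round(weighted_survival(1), 3), round(weighted_survival(0), 3))
```

The weight 1/2 per blood-pressure level is taken directly from the slide; it coincides with the share of patients at each level in this table (100 of 200).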

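7 WHAT IF patients are NOT equally likely to get RHC at each level of blood pressure?
Previous estimates:
$$P(\text{obs survival} \mid BP = \cdot, RHC = 1) = 70.0\%, \qquad P(\text{obs survival} \mid BP = \cdot, RHC = 0) = 58.3\%.$$
Weight each patient such that
$$\sum_{i:\,RHC_i = 1,\,BP_i = 1} \lambda_{1i}\, w_1(1) = \tfrac{1}{2}, \qquad \sum_{i:\,RHC_i = 1,\,BP_i = 0} \lambda_{1i}\, w_1(0) = \tfrac{1}{2},$$
$$\sum_{i:\,RHC_i = 0,\,BP_i = 1} \lambda_{0i}\, w_0(1) = \tfrac{1}{2}, \qquad \sum_{i:\,RHC_i = 0,\,BP_i = 0} \lambda_{0i}\, w_0(0) = \tfrac{1}{2},$$
where $\Lambda^{-1} \le \lambda_{1i}, \lambda_{0i} \le \Lambda$ ($\Lambda = 1.5$).
Bound the weighted probabilities
$$\sum_{i:\,RHC_i = 1} \lambda_{1i}\, w_1(X_i)\, Y_{1i}, \qquad \sum_{i:\,RHC_i = 0} \lambda_{0i}\, w_0(X_i)\, Y_{0i},$$
subject to the foregoing constraints. Here "!obs" denotes the unobserved (counterfactual) survival:
$$P(\text{!obs survival} \mid BP = \cdot, RHC = 1) \le 72.2\%, \qquad P(\text{!obs survival} \mid BP = \cdot, RHC = 0) \ge 55.0\%.$$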

8 [Figure: Thirty-day survival curves (RHC, raw; No RHC, raw); x-axis: Day; y-axis: Proportion surviving. Panels: raw histograms of aps, meanbp, and pafi for RHC and No RHC.]

9 [Figure: Thirty-day survival curves for RHC and No RHC, raw and weighted; x-axis: Day; y-axis: Proportion surviving. Panels: raw and weighted histograms of aps, meanbp, and pafi for RHC and No RHC.]

10 [Figure: Thirty-day survival curves (RHC, observed; RHC, counterfactual; No RHC, counterfactual; No RHC, observed); x-axis: Day; y-axis: Proportion surviving. Panels: weighted histograms of aps, meanbp, and pafi, observed versus counterfactual, for No RHC and for RHC.]

11 No-confounding case
Data: $(X_i, Y_{T_i}, T_i)$, $i = 1, 2, \ldots, n$
Likelihood:
$$L = L_1 \times L_2 = \prod_{i=1}^{n} \Big[ (1 - \pi(X_i))^{1 - T_i}\, \pi(X_i)^{T_i} \Big] \times \prod_{i=1}^{n} \Big[ G_0(\{X_i, Y_{0i}\})^{1 - T_i}\, G_1(\{X_i, Y_{1i}\})^{T_i} \Big],$$
where $G_0$ is the joint distribution of $(X, Y_0)$ and $G_1$ is the joint distribution of $(X, Y_1)$.
$G_0$ and $G_1$ induce the same marginal distributions on the covariate space $\mathcal{X}$. Equivalently,
$$\int h(x)\, dG_0(x, y_0) = \int h(x)\, dG_1(x, y_1)$$
for each bounded function $h$ on $\mathcal{X}$.
Take finitely many constraints and find the MLE $(\hat G_0, \hat G_1)$:
$$\hat\mu_1 = \int y_1\, d\hat G_1(x, y_1), \qquad \hat\mu_0 = \int y_0\, d\hat G_0(x, y_0).$$

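12 Known propensity score
[Model S0: known $\pi^*$]
Maximize the likelihood subject to the constraints
$$\int \pi^*(x)\, dG_0 = \int \pi^*(x)\, dG_1, \qquad \int h_j^*(x)\, dG_0 = \int h_j^*(x)\, dG_1, \quad j = 1, \ldots, m.$$
Let $h^* = (\pi^*, 1 - \pi^*, h_1^*, \ldots, h_m^*)^\top$. Maximize
$$\frac{1}{n} \sum_{i=1}^{n_1} \log\big(\lambda^\top h^*(X_i)\big) + \frac{1}{n} \sum_{i=n_1+1}^{n} \log\big(1 - \lambda^\top h^*(X_i)\big).$$
Then, at the maximizing $\lambda$,
$$\hat G_1\{(X_i, Y_{1i})\} = \frac{n^{-1}}{\lambda^\top h^*(X_i)}, \quad i = 1, \ldots, n_1,$$
$$\hat G_0\{(X_i, Y_{0i})\} = \frac{n^{-1}}{1 - \lambda^\top h^*(X_i)}, \quad i = n_1 + 1, \ldots, n.$$
First-order approximation:
$$\tilde\mu_1 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\pi^*(X_i)} - \beta_1^\top \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{h^*(X_i)}{1 - \pi^*(X_i)} \left( \frac{T_i}{\pi^*(X_i)} - 1 \right) \right],$$
$$\tilde\mu_0 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{0i} (1 - T_i)}{1 - \pi^*(X_i)} - \beta_0^\top \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{h^*(X_i)}{\pi^*(X_i)} \left( \frac{1 - T_i}{1 - \pi^*(X_i)} - 1 \right) \right],$$
where $\beta_1 = B^{-1} C_1$ and $\beta_0 = B^{-1} C_0$.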

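13 The method of control variates:
$$\frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\pi^*(X_i)} - b_1^\top \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{h^*(X_i)}{1 - \pi^*(X_i)} \left( \frac{T_i}{\pi^*(X_i)} - 1 \right) \right].$$
The optimal choice of $b_1$ is $\beta_1 = B^{-1} C_1$.
A more general class of estimators:
$$\frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\pi^*(X_i)} - \frac{1}{n} \sum_{i=1}^{n} \varphi_1(X_i) \left( \frac{T_i}{\pi^*(X_i)} - 1 \right).$$
The optimal choice of $\varphi_1(x)$ is $E(Y_1 \mid X = x)$. With this choice, the estimator achieves semiparametric efficiency under S0.
Choose $h^*$ such that
  $E(Y_1 \mid X = x)$ is contained in the linear span of $h^*(x) / (1 - \pi^*(x))$,
  $E(Y_0 \mid X = x)$ is contained in the linear span of $h^*(x) / \pi^*(x)$.
Outcome regression [Model R]
$$E(Y_1 \mid X) = \Psi\big(\alpha_1^\top g_1(X)\big), \qquad E(Y_0 \mid X) = \Psi\big(\alpha_0^\top g_0(X)\big).$$
Choose $h^* = \big(\pi^*,\ 1 - \pi^*,\ \pi^*\, \Psi(\hat\alpha_0^\top g_0),\ (1 - \pi^*)\, \Psi(\hat\alpha_1^\top g_1)\big)$.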
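
To make the general class of estimators on this slide concrete, the following sketch evaluates the inverse-probability-weighted average of $Y_1$ and subtracts the control-variate term with $\varphi_1(x)$ set to the treated-cell survival proportions. The patient records are rebuilt from the counts in the Illustration slide. The "known" propensity score `pi_star` is a hypothetical value chosen only to exercise the arithmetic; it does not come from the slides, and the variable names are illustrative.

```python
# One record per patient, (bp, t, y), rebuilt from the illustration counts.
counts = {(1, 1): (52, 28), (1, 0): (11, 9), (0, 1): (30, 10), (0, 0): (37, 23)}
data = []
for (bp, t), (s, d) in counts.items():
    data += [(bp, t, 1)] * s + [(bp, t, 0)] * d

# ASSUMPTION: hypothetical "known" propensity score pi*(BP), not from the slides.
pi_star = {1: 0.75, 0: 0.45}

# phi_1(x): estimate of E(Y_1 | X = x) from the treated patients in each BP cell.
phi1 = {bp: counts[(bp, 1)][0] / sum(counts[(bp, 1)]) for bp in (0, 1)}

n = len(data)
# Plain inverse-probability-weighted estimate of E(Y_1).
ipw = sum(y * t / pi_star[bp] for bp, t, y in data) / n
# Control variate: has mean zero when pi* is the true propensity score.
cv = sum(phi1[bp] * (t / pi_star[bp] - 1) for bp, t, y in data) / n
mu1 = ipw - cv

print(round(ipw, 3), round(cv, 3), round(mu1, 3))
```

Because `phi1` is saturated in BP here, the adjusted estimate equals the average of `phi1` over all patients (0.70), whatever value is used for `pi_star`. That is one way to see how the control variate corrects the plain weighted average.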

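14 Parametric propensity score
[Model S: $\pi(\cdot\,; \gamma)$]
Maximize the likelihood subject to the constraints
$$\int \hat\pi(x)\, dG_1 = \int \hat\pi(x)\, dG_0, \qquad \int \hat h_j(x)\, dG_1 = \int \hat h_j(x)\, dG_0, \quad j = 1, \ldots, m.$$
Let $\hat h = (\hat\pi, 1 - \hat\pi, \hat h_1, \ldots, \hat h_m)^\top$. Maximize
$$\frac{1}{n} \sum_{i=1}^{n_1} \log\big(\lambda^\top \hat h(X_i)\big) + \frac{1}{n} \sum_{i=n_1+1}^{n} \log\big(1 - \lambda^\top \hat h(X_i)\big).$$
Then, at the maximizing $\lambda$,
$$\hat G_1\{(X_i, Y_{1i})\} = \frac{n^{-1}}{\lambda^\top \hat h(X_i)}, \quad i = 1, \ldots, n_1,$$
$$\hat G_0\{(X_i, Y_{0i})\} = \frac{n^{-1}}{1 - \lambda^\top \hat h(X_i)}, \quad i = n_1 + 1, \ldots, n.$$
First-order approximation:
$$\tilde\mu_1 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\hat\pi(X_i)} - \beta_1^\top \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\hat h(X_i)}{1 - \hat\pi(X_i)} \left( \frac{T_i}{\hat\pi(X_i)} - 1 \right) \right],$$
$$\tilde\mu_0 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{0i} (1 - T_i)}{1 - \hat\pi(X_i)} - \beta_0^\top \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\hat h(X_i)}{\hat\pi(X_i)} \left( \frac{1 - T_i}{1 - \hat\pi(X_i)} - 1 \right) \right],$$
where $\beta_1 = B^{-1} C_1$ and $\beta_0 = B^{-1} C_0$.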

15 Our strategy is
  To build and check propensity score models to ensure consistency
  To use outcome regression models for variance and bias reduction
Propensity score models can be checked with the following idea:
  Pick a collection of test functions $\hat h_j$ on $\mathcal{X}$, for example, $(\hat\pi,\ 1 - \hat\pi,\ \hat\pi X,\ (1 - \hat\pi) X)$.
  Compute the sample average
  $$\tilde E\left[ \hat h_j(X) \left( \frac{T}{\hat\pi(X)} - \frac{1 - T}{1 - \hat\pi(X)} \right) \right],$$
  i.e. the average difference in $\hat h_j(X)$ between the treated and control after propensity score weighting.
  If model S is correct, then the sample averages relative to their standard errors, or z-ratios, should not differ significantly from zero.
Examination of z-ratios against the standard normal can reveal possible misspecification of model S.
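
The balance check described on this slide can be sketched as follows, using the cell sizes from the Illustration slide and the blood-pressure indicator as the single test function. This is a simplified illustration: the standard error is the naive sample standard deviation over $\sqrt{n}$, which ignores the variability from estimating $\hat\pi$, and the names (`z_ratio`, `fitted`, `constant`) are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev

# (BP, RHC) cell sizes from the illustration; one (bp, t) record per patient.
counts = {(1, 1): 80, (1, 0): 20, (0, 1): 40, (0, 0): 60}
data = [(bp, t) for (bp, t), k in counts.items() for _ in range(k)]

def z_ratio(h, pi_hat):
    """z-ratio of the weighted treated-minus-control average of h(X).

    ASSUMPTION: naive standard error sd / sqrt(n); it does not account
    for the estimation of pi_hat.
    """
    terms = [
        h(bp) * (t / pi_hat(bp) - (1 - t) / (1 - pi_hat(bp)))
        for bp, t in data
    ]
    return mean(terms) / (stdev(terms) / sqrt(len(terms)))

h = lambda bp: float(bp)                     # test function: BP indicator
fitted = lambda bp: 0.8 if bp == 1 else 0.4  # RHC proportion within each BP level
constant = lambda bp: 0.6                    # ignores BP (misspecified)

print(z_ratio(h, fitted), z_ratio(h, constant))
```

The propensity score that matches the cell proportions gives a z-ratio of essentially zero, while the constant score that ignores blood pressure gives a z-ratio far outside the usual standard normal range, flagging the misspecification.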

16 [Figure: z-ratios by model, two panels; axes: z-ratio, Model.]

17 Confounding case
Data: $(X_i, Y_{T_i}, T_i)$, $i = 1, 2, \ldots, n$
Likelihood:
$$L_1 \times L_2 = \prod_{i=1}^{n} \Big[ (1 - \pi(X_i))^{1 - T_i}\, \pi(X_i)^{T_i} \Big] \times \prod_{i=1}^{n} \Big[ H_0(\{X_i, Y_{0i}\})^{1 - T_i}\, H_1(\{X_i, Y_{1i}\})^{T_i} \Big],$$
where $H_0$ is the distribution $P(\{Y_0\} \mid T = 0, X)\, P(\{X\})$ and $H_1$ is the distribution $P(\{Y_1\} \mid T = 1, X)\, P(\{X\})$.
$H_0$ and $H_1$ induce the same marginal distributions on the covariate space $\mathcal{X}$. Equivalently,
$$\int h(x)\, dH_0(x, y_0) = \int h(x)\, dH_1(x, y_1)$$
for each bounded function $h$ on $\mathcal{X}$.
Convergence of previous estimates:
$$(\hat G_0, \hat G_1) \to (H_0, H_1),$$
$$\hat\mu_1,\ \tilde\mu_1 \to E[E(Y_1 \mid T = 1, X)],$$
$$\hat\mu_0,\ \tilde\mu_0 \to E[E(Y_0 \mid T = 0, X)].$$

18 Unmeasured confounding: gaps between
  $P(\{Y_0\} \mid T = 0, X)$ and $P(\{Y_0\} \mid T = 1, X)$,
  $P(\{Y_1\} \mid T = 0, X)$ and $P(\{Y_1\} \mid T = 1, X)$,
i.e. systematic differences between the treated and untreated even if they received the same treatment.
Define the Radon-Nikodym derivatives:
$$\lambda_0(Y_0; X) = \frac{P(dy_0 \mid T = 1, X)}{P(dy_0 \mid T = 0, X)}, \qquad \lambda_1(Y_1; X) = \frac{P(dy_1 \mid T = 0, X)}{P(dy_1 \mid T = 1, X)}.$$
The case $\lambda_0 = \lambda_1 \equiv 1$ corresponds to no confounding, while deviations of $\lambda_0$ and $\lambda_1$ from 1 indicate unmeasured confounding.
By Bayes' rule, $\lambda_0$ and $\lambda_1$ can be seen as odds ratios:
$$\lambda_0(Y_0; X) = \frac{1 - \pi(X)}{\pi(X)} \cdot \frac{P(T = 1 \mid Y_0, X)}{P(T = 0 \mid Y_0, X)}, \qquad \lambda_1(Y_1; X) = \frac{\pi(X)}{1 - \pi(X)} \cdot \frac{P(T = 0 \mid Y_1, X)}{P(T = 1 \mid Y_1, X)}.$$
A sensitivity analysis model:
$$\Lambda^{-1} \le \lambda_0(Y_0; X),\ \lambda_1(Y_1; X) \le \Lambda,$$
where $\Lambda \ge 1$ indicates the degree of departure from no confounding.

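19 Let $\hat h^c = (\hat\pi, 1 - \hat\pi, \hat h_1, \ldots, \hat h_{m^c})$.
For a value of $\Lambda$, find bounds for $\int y_t \lambda_t\, dH_t$ by linear programming:
$$\min \text{ or } \max \int y_t \lambda_t\, d\hat G_t$$
subject to
$$\int \lambda_t\, d\hat G_t = 1, \qquad \int \hat\pi(x) \lambda_t\, d\hat G_t = \int \hat\pi(x)\, d\hat G_t, \qquad \int \hat h_j(x) \lambda_t\, d\hat G_t = \int \hat h_j(x)\, d\hat G_t, \quad j = 1, \ldots, m^c,$$
and $\Lambda^{-1} \le \lambda_t \le \Lambda$.
$\hat G_1$ is supported on $\{(X_i, Y_{1i})\}_{i = 1, \ldots, n_1}$ and $\hat G_0$ on $\{(X_i, Y_{0i})\}_{i = n_1 + 1, \ldots, n}$, so each integral is a finite sum.
The unknowns are the values of $\lambda_t$ on the observed data:
$$\lambda_{1i} = \lambda_1(Y_{1i}; X_i), \quad i = 1, \ldots, n_1, \qquad \lambda_{0i} = \lambda_0(Y_{0i}; X_i), \quad i = n_1 + 1, \ldots, n.$$
Comparisons of the distributions
  $\hat G_0 \sim [Y_0 \mid T = 0, X][X]$, $\quad \hat G_1 \sim [Y_1 \mid T = 1, X][X]$,
  $\lambda_0\, d\hat G_0 \sim [Y_0 \mid T = 1, X][X]$, $\quad \lambda_1\, d\hat G_1 \sim [Y_1 \mid T = 0, X][X]$
indicate (i) balance on covariates, (ii) hidden bias, and (iii) causal effects.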
