A Distributional Approach Using Propensity Scores
Zhiqiang Tan
Department of Biostatistics
Johns Hopkins School of Public Health
http://www.biostat.jhsph.edu/~ztan
June 20, 2005

Outline
  Introduction
  Counterfactual framework
  Illustration
  Application
  No-confounding case
    Known propensity score
    Parametric propensity score
  Confounding case

Introduction
Right heart catheterization (RHC) has been performed daily in hospitals since the 1970s.
The benefit of RHC had NOT been demonstrated in a successful randomized clinical trial.
Connors et al.'s (1996) observational study raised the concern that RHC might not benefit critically ill patients and might in fact cause harm.
Data were collected on 5735 critically ill patients admitted to the ICUs of five medical centers:
  Treatment: No-RHC or RHC
  Outcome: 30-day survival
  Covariates: 75 covariates
HOW to evaluate the effect of RHC on survival?

Counterfactual framework
$X$: covariates measured
$T$: treatment variable taking value 0 or 1 if a patient actually receives No-RHC or RHC
$(Y_0, Y_1)$: potential outcomes that would be observed if a patient received No-RHC or RHC
$Y = (1 - T)\,Y_0 + T\,Y_1$: observed outcome
We are interested in the average causal effect $E(Y_1 - Y_0) = E(Y_1) - E(Y_0)$, or $P(\{Y_1\})$ versus $P(\{Y_0\})$.
Assignment mechanism
  No-confounding: $T \perp (Y_0, Y_1) \mid X$
  Confounding: $T \not\perp (Y_0, Y_1) \mid X$
Propensity score: $\pi(x) = P(T = 1 \mid X = x)$

[Figure: Thirty-day survival curves for RHC and No-RHC (raw); x-axis: Day (0 to 30), y-axis: Proportion surviving (0.5 to 1.0). Raw histograms of aps, meanbp, and pafi for RHC and No-RHC.]

Illustration
Cell entries are (survived, died) followed by the cell total.

              RHC = 1         RHC = 0
  BP = 1    (52, 28)  80    (11,  9)  20
  BP = 0    (30, 10)  40    (37, 23)  60
  Total     (82, 38) 120    (48, 32)  80

Patients get RHC at random:
  $P(\text{survival} \mid \text{RHC} = 1) \approx 82/120 = 68.3\%$
  $P(\text{survival} \mid \text{RHC} = 0) \approx 48/80 = 60.0\%$
Patients get RHC at random given blood pressure:

              RHC = 1   RHC = 0   Total
  BP = 1         80        20      100
  BP = 0         40        60      100

Weight each patient such that
  $80\,w_1(1) = 1/2$, $40\,w_1(0) = 1/2$, $20\,w_0(1) = 1/2$, $60\,w_0(0) = 1/2$.
Compare the weighted probabilities
  $52\,w_1(1) + 30\,w_1(0) = 70.0\%$, $11\,w_0(1) + 37\,w_0(0) = 58.3\%$.
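
The weighting step in the illustration can be checked with a few lines of arithmetic. The sketch below is only an illustration of that calculation: the dictionary layout and the function names `bp_shares`, `raw_survival`, and `weighted_survival` are assumptions introduced here, not part of the slides. Each patient in arm `t` with blood pressure level `bp` receives weight (share of that BP level) / (cell size), which reproduces the conditions 80 w_1(1) = 1/2, 40 w_1(0) = 1/2, and so on.

```python
from fractions import Fraction

# (survived, died) counts from the illustration, keyed by (BP level, RHC arm).
counts = {
    (1, 1): (52, 28),
    (0, 1): (30, 10),
    (1, 0): (11, 9),
    (0, 0): (37, 23),
}

def bp_shares(counts):
    """Share of all patients at each BP level (here 100/200 for both levels)."""
    totals = {}
    for (bp, _arm), (survived, died) in counts.items():
        totals[bp] = totals.get(bp, 0) + survived + died
    n = sum(totals.values())
    return {bp: Fraction(size, n) for bp, size in totals.items()}

def raw_survival(counts, arm):
    """Unweighted survival proportion within one treatment arm."""
    survived = sum(s for (_bp, a), (s, _d) in counts.items() if a == arm)
    total = sum(s + d for (_bp, a), (s, d) in counts.items() if a == arm)
    return Fraction(survived, total)

def weighted_survival(counts, arm):
    """Survival in one arm after reweighting each BP level to its overall share."""
    shares = bp_shares(counts)
    total = Fraction(0)
    for (bp, a), (survived, died) in counts.items():
        if a == arm:
            w = shares[bp] / (survived + died)  # per-patient weight w_arm(bp)
            total += survived * w
    return total

for arm in (1, 0):
    print(f"RHC={arm}: raw {float(raw_survival(counts, arm)):.1%}, "
          f"weighted {float(weighted_survival(counts, arm)):.1%}")
```

Exact fractions are used so that the weighted values 7/10 and 7/12 come out without rounding error.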
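
WHAT IF patients are NOT equally likely to get RHC at each level of blood pressure?
Previous estimates:
  $P(\text{obs survival} \mid \text{BP} = \cdot, \text{RHC} = 1) = 70.0\%$, $P(\text{obs survival} \mid \text{BP} = \cdot, \text{RHC} = 0) = 58.3\%$.
Weight each patient such that
  $\sum_{i=1}^{80} \lambda_{1i}\, w_1(1) = 1/2$, $\sum_{i=81}^{120} \lambda_{1i}\, w_1(0) = 1/2$,
  $\sum_{i=121}^{140} \lambda_{0i}\, w_0(1) = 1/2$, $\sum_{i=141}^{200} \lambda_{0i}\, w_0(0) = 1/2$,
where $\Lambda^{-1} \le \lambda_{1i}, \lambda_{0i} \le \Lambda$ ($\Lambda = 1.5$).
Bound the weighted probabilities
  $\sum_{i=1}^{120} \lambda_{1i}\, w_1(X_i)\, Y_{1i}$, $\sum_{i=121}^{200} \lambda_{0i}\, w_0(X_i)\, Y_{0i}$,
subject to the foregoing constraints:
  $P(!\text{obs survival} \mid \text{BP} = \cdot, \text{RHC} = 1) \le 72.2\%$, $P(!\text{obs survival} \mid \text{BP} = \cdot, \text{RHC} = 0) \ge 55.0\%$.
Here the first bound is the maximum of the $\lambda_0$-weighted sum over the No-RHC patients, and the second is the minimum of the $\lambda_1$-weighted sum over the RHC patients.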
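
The bounds 72.2% and 55.0% can be reproduced by hand. Because the constraints separate by blood pressure level, each level is a small linear program with one equality constraint and box constraints, and such a program is solved exactly by a greedy allocation: start every multiplier at 1/Lambda, then hand the remaining mass to the patients with the most favourable outcome first. The sketch below assumes this stratified structure; the function names are hypothetical, and a general-purpose LP solver would be needed once further balancing constraints are added.

```python
def extreme_stratum_mean(y, big_lambda, maximize):
    """Max or min of sum(l_i * y_i) / n subject to sum(l_i) = n and
    1/big_lambda <= l_i <= big_lambda, via greedy allocation."""
    n = len(y)
    lo, hi = 1.0 / big_lambda, big_lambda
    lam = [lo] * n                 # every multiplier starts at its lower bound
    budget = n - n * lo            # mass still to distribute so that sum(l_i) = n
    # Visit outcomes from most to least favourable for the chosen direction.
    order = sorted(range(n), key=lambda i: y[i], reverse=maximize)
    for i in order:
        if budget <= 0:
            break
        add = min(hi - lo, budget)
        lam[i] += add
        budget -= add
    return sum(l * yi for l, yi in zip(lam, y)) / n

def bound(strata, shares, big_lambda, maximize):
    """Combine per-stratum extremes using the target share of each stratum."""
    return sum(share * extreme_stratum_mean(y, big_lambda, maximize)
               for y, share in zip(strata, shares))

# Survival indicators from the illustration, one list per BP level.
rhc = [[1] * 52 + [0] * 28, [1] * 30 + [0] * 10]       # RHC arm: BP=1, BP=0
no_rhc = [[1] * 11 + [0] * 9, [1] * 37 + [0] * 23]     # No-RHC arm: BP=1, BP=0
shares = [0.5, 0.5]

upper_from_no_rhc = bound(no_rhc, shares, 1.5, maximize=True)
lower_from_rhc = bound(rhc, shares, 1.5, maximize=False)
print(f"max of lambda_0-weighted sum: {upper_from_no_rhc:.1%}")
print(f"min of lambda_1-weighted sum: {lower_from_rhc:.1%}")
```

Setting `big_lambda` to 1 forces every multiplier to 1 and returns the unadjusted weighted probabilities, 58.3% and 70.0%.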

[Figure: Thirty-day survival curves for RHC and No-RHC, raw and weighted; x-axis: Day (0 to 30), y-axis: Proportion surviving (0.5 to 1.0). Raw and weighted histograms of aps, meanbp, and pafi for RHC and No-RHC.]

[Figure: Thirty-day survival curves: RHC observed, RHC counterfactual, No-RHC counterfactual, No-RHC observed; x-axis: Day (0 to 30), y-axis: Proportion surviving (0.5 to 1.0). Weighted histograms of aps, meanbp, and pafi: No-RHC counterfactual versus No-RHC observed, and RHC observed versus RHC counterfactual.]

No-confounding case
Data: $(X_i, Y_{T_i}, T_i)$, $i = 1, 2, \ldots, n$
Likelihood:
  $L_1 \times L_2 = \prod_{i=1}^{n} \big[ (1 - \pi(X_i))^{1 - T_i}\, \pi(X_i)^{T_i} \big] \times \prod_{i=1}^{n} \big[ G_0(\{X_i, Y_{0i}\})^{1 - T_i}\, G_1(\{X_i, Y_{1i}\})^{T_i} \big]$
where $G_0$ is the joint distribution of $(X, Y_0)$ and $G_1$ is the joint distribution of $(X, Y_1)$.
$G_0$ and $G_1$ induce the same marginal distribution on the covariate space $\mathcal{X}$. Equivalently,
  $\int h(x)\, dG_0(x, y_0) = \int h(x)\, dG_1(x, y_1)$
for each bounded function $h$ on $\mathcal{X}$.
Take finitely many constraints and find the MLE $(\hat G_0, \hat G_1)$:
  $\hat\mu_1 = \int y_1\, d\hat G_1(x, y_1)$, $\hat\mu_0 = \int y_0\, d\hat G_0(x, y_0)$.
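
Known propensity score [Model S0: known $\pi^*$]
Maximize the likelihood subject to the constraints
  $\int \pi^*(x)\, dG_0 = \int \pi^*(x)\, dG_1$, $\int h_j^*(x)\, dG_0 = \int h_j^*(x)\, dG_1$, $j = 1, \ldots, m$.
Let $h^* = (\pi^*, 1 - \pi^*, h_1^*, \ldots, h_m^*)^\top$. Maximize
  $\frac{1}{n} \sum_{i=1}^{n_1} \log\big(\lambda^\top h^*(X_i)\big) + \frac{1}{n} \sum_{i=n_1+1}^{n} \log\big(1 - \lambda^\top h^*(X_i)\big)$.
Then
  $\hat G_1\{(X_i, Y_{1i})\} = \dfrac{n^{-1}}{\hat\lambda^\top h^*(X_i)}$, $i = 1, \ldots, n_1$,
  $\hat G_0\{(X_i, Y_{0i})\} = \dfrac{n^{-1}}{1 - \hat\lambda^\top h^*(X_i)}$, $i = n_1 + 1, \ldots, n$.
First-order approximation:
  $\tilde\mu_1 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\pi^*(X_i)} - \tilde\beta_1^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \frac{h^*(X_i)}{1 - \pi^*(X_i)} \Big( \frac{T_i}{\pi^*(X_i)} - 1 \Big) \Big]$,
  $\tilde\mu_0 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{0i} (1 - T_i)}{1 - \pi^*(X_i)} - \tilde\beta_0^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \frac{h^*(X_i)}{\pi^*(X_i)} \Big( \frac{1 - T_i}{1 - \pi^*(X_i)} - 1 \Big) \Big]$,
where $\tilde\beta_1 = \tilde B^{-1} \tilde C_1$ and $\tilde\beta_0 = \tilde B^{-1} \tilde C_0$.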
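
The method of control variates:
  $\frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\pi^*(X_i)} - b_1^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \frac{h^*(X_i)}{1 - \pi^*(X_i)} \Big( \frac{T_i}{\pi^*(X_i)} - 1 \Big) \Big]$.
The optimal choice of $b_1$ is $\beta_1 = B^{-1} C_1$.
A more general class of estimators:
  $\frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\pi^*(X_i)} - \frac{1}{n} \sum_{i=1}^{n} \varphi_1(X_i) \Big( \frac{T_i}{\pi^*(X_i)} - 1 \Big)$.
The optimal choice of $\varphi_1(x)$ is $E(Y_1 \mid X = x)$. With this choice, the estimator achieves semiparametric efficiency under S0.
Choose $h^*$ such that
  $E(Y_1 \mid X = x)$ is contained in the linear span of $\dfrac{h^*(x)}{1 - \pi^*(x)}$,
  $E(Y_0 \mid X = x)$ is contained in the linear span of $\dfrac{h^*(x)}{\pi^*(x)}$.
Outcome regression [Model R]:
  $E(Y_1 \mid X) = \Psi\big(\alpha_1^\top g_1(X)\big)$, $E(Y_0 \mid X) = \Psi\big(\alpha_0^\top g_0(X)\big)$.
Choose $h^* = \big( \pi^*, 1 - \pi^*, \pi^* \Psi(\hat\alpha_0^\top g_0), (1 - \pi^*) \Psi(\hat\alpha_1^\top g_1) \big)^\top$.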
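
The more general class of estimators can be written out directly. The sketch below is an illustration under stated assumptions rather than the author's implementation: it reuses the 200-patient blood pressure example, treats the observed treatment fractions (0.8 for BP = 1, 0.4 for BP = 0) as if they were the known propensity score, and uses the stratum survival fractions among RHC patients as a stand-in for the regression function of Y_1 on X. The names `illustration_data`, `ipw_mean`, and `augmented_mean` are hypothetical.

```python
def illustration_data():
    """Rows (bp, t, y) for the 200 patients of the blood pressure example."""
    rows = []
    for bp, t, survived, died in [(1, 1, 52, 28), (0, 1, 30, 10),
                                  (1, 0, 11, 9), (0, 0, 37, 23)]:
        rows += [(bp, t, 1)] * survived + [(bp, t, 0)] * died
    return rows

def ipw_mean(y, t, pi):
    """Inverse probability weighted estimate of E(Y_1): mean of Y*T/pi."""
    return sum(yi * ti / p for yi, ti, p in zip(y, t, pi)) / len(y)

def augmented_mean(y, t, pi, phi):
    """IPW estimate minus the correction term mean(phi(X) * (T/pi - 1))."""
    correction = sum(f * (ti / p - 1.0) for f, ti, p in zip(phi, t, pi)) / len(y)
    return ipw_mean(y, t, pi) - correction

rows = illustration_data()
y = [r[2] for r in rows]
t = [r[1] for r in rows]
# Assumption: propensity score treated as known, equal to the observed fractions.
pi = [0.8 if r[0] == 1 else 0.4 for r in rows]
# Assumption: phi_1(x) taken as the survival fraction of RHC patients at each BP level.
phi = [52 / 80 if r[0] == 1 else 30 / 40 for r in rows]

mu1_ipw = ipw_mean(y, t, pi)
mu1_aug = augmented_mean(y, t, pi, phi)
print(f"IPW: {mu1_ipw:.3f}, augmented: {mu1_aug:.3f}")
```

In this example the correction term vanishes because the assumed propensity score equals the observed treatment fraction at each blood pressure level, so both estimates equal 0.70. With a propensity score that does not match the sample fractions exactly, the correction term is nonzero and is what reduces the variance.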
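
Parametric propensity score [Model S: $\pi(\cdot\,; \gamma)$]
Maximize the likelihood subject to the constraints
  $\int \hat\pi(x)\, dG_1 = \int \hat\pi(x)\, dG_0$, $\int \hat h_j(x)\, dG_1 = \int \hat h_j(x)\, dG_0$, $j = 1, \ldots, m$.
Let $\hat h = (\hat\pi, 1 - \hat\pi, \hat h_1, \ldots, \hat h_m)^\top$. Maximize
  $\frac{1}{n} \sum_{i=1}^{n_1} \log\big(\lambda^\top \hat h(X_i)\big) + \frac{1}{n} \sum_{i=n_1+1}^{n} \log\big(1 - \lambda^\top \hat h(X_i)\big)$.
Then
  $\hat G_1\{(X_i, Y_{1i})\} = \dfrac{n^{-1}}{\hat\lambda^\top \hat h(X_i)}$, $i = 1, \ldots, n_1$,
  $\hat G_0\{(X_i, Y_{0i})\} = \dfrac{n^{-1}}{1 - \hat\lambda^\top \hat h(X_i)}$, $i = n_1 + 1, \ldots, n$.
First-order approximation:
  $\tilde\mu_1 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{1i} T_i}{\hat\pi(X_i)} - \tilde\beta_1^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \frac{\hat h(X_i)}{1 - \hat\pi(X_i)} \Big( \frac{T_i}{\hat\pi(X_i)} - 1 \Big) \Big]$,
  $\tilde\mu_0 = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_{0i} (1 - T_i)}{1 - \hat\pi(X_i)} - \tilde\beta_0^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \frac{\hat h(X_i)}{\hat\pi(X_i)} \Big( \frac{1 - T_i}{1 - \hat\pi(X_i)} - 1 \Big) \Big]$,
where $\tilde\beta_1 = \tilde B^{-1} \tilde C_1$ and $\tilde\beta_0 = \tilde B^{-1} \tilde C_0$.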

Our strategy is
  To build and check propensity score models to ensure consistency
  To use outcome regression models for variance and bias reduction
Propensity score models can be checked with the following idea:
  Pick a collection of test functions $\hat h_j$ on $\mathcal{X}$, for example, $(\hat\pi, 1 - \hat\pi, \hat\pi X, (1 - \hat\pi) X)$.
  Compute the sample average
    $\tilde E\Big[ \hat h_j(X) \Big( \frac{T}{\hat\pi(X)} - \frac{1 - T}{1 - \hat\pi(X)} \Big) \Big]$,
  i.e., the average difference in $\hat h_j(X)$ between the treated and the controls after propensity score weighting.
  If model S is correct, then the sample averages relative to their standard errors, or z-ratios, should not differ significantly from zero.
Examination of the z-ratios against the standard normal can reveal possible misspecification of model S.
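
A minimal sketch of the z-ratio check follows, using the blood pressure example as toy data. Two simplifications are assumptions made here and not taken from the slides: the standard error is the naive one (sample standard deviation divided by the square root of n), which ignores the fact that the propensity score is estimated, and the test function is simply the BP indicator. The names `illustration_data` and `balance_z_ratio` are hypothetical.

```python
import math
import statistics

def illustration_data():
    """Rows (bp, t, y) for the 200 patients of the blood pressure example."""
    rows = []
    for bp, t, survived, died in [(1, 1, 52, 28), (0, 1, 30, 10),
                                  (1, 0, 11, 9), (0, 0, 37, 23)]:
        rows += [(bp, t, 1)] * survived + [(bp, t, 0)] * died
    return rows

def balance_z_ratio(h, t, pi_hat):
    """z-ratio of the sample average of h(X) * (T/pi - (1-T)/(1-pi)).

    Naive standard error: treats pi_hat as fixed, not estimated.
    """
    d = [hi * (ti / p - (1 - ti) / (1 - p)) for hi, ti, p in zip(h, t, pi_hat)]
    mean = sum(d) / len(d)
    se = statistics.stdev(d) / math.sqrt(len(d))
    return mean / se

rows = illustration_data()
h = [r[0] for r in rows]          # test function: BP indicator
t = [r[1] for r in rows]

# Propensity score that depends on BP (matches how treatment was assigned).
pi_bp = [0.8 if r[0] == 1 else 0.4 for r in rows]
# Misspecified propensity score that ignores BP (overall treated fraction).
pi_const = [0.6] * len(rows)

z_ok = balance_z_ratio(h, t, pi_bp)
z_bad = balance_z_ratio(h, t, pi_const)
print(f"z-ratio with BP in the model: {z_ok:.2f}")
print(f"z-ratio ignoring BP:          {z_bad:.2f}")
```

The model that ignores blood pressure leaves a large z-ratio for the BP test function, while the model that includes it balances BP exactly in this toy example.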

[Figure: z-ratios (vertical axis, -4 to 4) by Model (1 to 4); two panels.]

Confounding case
Data: $(X_i, Y_{T_i}, T_i)$, $i = 1, 2, \ldots, n$
Likelihood:
  $L_1 \times L_2 = \prod_{i=1}^{n} \big[ (1 - \pi(X_i))^{1 - T_i}\, \pi(X_i)^{T_i} \big] \times \prod_{i=1}^{n} \big[ H_0(\{X_i, Y_{0i}\})^{1 - T_i}\, H_1(\{X_i, Y_{1i}\})^{T_i} \big]$
where $H_0$ is the distribution $P(\{Y_0\} \mid T = 0, X)\, P(\{X\})$ and $H_1$ is the distribution $P(\{Y_1\} \mid T = 1, X)\, P(\{X\})$.
$H_0$ and $H_1$ induce the same marginal distribution on the covariate space $\mathcal{X}$. Equivalently,
  $\int h(x)\, dH_0(x, y_0) = \int h(x)\, dH_1(x, y_1)$
for each bounded function $h$ on $\mathcal{X}$.
Convergence of the previous estimates:
  $(\hat G_0, \hat G_1) \to (H_0, H_1)$
  $\hat\mu_1, \tilde\mu_1 \to E[E(Y_1 \mid T = 1, X)]$
  $\hat\mu_0, \tilde\mu_0 \to E[E(Y_0 \mid T = 0, X)]$

Unmeasured confounding: gaps between
  $P(\{Y_0\} \mid T = 0, X)$ and $P(\{Y_0\} \mid T = 1, X)$,
  $P(\{Y_1\} \mid T = 0, X)$ and $P(\{Y_1\} \mid T = 1, X)$,
i.e., systematic differences between the treated and the untreated even if they received the same treatment.
Define the Radon-Nikodym derivatives:
  $\lambda_0(Y_0; X) = \dfrac{P(dy_0 \mid T = 1, X)}{P(dy_0 \mid T = 0, X)}$, $\lambda_1(Y_1; X) = \dfrac{P(dy_1 \mid T = 0, X)}{P(dy_1 \mid T = 1, X)}$.
The case $\lambda_0 = \lambda_1 \equiv 1$ corresponds to no confounding, while deviations of $\lambda_0$ and $\lambda_1$ from 1 indicate unmeasured confounding.
By Bayes' rule, $\lambda_0$ and $\lambda_1$ can be seen as odds ratios:
  $\lambda_0(Y_0; X) = \dfrac{1 - \pi(X)}{\pi(X)} \cdot \dfrac{P(T = 1 \mid Y_0, X)}{P(T = 0 \mid Y_0, X)}$, $\lambda_1(Y_1; X) = \dfrac{\pi(X)}{1 - \pi(X)} \cdot \dfrac{P(T = 0 \mid Y_1, X)}{P(T = 1 \mid Y_1, X)}$.
A sensitivity analysis model:
  $\Lambda^{-1} \le \lambda_0(Y_0; X), \lambda_1(Y_1; X) \le \Lambda$,
where $\Lambda \ge 1$ indicates the degree of departure from no confounding.
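
The odds-ratio form follows in one step from Bayes' rule. The derivation below spells it out for $\lambda_0$, assuming the conditional distributions involved are well defined; the case of $\lambda_1$ is the same with the roles of $T = 0$ and $T = 1$ exchanged.

```latex
\begin{align*}
P(dy_0 \mid T = t, X)
  &= \frac{P(T = t \mid Y_0 = y_0, X)\, P(dy_0 \mid X)}{P(T = t \mid X)},
  \qquad t = 0, 1, \\[4pt]
\lambda_0(y_0; X)
  &= \frac{P(dy_0 \mid T = 1, X)}{P(dy_0 \mid T = 0, X)}
   = \frac{P(T = 1 \mid Y_0 = y_0, X) / \pi(X)}
          {P(T = 0 \mid Y_0 = y_0, X) / (1 - \pi(X))} \\[4pt]
  &= \frac{1 - \pi(X)}{\pi(X)} \cdot
     \frac{P(T = 1 \mid Y_0 = y_0, X)}{P(T = 0 \mid Y_0 = y_0, X)} .
\end{align*}
```

The common factor $P(dy_0 \mid X)$ cancels, so $\lambda_0$ compares the odds of treatment given $(Y_0, X)$ with the odds of treatment given $X$ alone.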
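
Let $\hat h_c = (\hat\pi, 1 - \hat\pi, \hat h_1, \ldots, \hat h_{m_c})^\top$.
For a value of $\Lambda$, find bounds for $\int y_t \lambda_t\, dH_t$ by linear programming:
  min or max $\int y_t \lambda_t\, d\hat G_t$
  subject to
    $\int \lambda_t\, d\hat G_t = 1$,
    $\int \hat\pi(x) \lambda_t\, d\hat G_t = \int \hat\pi(x)\, d\hat G_t$,
    $\int \hat h_j(x) \lambda_t\, d\hat G_t = \int \hat h_j(x)\, d\hat G_t$, $j = 1, \ldots, m_c$,
    and $\Lambda^{-1} \le \lambda_t \le \Lambda$.
$\hat G_1$ is supported on $\{(X_i, Y_{1i})\}_{i = 1, \ldots, n_1}$ and $\hat G_0$ on $\{(X_i, Y_{0i})\}_{i = n_1 + 1, \ldots, n}$, so each integral is a finite sum.
The unknowns are the values of $\lambda_t$ on the observed data:
  $\lambda_{1i} = \lambda_1(Y_{1i}; X_i)$, $i = 1, \ldots, n_1$,
  $\lambda_{0i} = \lambda_0(Y_{0i}; X_i)$, $i = n_1 + 1, \ldots, n$.
Comparisons of the distributions
  $\hat G_0 \approx [Y_0 \mid T = 0, X][X]$, $\hat G_1 \approx [Y_1 \mid T = 1, X][X]$,
  $\lambda_0\, d\hat G_0 \approx [Y_0 \mid T = 1, X][X]$, $\lambda_1\, d\hat G_1 \approx [Y_1 \mid T = 0, X][X]$
indicate (i) balance on covariates, (ii) hidden bias, and (iii) causal effects.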