OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS 776 1 15 Outcome regressions and propensity scores
Outcome Regression and Propensity Scores (Chapter 15)

Outline
15.1 Outcome regression
15.2 Propensity scores
15.3 Propensity stratification and standardization
15.4 Propensity matching
15.5 Propensity models, structural models, predictive models
15.1 Outcome regression

Recall in Chapter 14 we specified the structural model
$$E[Y^a - Y^{a=0} \mid A = a, L] = \beta_1 a + \beta_2 a L$$
Note this model and the g-estimation approach to inference do not require modeling the L-Y association.
Thus g-estimation is protected from bias arising from mis-specifying the L-Y association.
Suppose now that we are willing to model the L-Y association within levels of A.
15.1 Outcome regression

Consider the marginal structural model
$$E[Y^a \mid L] = \beta_0 + \beta_1 a + \beta_2 a L + \beta_3 L$$
The effect of quitting smoking on weight gain in each stratum of L is a function of $\beta_1$ and $\beta_2$.
Parameter $\beta_3$ is often (e.g., in a linear models course) referred to as the main effect of L.
The terminology "effect" is misleading because $\beta_3$ may not have an interpretation as the causal effect of L. E.g., we have not indexed the potential outcomes on the left side by the levels of L, there may be confounding, etc.
$\beta_3$ simply quantifies how the mean of the counterfactual $Y^{a=0}$ varies as a function of L.
15.1 Outcome regression

Because our goal is inference about the causal effect of A on Y, which is a function of $\beta_1$ and $\beta_2$, the parameters $\beta_0$ and $\beta_3$ are nuisance parameters.
An advantage of g-estimation is that we need not estimate the nuisance parameter $\beta_3$ (Fine Point 15.1).
Parameters of the structural model above can be consistently estimated by the outcome regression model
$$E[Y \mid A, C = 0, L] = \alpha_0 + \alpha_1 A + \alpha_2 A L + \alpha_3 L$$
assuming L is sufficient to adjust for confounding (and selection bias due to drop out C).
Obtain $\hat\alpha_1 = \hat\beta_1 \approx 2.6$ and $\hat\alpha_2 = \hat\beta_2 \approx 0.05$.
15.1 Outcome regression

These estimates can be interpreted as conditional causal effects.
E.g., the effect estimate for those smoking 5 cigs/day is
$$\hat{E}[Y \mid A = 1, C = 0, L] - \hat{E}[Y \mid A = 0, C = 0, L] = \hat\beta_1 + 5\hat\beta_2 = 2.6 + 0.25 \approx 2.8$$
E.g., the effect estimate for those smoking 40 cigs/day is $\hat\beta_1 + 40\hat\beta_2$.
Outcome regression does not readily yield a marginal causal effect estimate unless we fit
$$E[Y \mid A, C = 0, L] = \alpha_0 + \alpha_1 A + \alpha_3 L$$
which is likely mis-specified; for the NHEFS data, $\hat\alpha_1 = 3.5$ (95% CI 2.6, 4.3).
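A minimal sketch of how conditional effect estimates are read off an outcome regression with a product term. The data are simulated (not the NHEFS data), the true coefficient values are assumptions chosen to resemble the estimates above, and `fit_ols` and `conditional_effect` are hypothetical helper names.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def conditional_effect(coef, l):
    """Effect of A within the stratum L = l: alpha_1 + alpha_2 * l."""
    return coef[1] + coef[2] * l

rng = np.random.default_rng(0)
n = 20000
L = rng.uniform(1, 60, size=n)                              # hypothetical smoking intensity
A = rng.binomial(1, 1 / (1 + np.exp(-(0.5 - 0.03 * L))))    # treatment depends on L
# Simulated outcome with assumed true values alpha_1 = 2.6 and alpha_2 = 0.05
Y = 1.0 + 2.6 * A + 0.05 * A * L - 0.04 * L + rng.normal(0, 2, size=n)

X = np.column_stack([np.ones(n), A, A * L, L])              # columns: 1, A, A*L, L
coef = fit_ols(X, Y)
effect_5 = conditional_effect(coef, 5)      # estimate for those with L = 5
effect_40 = conditional_effect(coef, 40)    # estimate for those with L = 40
```

Because the effect varies with L, a single marginal number would require either dropping the product term (and risking mis-specification) or standardizing over the distribution of L.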
15.2 Propensity Scores

Let $p(L) = \Pr[A = 1 \mid L]$; note $p(L)$ is close to 0 for individuals with low probability of receiving treatment and close to 1 for those with high probability of receiving treatment.
I.e., $p(L)$ measures the propensity of individuals to receive treatment given the information available in the covariates; hence the name propensity score.
In a randomized trial where assignment to treatment or no treatment is equally likely, $p(L) = 0.5$.
In observational studies the treatment assignment/selection mechanism is unknown; therefore $p(L)$ is unknown.
15.2 Propensity Scores

In IP weighting and g-estimation we estimated propensity scores $p(L)$ by logistic regression (code: Program 15.2):
$$\operatorname{logit} \Pr[A = 1 \mid L] = \alpha_0 + \alpha_1 L$$
Under this model, individual 22941 had the lowest estimated propensity score (0.053), and individual 24949 the highest (0.793).
Figure 15.1 shows the distribution of the estimated propensity score in quitters A = 1 (top) and nonquitters A = 0 (bottom).
[Figure 15.1: distribution of the estimated propensity score in quitters (top) and nonquitters (bottom)]
Note: here we only consider propensity scores for dichotomous treatments. Propensity score methods, other than IP weighting and g-estimation and other related doubly-robust estimators, are difficult to generalize to non-dichotomous treatments.
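A minimal sketch of estimating propensity scores by logistic regression, in the spirit of the model above. This is not Program 15.2: the data are simulated, the single covariate and its coefficients are assumptions, and `fit_logistic` is a hypothetical helper that runs plain Newton-Raphson.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Newton-Raphson fit of logit Pr[A=1|L] = X @ alpha (X includes an intercept)."""
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ alpha))
        W = p * (1 - p)
        grad = X.T @ (a - p)                  # score vector
        hess = X.T @ (X * W[:, None])         # information matrix
        alpha = alpha + np.linalg.solve(hess, grad)
    return alpha

rng = np.random.default_rng(1)
n = 5000
L = rng.normal(0, 1, size=n)                                # one hypothetical covariate
A = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.8 * L))))    # assumed treatment mechanism

X = np.column_stack([np.ones(n), L])
alpha_hat = fit_logistic(X, A)
ps = 1 / (1 + np.exp(-X @ alpha_hat))                       # estimated propensity scores

mean_ps_treated = ps[A == 1].mean()
mean_ps_untreated = ps[A == 0].mean()
```

As in the slide's example, the treated are expected to have a higher average estimated score than the untreated whenever L predicts treatment.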
15.2 Propensity Scores

As expected, those who quit smoking had, on average, a greater estimated probability of quitting (0.312) than those who did not quit (0.245).
Individuals with the same propensity score $p(L)$ will not necessarily have the same covariates L.
E.g., two individuals may have $p(L) = 0.2$ but different levels of smoking intensity and exercise, yet they may be equally likely to quit smoking given all variables in L.
If we consider all individuals in the super-population with the same value of $p(L)$, this group may have different values of L, but the distribution of L will be the same in the treated and untreated (HW):
$$A \perp\!\!\!\perp L \mid p(L)$$
Thus the propensity score is an example of a balancing score (Technical Point 15.1).
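The balancing property can be checked by direct calculation in a small discrete example. The sketch below is an illustration under assumed numbers: the four covariate patterns, their probabilities, and their propensity scores are hypothetical, as is the helper `dist_of_L`.

```python
# Hypothetical superpopulation: covariate pattern -> (Pr[L = l], p(l))
population = {
    ("low intensity", "no exercise"):     (0.30, 0.2),
    ("high intensity", "heavy exercise"): (0.20, 0.2),
    ("low intensity", "heavy exercise"):  (0.25, 0.4),
    ("high intensity", "no exercise"):    (0.25, 0.1),
}

def dist_of_L(a, score=None):
    """Pr[L = l | A = a], or Pr[L = l | A = a, p(L) = score] if score is given."""
    weights = {}
    for l, (pr_l, p_l) in population.items():
        if score is not None and p_l != score:
            continue
        pr_a_given_l = p_l if a == 1 else 1 - p_l
        weights[l] = pr_l * pr_a_given_l      # joint Pr[L = l, A = a]
    total = sum(weights.values())
    return {l: w / total for l, w in weights.items()}

# Within the stratum p(L) = 0.2 the covariate distribution is the same
# in the treated and the untreated, although the two patterns differ.
within_treated = dist_of_L(1, score=0.2)
within_untreated = dist_of_L(0, score=0.2)

# Without conditioning on p(L), the covariate distributions differ.
overall_treated = dist_of_L(1)
overall_untreated = dist_of_L(0)
```

The cancellation is visible in the formula: within a stratum where p(l) = r for every l, the factor r (or 1 - r) appears in both numerator and denominator, leaving Pr[L = l | p(L) = r] for both treatment groups.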
15.2 Propensity Scores

Key result about propensity scores was derived by Rosenbaum and Rubin (1983): if $Y^a \perp\!\!\!\perp A \mid L$, then $Y^a \perp\!\!\!\perp A \mid p(L)$.
I.e., if L is sufficient to adjust for confounding and selection bias, then $p(L)$ is sufficient too.
Figure 15.2 depicts the propensity score for the setting represented in Figure 7.1; $p(L)$ is an intermediate between L and A with a deterministic arrow from L to $p(L)$.
[Figure 15.2: causal diagram with nodes L, p(L), A, Y]
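15.2 Propensity Scores

Proof of key result that if $Y^a \perp\!\!\!\perp A \mid L$, then $Y^a \perp\!\!\!\perp A \mid p(L)$:
$$\begin{aligned}
\Pr[A = 1 \mid Y^a = y, p(L) = r] &= \sum_l \Pr[A = 1, L = l \mid Y^a = y, p(L) = r] \\
&= \sum_l \Pr[A = 1 \mid L = l, Y^a = y, p(L) = r] \Pr[L = l \mid Y^a = y, p(L) = r] \\
&= \sum_l \Pr[A = 1 \mid L = l] \Pr[L = l \mid Y^a = y, p(L) = r] \\
&= \sum_l p(l) \Pr[L = l \mid Y^a = y, p(L) = r] = r
\end{aligned}$$
The third equality uses $Y^a \perp\!\!\!\perp A \mid L$ and the fact that $p(L)$ is a function of L; the last holds because only values l with $p(l) = r$ have positive probability given $p(L) = r$.
Similarly
$$\Pr[A = 1 \mid p(L) = r] = \sum_l p(l) \Pr[L = l \mid p(L) = r] = r$$
Both conditional probabilities equal r, so A is independent of $Y^a$ given $p(L)$.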
15.3 Propensity stratification and standardization

Assume $Y^a \perp\!\!\!\perp A \mid L$, which implies $Y^a \perp\!\!\!\perp A \mid p(L)$. Therefore we can identify causal effects within strata defined by $p(L)$:
$$E[Y^{a=1} - Y^{a=0} \mid p(L) = s] = E[Y \mid A = 1, p(L) = s] - E[Y \mid A = 0, p(L) = s]$$
If L contains at least one continuous covariate or more than a few categorical covariates, then $p(L)$ may take on many values between 0 and 1.
For the NHEFS example, only individual 1089 had an estimated $p(L)$ of 0.6563; therefore, we cannot estimate the causal effect among individuals with $p(L) = 0.6563$ by comparing the treated and the untreated with that particular value.
15.3 Propensity stratification and standardization

In practice, PS stratification is carried out within strata that contain individuals with similar, but not identical, values of $p(L)$; deciles of $\hat{p}(L)$ are a popular choice.
For the NHEFS data, there are approximately 160 individuals per decile, with wide 95% CIs.
Fewer strata (e.g., quintiles) may increase precision, but exchangeability within strata becomes less plausible; i.e., $Y^a \perp\!\!\!\perp A \mid p(L)$ does not imply $Y^a \perp\!\!\!\perp A \mid c_1 < p(L) < c_2$.
Lunceford and Davidian (2004) show that the PS stratification-based estimator performs poorly in finite samples compared to IPW estimators.
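A minimal sketch of propensity score stratification on simulated data, illustrating why coarser strata can leave residual confounding. Everything here is an assumption made for illustration: the data-generating model, the true effect of 3, the use of the known (rather than estimated) propensity score, and the helper `stratified_effect`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
L = rng.normal(0, 1, size=n)
ps = 1 / (1 + np.exp(-(-1.0 + L)))                  # propensity score, taken as known here
A = rng.binomial(1, ps)
Y = 3.0 * A + 2.0 * L + rng.normal(0, 1, size=n)    # assumed true effect of A is 3

def stratified_effect(Y, A, ps, n_strata):
    """Average of within-stratum mean differences, weighted by stratum size."""
    cuts = np.quantile(ps, np.linspace(0, 1, n_strata + 1)[1:-1])
    stratum = np.searchsorted(cuts, ps)             # stratum index 0..n_strata-1
    total = 0.0
    for s in range(n_strata):
        in_s = stratum == s
        diff = Y[in_s & (A == 1)].mean() - Y[in_s & (A == 0)].mean()
        total += diff * in_s.mean()
    return total

crude = Y[A == 1].mean() - Y[A == 0].mean()         # confounded by L
effect_deciles = stratified_effect(Y, A, ps, 10)
effect_two_strata = stratified_effect(Y, A, ps, 2)
```

In this simulation the decile estimate should sit closer to the assumed truth than the two-stratum estimate, which in turn improves on the crude difference; within wide strata the treated still tend to have larger L than the untreated.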
15.3 Propensity stratification and standardization

Alternatively consider the outcome regression model
$$E[Y \mid A, C = 0, p(L)] = \alpha_0 + \alpha_1 A + \alpha_2 p(L)$$
In practice $p(L)$ is unknown and is first estimated by, say, logistic regression.
For the NHEFS data, the estimated effect of quitting smoking on weight gain is 3.6 (95% CI: 2.7, 4.5) kg.
Validity of inference from the outcome regression model above depends on correct specification of the relationship between $p(L)$ and the mean outcome Y (e.g., in the model above we assume it is linear and there is no interaction between A and $p(L)$); IP weighting and g-estimation are agnostic about this relationship.
If an interaction term between A and $p(L)$ is included, then estimating the unconditional causal effect $E[Y^{a=1}] - E[Y^{a=0}]$ would require standardization as in Chapter 13, except here we would standardize over the distribution of $p(L)$ instead of the distribution of L.
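A minimal sketch of standardizing over the distribution of the propensity score when the outcome model includes an A-by-p(L) product term. The simulated data, the coefficient values, the use of a known propensity score, and the helper `design` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
L = rng.uniform(0, 1, size=n)
ps = 0.2 + 0.6 * L                          # propensity score, taken as known here
A = rng.binomial(1, ps)
# Simulated outcome, linear in p(L) with an A-by-p(L) interaction
Y = 1.0 + 2.0 * A + 3.0 * ps + 2.0 * A * ps + rng.normal(0, 1, size=n)

def design(a, ps):
    """Design matrix for E[Y | A, p(L)] = a0 + a1*A + a2*p(L) + a3*A*p(L)."""
    a = np.broadcast_to(a, ps.shape).astype(float)
    return np.column_stack([np.ones_like(ps), a, ps, a * ps])

coef, *_ = np.linalg.lstsq(design(A, ps), Y, rcond=None)

# Standardize: predict for everyone under a = 1 and a = 0, then average
# the predictions over the observed distribution of p(L).
mean_y1 = (design(1, ps) @ coef).mean()
mean_y0 = (design(0, ps) @ coef).mean()
standardized_effect = mean_y1 - mean_y0
```

Under the assumed data-generating model the average effect is 2 + 2 E[p(L)] = 3, so the coefficient on A alone (about 2) would not be the unconditional effect; averaging the predicted contrasts is what recovers it.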
15.4 Propensity Matching

There are many forms of propensity matching.
The general idea is to form a matched population in which the treated and the untreated are exchangeable because they have the same distribution of $p(L)$.
For example, one can match the untreated to the treated: each treated individual is paired with one (or more) untreated individuals with the same propensity score value. The subset of the original population made up of treated-untreated pairs (or sets) is the matched population.
Under exchangeability and positivity given $p(L)$, association estimators in general will be consistent for causal effects in the matched population, e.g., the observed risk ratio will be consistent for the causal risk ratio in the matched population.
15.4 Propensity Matching

Again, it is often the case that no two individuals in a data set have the same (estimated) propensity score.
Therefore individuals are matched if their propensity scores are close according to some definition of closeness.
For example, treated individual 1089 has an estimated PS of 0.6563; they might be matched to individual 1088, who has an estimated PS of 0.6579.
Individuals for whom no other individual is close in terms of PS may be excluded; thus the matched population and the target superpopulation may be different.
There are numerous ways of defining closeness, and detailed descriptions of these definitions are not in the text.
15.4 Propensity Matching

Defining closeness in propensity matching entails a bias-variance tradeoff.
If the closeness criteria are too loose, individuals with relatively different values of $p(L)$ will be matched to each other, the distribution of $p(L)$ will differ between the treated and the untreated in the matched population, and exchangeability will not hold.
Conversely, if the closeness criteria are too tight and many individuals are excluded by the matching procedure, there will be approximate exchangeability but the effect estimate will be less precise.
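One common way to define closeness is a caliper: a maximum allowed difference in estimated propensity scores. The sketch below is one possible greedy 1:1 scheme, not a method described in the text; the function `caliper_match`, the simulated scores, and the two caliper widths are assumptions for illustration.

```python
import random

def caliper_match(treated, untreated, caliper):
    """Greedy 1:1 matching without replacement.

    treated / untreated: lists of (id, propensity_score).
    Returns (treated_id, untreated_id) pairs whose scores differ by at most
    `caliper`; treated individuals without such a match are dropped.
    """
    available = dict(untreated)
    pairs = []
    for t_id, t_ps in treated:
        if not available:
            break
        # nearest still-unmatched untreated individual
        u_id = min(available, key=lambda u: abs(available[u] - t_ps))
        if abs(available[u_id] - t_ps) <= caliper:
            pairs.append((t_id, u_id))
            del available[u_id]
    return pairs

random.seed(4)
# Hypothetical estimated propensity scores; the treated tend to score higher
treated = [(f"t{i}", random.betavariate(3, 4)) for i in range(100)]
untreated = [(f"u{i}", random.betavariate(2, 6)) for i in range(300)]

tight = caliper_match(treated, untreated, caliper=0.001)
loose = caliper_match(treated, untreated, caliper=0.1)
```

Comparing `len(tight)` with `len(loose)` shows the tradeoff: the tight caliper keeps pairs nearly identical in score but discards more treated individuals, while the loose caliper retains more pairs at the cost of larger within-pair differences.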
15.4 Propensity Matching

In theory, propensity matching can be used to estimate the causal effect in a well characterized target population. E.g., when matching each treated individual with one or more untreated individuals and excluding the unmatched untreated, one is estimating the effect in the treated (cf. Fine Point 15.2).
In practice, however, propensity matching may yield an effect estimate in a hard-to-describe subset of the study population.
E.g., under a given definition of closeness, some treated individuals cannot be matched with any untreated individuals and thus are excluded from the analysis; the effect estimate then corresponds to the subset of the population with values of the estimated PS that have successful matches.
15.4 Propensity Matching

That PS matching forces investigators to restrict the analysis to treatment groups with overlapping distributions of the estimated PS is a strength of the method.
However, interpretation can be difficult.
E.g., suppose that, based on Figure 15.1, we conclude we can only estimate the effect of smoking cessation for individuals with an estimated PS < 0.67. Who are these people?
Restriction based on real world variables is easier to interpret.
E.g., suppose the 2 individuals with estimated PS > 0.67 were the only ones in the study who were over age 50 and had smoked for less than 10 years; we could exclude them and explain that our effect estimate only applies to smokers under age 50 and to smokers 50 and over who had smoked for at least 10 years.
15.5 Propensity models, structural models, predictive models

Recap: In Part II of HR we consider propensity models and structural models.
Propensity models are models for the probability of treatment A given the variables L used to try to achieve conditional exchangeability.
Used for matching and stratification in this section; for IP weighting in Chapter 12; and for g-estimation in Chapter 14.
Parameters of the propensity model are nuisance parameters.
15.5 Propensity models, structural models, predictive models

Structural models describe the relation between the treatment A and some component of the distribution (e.g., the mean) of the counterfactual outcome $Y^a$, either marginally or within levels of the variables L.
Parameters (coefficients) for treatment are not nuisance parameters; rather, they have a direct causal interpretation as the effect of treatment on the outcome.
MSMs need not include effect modifiers unless of substantive interest.
SNMs require inclusion of effect modifiers, if they exist, for valid inference.
15.5 Propensity models, structural models, predictive models

Outcome regression can be used as a method for (i) causal inference or (ii) prediction.
As an example of (ii), a doctor may use a predictive model to identify individuals at high risk of disease; parameters of these predictive models do not necessarily have a causal interpretation.
Dual use of outcome regression for causal inference and prediction is potentially confusing.
E.g., consider variable selection and the M-bias example.
[Figure 7.4: causal diagram with nodes U1, U2, L, A, Y]
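A minimal simulation sketch of M-bias, assuming the usual M-structure (U1 causes A and L, U2 causes L and Y, and A has no effect on Y). The linear Gaussian model, the continuous treatment, and the helper `ols` are simplifying assumptions made only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50000
U1 = rng.normal(size=n)
U2 = rng.normal(size=n)
L = U1 + U2 + rng.normal(size=n)     # collider: U1 -> L <- U2
A = U1 + rng.normal(size=n)          # U1 -> A (continuous A for simplicity)
Y = U2 + rng.normal(size=n)          # U2 -> Y; A has no effect on Y

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

coef_unadjusted = ols([A], Y)[1]     # coefficient of A, no adjustment
fit_adjusted = ols([A, L], Y)
coef_adjusted = fit_adjusted[1]      # coefficient of A after adjusting for L
coef_L = fit_adjusted[2]             # L is predictive of Y
```

Under these assumptions the unadjusted coefficient of A should be near the true null value, while adjusting for L opens the path through the collider and pulls the coefficient of A away from zero, even though L improves prediction of Y.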
15.5 Propensity models, structural models, predictive models

Here L is predictive of the outcome but introduces bias when drawing causal inference.
Note also that propensity models need not predict treatment A well; they just need to include covariates L that guarantee exchangeability.
Including covariates predictive of treatment A but not necessary for exchangeability may increase variance.
E.g., consider a two-site study where $p(L) = 0.01$ at one site and $p(L) = 0.99$ at the other site. Suppose site has no effect on the outcome, so there is no need to condition/adjust for site. However, suppose we adjust for site by standardization. Then the variance will be very large because within each site one of the two treatment groups will be very small.
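The two-site example can be made concrete with a back-of-the-envelope variance calculation. The site size of 1000, the unit outcome variance, the use of expected group sizes, and the helper `var_diff_in_means` are all assumptions for illustration.

```python
def var_diff_in_means(n_treated, n_untreated, sigma2=1.0):
    """Variance of a difference in sample means with common outcome variance."""
    return sigma2 / n_treated + sigma2 / n_untreated

n_per_site = 1000                                  # hypothetical site size
p_site = {"site 1": 0.01, "site 2": 0.99}

# Expected (treated, untreated) group sizes within each site
sizes = {s: (n_per_site * p, n_per_site * (1 - p)) for s, p in p_site.items()}

# Unadjusted analysis pools both sites
n_treated_total = sum(n1 for n1, _ in sizes.values())
n_untreated_total = sum(n0 for _, n0 in sizes.values())
var_unadjusted = var_diff_in_means(n_treated_total, n_untreated_total)

# Standardizing over site: equal-weight average of the two site-specific differences
var_adjusted = sum(0.5 ** 2 * var_diff_in_means(n1, n0) for n1, n0 in sizes.values())

variance_ratio = var_adjusted / var_unadjusted
```

With these numbers each site contributes a comparison of roughly 10 versus 990 individuals, so the site-adjusted variance is many times the pooled variance even though adjustment for site is unnecessary for exchangeability.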