Marginal, crude and conditional odds ratios

1 Marginal, crude and conditional odds ratios
Definitions and estimation
Travis Loux, Graduate student, UC Davis Department of Statistics
March 31, 2010

2 Parameter Definitions
When measuring the effect of a binary exposure ($Z$) on a binary response ($Y$), there are a number of probabilities of interest with regard to the population:
marginal: $P(Y = 1)$
crude: $P(Y = 1 \mid Z = 1)$
conditional: $P(Y = 1 \mid Z = 1, X = x)$
These probabilities are not necessarily equal, and each has a different interpretation that is useful in a different setting. We need to find ways to estimate each value, being clear about which of these quantities our statistics estimate.

3 The Counterfactual Model
Definitions
We will use the following notation:
$Z$ - the binary exposure (or treatment) of interest
$X$ - covariates associated with exposure and response
$Y_1$ - the potential binary response if the unit is exposed
$Y_0$ - the potential binary response if the unit is not exposed
Though each unit in the population has both potential responses $Y_1$ and $Y_0$, we are only able to observe one of them. We call the observed response $Y$ and use the relation
$$Y = Z\,Y_1 + (1 - Z)\,Y_0$$

4 The Counterfactual Model
A population
A hypothetical population would then look like the following:
[Table with columns Unit, $X_1$, $X_2$, $X_3$, $Z$, $Y_1$, $Y_0$; the cell values are not preserved]
where the observed $Y$ for each unit is colored blue. (This population will be used in a simulation example later.)

5 Marginal Odds Ratio
The marginal odds ratio can be obtained by comparing the odds of response in the population if everyone is exposed,
$$\mathrm{Odds}_{exp} = \frac{P(Y_1 = 1)}{P(Y_1 = 0)},$$
to the odds of response if everyone is not exposed,
$$\mathrm{Odds}_{unexp} = \frac{P(Y_0 = 1)}{P(Y_0 = 0)}.$$
Looking at the odds ratio as $\mathrm{Odds}_{exp} / \mathrm{Odds}_{unexp}$, we see the marginal odds ratio
$$\psi_{marg} = \frac{P(Y_1 = 1)\,P(Y_0 = 0)}{P(Y_1 = 0)\,P(Y_0 = 1)}$$

6 Crude Odds Ratio
The marginal odds ratio is often approximated by estimating the crude odds ratio, obtained by taking the cross product
$$\psi_{crude} = \frac{P(Y = 1 \mid Z = 1)\,P(Y = 0 \mid Z = 0)}{P(Y = 0 \mid Z = 1)\,P(Y = 1 \mid Z = 0)} = \frac{P(Y_1 = 1 \mid Z = 1)\,P(Y_0 = 0 \mid Z = 0)}{P(Y_1 = 0 \mid Z = 1)\,P(Y_0 = 1 \mid Z = 0)}$$
However, when confounding is present,
$$P(Y_1 = 1 \mid Z = 1) \neq P(Y_1 = 1)$$
$$P(Y_0 = 1 \mid Z = 0) \neq P(Y_0 = 1)$$
so the crude and marginal odds ratios are, in general, different values.

7 Conditional Odds Ratio
The conditional odds ratio is defined as
$$\psi_{cond}(x) = \frac{P(Y = 1 \mid Z = 1, X = x)\,P(Y = 0 \mid Z = 0, X = x)}{P(Y = 0 \mid Z = 1, X = x)\,P(Y = 1 \mid Z = 0, X = x)}$$
With the assumption of strongly ignorable treatment assignment, i.e. $(Y_1, Y_0) \perp Z \mid X$, we can remove $Z$ from the conditional probabilities:
$$\psi_{cond}(x) = \frac{P(Y_1 = 1 \mid X = x)\,P(Y_0 = 0 \mid X = x)}{P(Y_1 = 0 \mid X = x)\,P(Y_0 = 1 \mid X = x)}$$

8 Conditional Odds Ratio
Under the logistic model
If the data follow a logistic model with
$$P(Y = 1 \mid X = x, Z = z) = \frac{\exp\left\{\beta_0 + \sum_{j=1}^{p} \beta_j x_j + \gamma z\right\}}{1 + \exp\left\{\beta_0 + \sum_{j=1}^{p} \beta_j x_j + \gamma z\right\}}$$
logistic regression will lead to an unbiased and asymptotically efficient estimator of $\psi_{cond}(x)$.
In fact, the conditional odds ratio is constant across $X$ and
$$\psi_{cond}(x) = \psi_{cond} = e^{\gamma}$$
In order for the estimate to be unbiased, $X$ must contain all predictors of $Y$, not just the confounders (Gail, Wieand, Piantadosi [1984]).
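The claim that the conditional odds ratio does not depend on x under the logistic model can be checked numerically. The following Python sketch is only an illustration: the function names and all parameter values (beta0, beta, gamma) are hypothetical and are not taken from the slides.

```python
import math

def logistic_prob(x, z, beta0, beta, gamma):
    """P(Y = 1 | X = x, Z = z) under the logistic model."""
    eta = beta0 + sum(b * xj for b, xj in zip(beta, x)) + gamma * z
    return 1.0 / (1.0 + math.exp(-eta))

def conditional_odds_ratio(x, beta0, beta, gamma):
    """Cross-product odds ratio comparing Z = 1 with Z = 0 at covariate value x."""
    p1 = logistic_prob(x, 1, beta0, beta, gamma)
    p0 = logistic_prob(x, 0, beta0, beta, gamma)
    return (p1 * (1.0 - p0)) / ((1.0 - p1) * p0)

# Hypothetical parameter values, chosen only for illustration.
beta0, beta, gamma = -0.5, [2.0, -1.0, 1.0], math.log(3.0)

# Evaluate the odds ratio at a few arbitrary covariate values.
covariate_values = [[0.0, 0.0, 0.0], [1.0, -0.5, 0.3], [-1.2, 0.8, 2.0]]
odds_ratios = [conditional_odds_ratio(x, beta0, beta, gamma) for x in covariate_values]

for x, psi in zip(covariate_values, odds_ratios):
    print(x, round(psi, 6))
```

Each printed odds ratio should agree with exp(gamma) = 3 up to floating-point rounding, whatever covariate value is used.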

9 Contingency Tables
Counterfactual values
For potential response $Y_1$, we have the contingency table:

             Z = 1       Z = 0
Y_1 = 1    $A^{(1)}$   $B^{(1)}$
Y_1 = 0    $C^{(1)}$   $D^{(1)}$

So we see that
$$P(Y_1 = 1) = \frac{A^{(1)} + B^{(1)}}{N} \qquad P(Y_1 = 0) = \frac{C^{(1)} + D^{(1)}}{N}$$
where $N$ is the population size.

10 Contingency Tables
Counterfactual values
For potential response $Y_0$, we have the contingency table:

             Z = 1       Z = 0
Y_0 = 1    $A^{(0)}$   $B^{(0)}$
Y_0 = 0    $C^{(0)}$   $D^{(0)}$

So we see that
$$P(Y_0 = 1) = \frac{A^{(0)} + B^{(0)}}{N} \qquad P(Y_0 = 0) = \frac{C^{(0)} + D^{(0)}}{N}$$

11 Contingency Tables
Counterfactual values
Then the marginal odds ratio is equal to
$$\psi_{marg} = \frac{P(Y_1 = 1)\,P(Y_0 = 0)}{P(Y_1 = 0)\,P(Y_0 = 1)} = \frac{\left(A^{(1)} + B^{(1)}\right)\left(C^{(0)} + D^{(0)}\right)}{\left(C^{(1)} + D^{(1)}\right)\left(A^{(0)} + B^{(0)}\right)}$$
but this involves the unobservable quantities $B^{(1)}$, $D^{(1)}$, $A^{(0)}$, and $C^{(0)}$.

12 Contingency Tables
Observed values
The crude odds ratio is computed by classifying the observed data into the contingency table

           Z = 1       Z = 0
Y = 1    $A^{(1)}$   $B^{(0)}$
Y = 0    $C^{(1)}$   $D^{(0)}$

then taking the cross products:
$$\psi_{crude} = \frac{A^{(1)}\,D^{(0)}}{C^{(1)}\,B^{(0)}}$$
Compare this to the marginal odds ratio:
$$\psi_{marg} = \frac{\left(A^{(1)} + B^{(1)}\right)\left(C^{(0)} + D^{(0)}\right)}{\left(C^{(1)} + D^{(1)}\right)\left(A^{(0)} + B^{(0)}\right)}$$
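To make the comparison between the two formulas concrete, here is a small Python sketch with made-up counts. All numbers are hypothetical (100 exposed and 100 unexposed units, with exposed units more likely to respond under either exposure level), and the function names are illustrative rather than part of the original slides.

```python
def marginal_odds_ratio(y1_table, y0_table):
    """Marginal odds ratio from the two counterfactual tables.

    Each table is a tuple (A, B, C, D): A and C count exposed units (Z = 1),
    B and D count unexposed units (Z = 0). The population size N cancels.
    """
    a1, b1, c1, d1 = y1_table
    a0, b0, c0, d0 = y0_table
    return ((a1 + b1) * (c0 + d0)) / ((c1 + d1) * (a0 + b0))

def crude_odds_ratio(y1_table, y0_table):
    """Crude odds ratio, using only the observable cells:
    Y_1 among the exposed (A1, C1) and Y_0 among the unexposed (B0, D0)."""
    a1, _, c1, _ = y1_table
    _, b0, _, d0 = y0_table
    return (a1 * d0) / (c1 * b0)

# Hypothetical counterfactual counts (A, B, C, D) for Y_1 and Y_0.
y1_table = (80, 50, 20, 50)
y0_table = (60, 30, 40, 70)

psi_marg = marginal_odds_ratio(y1_table, y0_table)
psi_crude = crude_odds_ratio(y1_table, y0_table)
print(round(psi_marg, 3), round(psi_crude, 3))
```

With these counts the marginal value is 14300/6300 (about 2.27) while the crude value is 5600/600 (about 9.33), so the cross product of the observed table overstates the marginal odds ratio here.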

13 Estimation
Crude odds ratio
Given a sample of $n$ observations, the crude odds ratio is estimated by classifying the sample into the contingency table

         Z = 1   Z = 0
Y = 1      a       b
Y = 0      c       d

then taking the ratio of cross products
$$\hat\psi_{crude} = \frac{a\,d}{c\,b}$$
This will lead to unbiased estimation of $\psi_{crude}$, but is typically biased for $\psi_{marg}$.

14 Estimation
Conditional odds ratio
If the logistic model is correct, then $\hat\psi_{cond} = e^{\hat\gamma}$ will be a consistent and asymptotically efficient estimate of $\psi_{cond}$.
Other possibilities include subclassifying the observations on the covariate $X$ and then using a non-parametric estimator such as the Woolf estimate or the Mantel-Haenszel estimate. We will focus on the Mantel-Haenszel estimate.

15 Estimation
Mantel-Haenszel: Definition
The Mantel-Haenszel estimator is computed by stratifying/subclassifying observations into subtables based on the covariate $X$:

         Z = 1    Z = 0
Y = 1    $a_k$    $b_k$
Y = 0    $c_k$    $d_k$
                          total: $n_k$

for stratum $k$, $k \in \{1, \ldots, K\}$, then using the estimate
$$\hat\psi_{MH} = \frac{\sum_{k=1}^{K} a_k d_k / n_k}{\sum_{k=1}^{K} b_k c_k / n_k} = \sum_{k=1}^{K} w_k \hat\psi_k$$
where
$$w_k = \frac{b_k c_k / n_k}{\sum_{l=1}^{K} b_l c_l / n_l} \quad \text{and} \quad \hat\psi_k = \frac{a_k d_k}{b_k c_k}$$
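The two forms of the Mantel-Haenszel estimator above (ratio of sums, and weighted average of stratum odds ratios) can be sketched in Python as follows. The strata counts are invented for illustration, the function names are hypothetical, and the weighted form assumes every stratum has b_k c_k > 0.

```python
def mantel_haenszel(tables):
    """Mantel-Haenszel odds ratio as a ratio of sums.

    Each stratum is a tuple (a, b, c, d) laid out as in the 2x2 table:
    a: Y=1, Z=1   b: Y=1, Z=0   c: Y=0, Z=1   d: Y=0, Z=0.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

def mantel_haenszel_weighted(tables):
    """Same estimator written as a weighted average of stratum odds ratios.

    Requires b * c > 0 in every stratum so each stratum odds ratio exists.
    """
    raw_weights = [b * c / (a + b + c + d) for a, b, c, d in tables]
    total = sum(raw_weights)
    return sum(
        (w / total) * (a * d / (b * c))
        for w, (a, b, c, d) in zip(raw_weights, tables)
    )

# Two hypothetical strata with stratum odds ratios 6 and 2.
strata = [(8, 2, 4, 6), (3, 3, 2, 4)]
psi_mh = mantel_haenszel(strata)
psi_mh_weighted = mantel_haenszel_weighted(strata)
print(round(psi_mh, 4), round(psi_mh_weighted, 4))
```

Both functions should return the same number for any valid input, here 3.4/0.9, which lies between the two stratum odds ratios.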

16 Estimation
Mantel-Haenszel: Stratification on the covariates
Assuming that the conditional odds ratio is constant across values of $X$, as in logistic regression, we have the following results.
Categorical covariates: If the strata correspond to levels of $X$, then from the formula
$$\hat\psi_{MH} = \sum_{k=1}^{K} w_k \hat\psi_k$$
we see that $\hat\psi_{MH}$ will estimate the conditional odds ratio.

17 Estimation
Mantel-Haenszel: Stratification on the covariates
Continuous covariates: $X$ will vary within each stratum, so $\hat\psi_{MH}$ will be biased in estimating the conditional odds ratio:
$$\hat\psi_{MH} \to \psi_{cond} \sum_k w_k\,\mathrm{bias}_k$$
$\hat\psi_{MH}$ will be consistent for $\psi_{cond}$ only if $\sum_k w_k\,\mathrm{bias}_k = 1$.
When matching on $X$, we ensure that $x$ is constant within each stratum, so $\mathrm{bias}_k = 1$. This means that $\sum_k w_k\,\mathrm{bias}_k = \sum_k w_k = 1$ and $\hat\psi_{MH}$ is consistent for $\psi_{cond}$.

18 The Propensity Score
Definition and consequences
Proposed by Rosenbaum and Rubin (1983), the propensity score is defined as
$$e(x) = P(Z = 1 \mid X = x)$$
It is known that $X$ is distributed similarly in the subpopulations defined by $Z$ when conditioning on the propensity score:
$$X \perp Z \mid e(X)$$
Also, if treatment assignment is strongly ignorable, i.e. $(Y_1, Y_0) \perp Z \mid X$, we also know that
$$(Y_1, Y_0) \perp Z \mid e(X)$$
Together, these mean that $e(X)$ is a univariate summary of $X$ which will account for any confounding of $X$ with $Z$, giving hope that an estimate for the marginal odds ratio is possible.

19 Using the Propensity Score
Stratification and Matching
As is the case with stratification on covariates, stratification on the propensity score will lead to an estimator which is biased for the marginal odds ratio due to residual confounding within each stratum.
In fact, simulations have shown that perfect 1-to-1 matching on the propensity score also leads to a quantity which is biased for the marginal odds ratio.
While the magnitude of the bias varies, the bias tends to be in the direction of the conditional odds ratio.

20 Using the Propensity Score
Weighting
There are a number of estimators of $P(Y_1 = 1)$ and $P(Y_0 = 1)$ which weight observations by the inverse of the propensity score. We will discuss the simplest:
$$\hat P(Y_1 = 1) = \frac{1}{n} \sum_{i=1}^{n} \frac{Z_i Y_i}{e_i} \quad \text{and} \quad \hat P(Y_0 = 1) = \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - Z_i)\,Y_i}{1 - e_i}$$
Lunceford and Davidian (2004) show that these quantities are unbiased for $P(Y_1 = 1)$ and $P(Y_0 = 1)$, respectively.
Since both estimators are unbiased for the respective parameters, substituting these values into the formula
$$\psi_{marg} = \frac{P(Y_1 = 1)\,P(Y_0 = 0)}{P(Y_1 = 0)\,P(Y_0 = 1)}$$
will lead to an asymptotically unbiased estimator.
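The weighting estimator above can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the propensity scores e_i are known; the function name and the tiny eight-unit data set are hypothetical and were chosen so the arithmetic can be checked by hand.

```python
def ipw_marginal_odds_ratio(z, y, e):
    """Inverse-propensity-weighted estimates of P(Y_1 = 1), P(Y_0 = 1)
    and the resulting plug-in estimate of the marginal odds ratio.

    z: exposure indicators (0/1), y: observed responses (0/1),
    e: propensity scores, assumed strictly between 0 and 1.
    """
    n = len(y)
    p1_hat = sum(zi * yi / ei for zi, yi, ei in zip(z, y, e)) / n
    p0_hat = sum((1 - zi) * yi / (1 - ei) for zi, yi, ei in zip(z, y, e)) / n
    psi_hat = (p1_hat * (1 - p0_hat)) / ((1 - p1_hat) * p0_hat)
    return p1_hat, p0_hat, psi_hat

# Hypothetical data: 4 exposed units (3 respond), 4 unexposed units (1 responds),
# every unit having propensity score 0.5.
z = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]
e = [0.5] * 8

p1_hat, p0_hat, psi_hat = ipw_marginal_odds_ratio(z, y, e)
print(p1_hat, p0_hat, psi_hat)
```

Here the weighted estimates are 0.75 and 0.25, giving an estimated marginal odds ratio of 9. Nothing in the function prevents p1_hat or p0_hat from leaving the interval (0, 1) when some e_i is close to 0 or 1, so a practical version would need to check for that.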

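21 Using the Propensity Score
Weighting
Numerical problem: $\hat e(x)$ very close to 0 or 1 may lead to
$$\frac{1}{n} \sum_{i=1}^{n} \frac{Z_i Y_i}{\hat e_i} \notin (0, 1) \quad \text{or} \quad \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - Z_i)\,Y_i}{1 - \hat e_i} \notin (0, 1)$$
In simulations, $\ldots < \hat e(x) < \ldots$ showed little reason for concern (the numerical bounds are missing from the source).
This problem will occur when the domains of $X$ in the exposed and unexposed groups are not equal.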

22 Simulation Example
Creating the population
For the following simulation I created a population of 2 million units, starting with three mutually independent standard normal covariates
$$X_1 \sim N(0, 1) \qquad X_2 \sim N(0, 1) \qquad X_3 \sim N(0, 1)$$
From these covariates, I simulated $Z$ through a logistic model for $\mathrm{logit}[P(Z = 1 \mid X)]$ that is linear in $X_1$, $X_2$, $X_3$ (the coefficients are garbled in the source, which reads "1X X X 3").
Finally, I simulated $Y_1$ and $Y_0$ for each unit using the model
$$\mathrm{logit}[P(Y_1 = 1 \mid X)] = 2X_1 - X_2 + X_3 + \log(3)$$
$$\mathrm{logit}[P(Y_0 = 1 \mid X)] = 2X_1 - X_2 + X_3$$
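A population of this kind could be generated roughly as follows. This Python sketch rests on several assumptions that are not in the slides: the exposure coefficients (1, -0.5, 0.25) are placeholders because the original values are unreadable, the minus sign on X_2 in the response model is a reconstruction, the population is reduced to 100,000 units to keep the run short, and all function names are hypothetical.

```python
import math
import random

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def odds_ratio(p_exposed, p_unexposed):
    """Cross-product odds ratio of two response probabilities."""
    return (p_exposed * (1 - p_unexposed)) / ((1 - p_exposed) * p_unexposed)

def simulate_population(n, seed=0):
    """Return a list of (z, y1, y0) triples, one per unit."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # ASSUMPTION: placeholder exposure coefficients, not the author's values.
        z = int(rng.random() < expit(1.0 * x1 - 0.5 * x2 + 0.25 * x3))
        # ASSUMPTION: sign of the X_2 term reconstructed from the garbled slide.
        base = 2 * x1 - x2 + x3
        y1 = int(rng.random() < expit(base + math.log(3)))
        y0 = int(rng.random() < expit(base))
        units.append((z, y1, y0))
    return units

population = simulate_population(100_000)
n = len(population)

# Marginal odds ratio: uses both potential responses for every unit.
p_y1 = sum(y1 for _, y1, _ in population) / n
p_y0 = sum(y0 for _, _, y0 in population) / n
psi_marg = odds_ratio(p_y1, p_y0)

# Crude odds ratio: Y_1 among the exposed versus Y_0 among the unexposed.
exposed = [(y1, y0) for z, y1, y0 in population if z == 1]
unexposed = [(y1, y0) for z, y1, y0 in population if z == 0]
p_y1_exposed = sum(y1 for y1, _ in exposed) / len(exposed)
p_y0_unexposed = sum(y0 for _, y0 in unexposed) / len(unexposed)
psi_crude = odds_ratio(p_y1_exposed, p_y0_unexposed)

print(round(psi_marg, 3), round(psi_crude, 3))
```

By construction the conditional odds ratio in this sketch is exp(log 3) = 3 at every covariate value. The marginal value should come out below 3, and the crude value should differ from both because exposure depends on the same covariates as the response.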

23 Simulation Example
Revisiting the population
A few units in the simulated population:
[Table with columns Unit, $X_1$, $X_2$, $X_3$, $Z$, $Y_1$, $Y_0$; the cell values are not preserved]
where the observed $Y$ for each unit is colored blue.

24 Simulation Example
Parameters of interest
From the population, I calculated the following parameters (the numerical values are missing from the source):
$$\psi_{marg} = \frac{P(Y_1 = 1)\,P(Y_0 = 0)}{P(Y_1 = 0)\,P(Y_0 = 1)}$$
$$\psi_{crude} = \frac{P(Y_1 = 1 \mid Z = 1)\,P(Y_0 = 0 \mid Z = 0)}{P(Y_1 = 0 \mid Z = 1)\,P(Y_0 = 1 \mid Z = 0)}$$
$$\psi_{cond} = \frac{P(Y_1 = 1 \mid X = x)\,P(Y_0 = 0 \mid X = x)}{P(Y_1 = 0 \mid X = x)\,P(Y_0 = 1 \mid X = x)}$$
where the conditional odds ratio was calculated via logistic regression.

25 Simulation Example
Mechanics
- 10,000 simulations
- Random sample of 2,100 observations from my population
- Removed counterfactual knowledge
  - Only observed $Y$, not $Y_1$ or $Y_0$
  - Remember, $Y = Z\,Y_1 + (1 - Z)\,Y_0$
- Stratification was done by quantiles of the relevant variables

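26 Simulation Example
Estimation: Stratification on the covariates
[Results display not preserved. Estimators shown: Crude, Logistic; horizontal axis: number of strata per variable; reference values: Conditional, Crude, Marginal]

27 Simulation Example
Estimation: Matching on the propensity score
[Results display not preserved. Estimators shown: Crude, Logistic; horizontal axis: number of propensity score strata; reference values: Conditional, Crude, Marginal]

28 Simulation Example
Estimation: Weighting by the propensity score
[Results display not preserved. Estimators shown: Crude, PS weighted, Logistic; reference values: Conditional, Crude, Marginal]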
