Test of Association between Two Ordinal Variables while Adjusting for Covariates


Chun Li, Bryan Shepherd
Department of Biostatistics, Vanderbilt University
May 13, 2009

Examples: Amblyopia

Examples: Anisometropic Amblyopia

Anisometropia is an unequal refractive error between the two eyes. Anisometropia was detected in 974 preschool children in a photoscreening program (Leon et al. 2008). Variables collected:
- anisometropia magnitude (X): levels 0, A, B, C
- visual acuity, used to define the amblyopia level (Y): levels 0 to 3
- age (Z): range 0 to 6

Goal: test the association between anisometropia and amblyopia while adjusting for the effect of age.

Examples: Anisometropic Amblyopia
[Figure: number of subjects and proportion of amblyopia levels by anisometropia level (0, A, B, C; ABS SE)]

Examples: CIN Stage

Cervical specimens were collected from 303 non-pregnant HIV-infected women in Lusaka, Zambia (Parham et al. 2006). Variables collected:
- condom use (X): never, rarely, almost always, always
- stage of cervical intraepithelial neoplasia (CIN) lesions (Y): 5 levels
- other factors (Z), such as age, education, and number of sexual partners

Goal: test for association between condom use and CIN stage while controlling for other factors that may be associated with the stages of the two diseases.

Problems in regression

Regression connects an outcome variable with one or more input variables:
- continuous outcome: linear regression
- binary outcome: logistic regression
- count outcome: Poisson regression (log-linear model)
- ordinal outcome: ordinal logistic regression (proportional odds model)
- survival outcome: Cox regression (proportional hazards model)

For all these regression analyses, the right-hand side is $\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$. What do we do when $X_1$ is ordered categorical?

Problems in regression: ordinal $X_1$ treated as continuous

When ordinal $X_1$ is treated as continuous, we assume that the effect of moving from level 1 to level 2 is the same as that of moving from level 2 to level 3. Often this is unreasonable. One could assign numbers to the categories so that the assigned values have a linear relationship with the outcome, but such a transformation is difficult to choose and may lead to data dredging.

Splines also have drawbacks:
- uncertainty in the number and locations of knots;
- dependence on how the categories are coded;
- non-monotonic results when monotonicity is expected;
- difficulty when there are only three categories.

Problems in regression: ordinal $X_1$ treated as categorical

When ordinal $X_1$ is treated as categorical, the order information is ignored. The test may have low power because of its many degrees of freedom, and the effect estimates may be non-monotonic.

Isotonic regression: fit the regression treating $X_1$ as categorical. If the coefficients are not monotonic, combine adjacent categories that go against the general trend and re-fit the regression. The grouping is data driven, so the results need adjustment for this source of model-selection variability. The degrees of freedom may still be high.

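tau and gamma: Kendall's tau and Goodman and Kruskal's gamma

Cross-classify $Y$ (rows) and $X$ (columns), with cell counts $n_{jl}$:

$$\begin{array}{c|ccc} & X=1 & X=2 & X=3 \\ \hline Y=1 & n_{11} & n_{12} & n_{13} \\ Y=2 & n_{21} & n_{22} & n_{23} \\ Y=3 & n_{31} & n_{32} & n_{33} \\ Y=4 & n_{41} & n_{42} & n_{43} \end{array}$$

Number of concordant pairs: $C = \sum_{j_1 < j_2,\ l_1 < l_2} n_{j_1 l_1} n_{j_2 l_2}$.
Number of discordant pairs: $D = \sum_{j_1 < j_2,\ l_1 > l_2} n_{j_1 l_1} n_{j_2 l_2}$.
Kendall's tau: $\tau = \dfrac{C - D}{n(n-1)/2}$.
Goodman and Kruskal's gamma: $\gamma = \dfrac{C - D}{C + D}$. This is the same as Somers' d for two groups when there are no ties in the predicted probabilities.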
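The pair counts above can be computed directly from the table. The following is a minimal Python sketch. The function names are hypothetical, and rows are assumed to index $Y$ and columns $X$. `kendall_tau_a` uses the $n(n-1)/2$ denominator exactly as written above.

```python
def concordant_discordant(table):
    """Return (C, D): concordant and discordant pair counts of an ordered table.

    Rows index the levels of Y (j) and columns the levels of X (l)."""
    rows, cols = len(table), len(table[0])
    c = d = 0
    for j1 in range(rows):
        for j2 in range(j1 + 1, rows):          # j1 < j2
            for l1 in range(cols):
                for l2 in range(cols):
                    product = table[j1][l1] * table[j2][l2]
                    if l1 < l2:
                        c += product            # orderings agree: concordant
                    elif l1 > l2:
                        d += product            # orderings disagree: discordant
    return c, d


def kendall_tau_a(table):
    """tau = (C - D) / (n(n - 1)/2), with n the total count."""
    n = sum(sum(row) for row in table)
    c, d = concordant_discordant(table)
    return (c - d) / (n * (n - 1) / 2)


def goodman_kruskal_gamma(table):
    """gamma = (C - D) / (C + D); pairs tied on X or Y do not enter."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)


# Hypothetical 2x2 table: C = 3*4 = 12, D = 1*2 = 2, n = 10
toy = [[3, 1], [2, 4]]
print(concordant_discordant(toy), goodman_kruskal_gamma(toy), kendall_tau_a(toy))
```

Because gamma is a ratio, the same functions give $\gamma$ when the table holds cell probabilities $\pi_{jl}$ instead of counts. Only `kendall_tau_a` needs integer counts.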

tau and gamma: Kendall's tau and Goodman and Kruskal's gamma (probabilities)

Replacing the counts by cell probabilities $\pi_{jl}$ (rows $Y = 1, \ldots, 4$; columns $X = 1, 2, 3$):
Concordance probability: $\pi_C = \sum_{j_1 < j_2,\ l_1 < l_2} \pi_{j_1 l_1} \pi_{j_2 l_2}$.
Discordance probability: $\pi_D = \sum_{j_1 < j_2,\ l_1 > l_2} \pi_{j_1 l_1} \pi_{j_2 l_2}$.
Goodman and Kruskal's gamma: $\gamma = \dfrac{\pi_C - \pi_D}{\pi_C + \pi_D}$.

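tau and gamma: Kendall's partial tau

$$\tau_{xy,z} = \frac{\tau_{xy} - \tau_{xz}\tau_{yz}}{\sqrt{(1 - \tau_{xz}^2)(1 - \tau_{yz}^2)}}.$$

But $\tau_{xy,z} \neq 0$ under the null $H_0: Y \perp X \mid Z$.

In general, for $(X, Y, Z) \sim N(0, \Sigma)$, we can similarly define the partial correlation

$$r_{xy,z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}.$$

When $X = Z + e_X$ and $Y = Z + e_Y$, with independent errors of equal variance, we have $0 < r_{xz} = r_{yz} = \sqrt{r_{xy}} < 1$ and therefore $r_{xy,z} = 0$.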
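The contrast between the two partial coefficients can be checked numerically. The sketch below assumes $Z$, $e_X$, $e_Y$ are independent $N(0,1)$. It uses the known relation for a bivariate normal pair, $\tau = (2/\pi)\arcsin(r)$, to get the population taus. The helper names are hypothetical.

```python
import math


def partial(a_xy, a_xz, a_yz):
    """First-order partial coefficient (a_xy - a_xz a_yz) / sqrt((1 - a_xz^2)(1 - a_yz^2))."""
    return (a_xy - a_xz * a_yz) / math.sqrt((1 - a_xz ** 2) * (1 - a_yz ** 2))


def normal_tau(r):
    """Population Kendall's tau of a bivariate normal pair with correlation r."""
    return 2 / math.pi * math.asin(r)


# Assumed setup: X = Z + e_X, Y = Z + e_Y with Z, e_X, e_Y independent N(0, 1),
# so X and Y are independent given Z (the null holds).
r_xz = r_yz = 1 / math.sqrt(2)
r_xy = 0.5

partial_r = partial(r_xy, r_xz, r_yz)              # zero up to rounding
partial_tau = partial(normal_tau(r_xy), normal_tau(r_xz), normal_tau(r_yz))
print(partial_r, partial_tau)                      # partial tau is 1/9, not 0
```

Here $\tau_{xz} = \tau_{yz} = 1/2$ and $\tau_{xy} = 1/3$, so $\tau_{xy,z} = (1/3 - 1/4)/(3/4) = 1/9$ even though the null holds.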

tau and gamma: Davis's partial gamma

Stratify on $Z = 1, \ldots, k$. Stratum $i$ has cell probabilities $\pi_{jl,i}$ (rows $Y = 1, \ldots, 4$; columns $X = 1, 2, 3$), with concordance and discordance probabilities $\pi_{C,i}$ and $\pi_{D,i}$.

Stratum-specific gammas: $\gamma_i = \dfrac{\pi_{C,i} - \pi_{D,i}}{\pi_{C,i} + \pi_{D,i}}$, $i = 1, \ldots, k$.

Let $\pi_C = \sum_i \pi_{C,i}$ and $\pi_D = \sum_i \pi_{D,i}$. Davis's partial gamma is

$$\gamma = \frac{\pi_C - \pi_D}{\pi_C + \pi_D} = \sum_{i=1}^{k} \frac{\pi_{C,i} + \pi_{D,i}}{\pi_C + \pi_D}\,\gamma_i.$$

It requires stratification on $Z$, which is awkward when $Z$ is continuous or multivariable.
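Both forms of Davis's partial gamma can be computed side by side. The minimal sketch below pools the pair counts over strata and checks that this equals the weighted average of the stratum-specific gammas. The function names and the two strata are hypothetical.

```python
def concordant_discordant(table):
    """(C, D) for one stratum; rows index Y levels, columns index X levels."""
    c = d = 0
    for j1 in range(len(table)):
        for j2 in range(j1 + 1, len(table)):
            for l1 in range(len(table[0])):
                for l2 in range(len(table[0])):
                    w = table[j1][l1] * table[j2][l2]
                    if l1 < l2:
                        c += w
                    elif l1 > l2:
                        d += w
    return c, d


def davis_partial_gamma(strata):
    """Return (pooled gamma, weighted average of stratum gammas) over strata of Z."""
    pairs = [concordant_discordant(t) for t in strata]
    c_tot = sum(c for c, _ in pairs)
    d_tot = sum(d for _, d in pairs)
    pooled = (c_tot - d_tot) / (c_tot + d_tot)
    # A stratum with no untied pairs has weight 0, so it is skipped.
    weighted = sum((c + d) / (c_tot + d_tot) * (c - d) / (c + d)
                   for c, d in pairs if c + d > 0)
    return pooled, weighted


strata = [[[3, 1], [2, 4]], [[1, 2], [3, 1]]]   # two hypothetical 2x2 strata
pooled, weighted = davis_partial_gamma(strata)
print(pooled, weighted)
```

With these strata, $(C_1, D_1) = (12, 2)$ and $(C_2, D_2) = (1, 6)$. Both forms therefore give $5/21$.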

Motivation of our approach

In the linear regression $Y = \beta_0 + \beta_1 X + \gamma_1 Z_1 + \cdots + \gamma_k Z_k$, we test $H_0: \beta_1 = 0$. Alternatively, we could:
1. fit a linear regression of $Y$ on $Z$ and obtain the residuals $Y_{res}$;
2. fit a linear regression of $X$ on $Z$ and obtain the residuals $X_{res}$;
3. fit $Y_{res} = \alpha_0 + \alpha_1 X_{res}$ and test $H_0: \alpha_1 = 0$.

Then $\hat\beta_1 = \hat\alpha_1$, and their significance levels are similar if $n \gg k$.
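The equality $\hat\beta_1 = \hat\alpha_1$ is the Frisch-Waugh-Lovell result, and a small numerical check makes it concrete. The sketch below uses a single covariate $Z$ ($k = 1$) and simulated data. The data-generating coefficients are arbitrary and the helper names are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def simple_ols(y, x):
    """Intercept and slope of the least-squares line y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def residuals(y, x):
    """Residuals from regressing y on a single covariate x."""
    a, b = simple_ols(y, x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def slope_on_x(y, x, z):
    """beta_1 in y ~ beta_0 + beta_1 * x + gamma * z, from the centred normal equations."""
    mx, my, mz = mean(x), mean(y), mean(z)
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)


rng = random.Random(1)
n = 200
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]                   # X depends on Z
y = [0.5 * xi + 0.8 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

beta1 = slope_on_x(y, x, z)                              # full regression
_, alpha1 = simple_ols(residuals(y, z), residuals(x, z)) # residual on residual
print(beta1, alpha1)                                     # equal up to rounding
```

Only the standard errors differ, because the residual regression does not account for the $k$ degrees of freedom spent on $Z$. Hence the significance levels agree only approximately when $n \gg k$.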

Motivation: correlation of residuals in linear regression
[Figure: $y$ against $E(y \mid z)$ with residual $y - E(y \mid z)$, and $x$ against $E(x \mid z)$ with residual $x - E(x \mid z)$]

Motivation: our approach
1. Fit a proportional odds model of $Y$ on $Z$.
2. Fit a proportional odds model of $X$ on $Z$.
3. Construct test statistics.

Question: what is a residual for proportional odds models?

Proportional odds models

Suppose $Y$ has levels $1_Y < \cdots < s_Y$. For $j = 1, \ldots, s - 1$,

$$\text{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y.$$

1. For each level $j$, this is a logistic regression model with intercept $\alpha^Y_j$ and coefficient $\beta^Y$.
2. For any level $j$, the log odds ratio between two subjects with covariates $z_2$ and $z_1$ is $\text{logit}[P(Y \le j \mid z_2)] - \text{logit}[P(Y \le j \mid z_1)] = (z_2 - z_1)\beta^Y$. That is, $\text{odds}_2/\text{odds}_1 = e^{z_2\beta^Y}/e^{z_1\beta^Y}$ for ANY $j$.
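Point 2 can be verified numerically. The sketch below assumes the $\text{logit}[P(Y \le j \mid Z)]$ convention used above, under which a positive $\beta^Y$ shifts mass toward lower levels. It computes the category probabilities and checks that the cumulative log odds ratio is the same for every $j$. The intercepts $(-1, 0, 1)$ and $\beta = 0.5$ are illustrative values, and the function names are hypothetical.

```python
import math


def expit(u):
    return 1.0 / (1.0 + math.exp(-u))


def category_probs(alphas, beta, z):
    """P(Y = j | z), j = 1..s, under logit P(Y <= j | z) = alpha_j + z * beta."""
    cum = [expit(a + z * beta) for a in alphas] + [1.0]   # P(Y <= s | z) = 1
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def cumulative_logit(probs, j):
    """logit P(Y <= j) recomputed from category probabilities (j is 1-based)."""
    c = sum(probs[:j])
    return math.log(c / (1.0 - c))


alphas, beta = [-1.0, 0.0, 1.0], 0.5        # illustrative values, s = 4 levels
p1 = category_probs(alphas, beta, z=0.5)
p2 = category_probs(alphas, beta, z=2.0)
for j in range(1, len(alphas) + 1):
    log_or = cumulative_logit(p2, j) - cumulative_logit(p1, j)
    print(j, round(log_or, 10))             # (2.0 - 0.5) * 0.5 = 0.75 for every j
```

The intercepts must be increasing in $j$, otherwise some category probabilities would be negative.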

Notation and model fitting

$Y$ has levels $1_Y < \cdots < s_Y$; $X$ has levels $1_X < \cdots < t_X$. In general, their joint distribution is $P = P(Y, X) = \{\pi_{jl}\}$. Under the null, $P(Y, X \mid Z) = P(Y \mid Z)P(X \mid Z)$, and

$$P_0 = P_0(Y, X) = \int_z P(Y, X \mid z)\, dF(z) = \int_z P(Y \mid z)P(X \mid z)\, dF(z).$$

Suppose $(X_i, Y_i, Z_i)$, $i = 1, \ldots, n$, are i.i.d. copies of $(X, Y, Z)$. For subject $i$, let $p^j_i = P(Y_i = j \mid Z_i = z_i)$, $j = 1, \ldots, s$, and $q^l_i = P(X_i = l \mid Z_i = z_i)$, $l = 1, \ldots, t$.

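Notation and model fitting: estimation of $P$ and $P_0$

Under the null, we model $P(Y \mid Z)$ and $P(X \mid Z)$ separately:

$$\text{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y, \qquad (1)$$
$$\text{logit}[P(X \le l \mid Z)] = \alpha^X_l + Z\beta^X, \qquad (2)$$

and obtain estimates $\hat p^j_i$ and $\hat q^l_i$. Then
- $\hat P_0 = \{\hat\pi^0_{jl}\}$, where $\hat\pi^0_{jl} = \frac{1}{n}\sum_i \hat p^j_i \hat q^l_i$;
- $\hat P = \{\hat\pi_{jl}\}$, where $\hat\pi_{jl} = n_{jl}/n$ and $n_{jl} = \#(Y = j, X = l)$.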
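The two estimators can be sketched directly. The code assumes the fitted probabilities $\hat p^j_i$ and $\hat q^l_i$ are already available as one list per subject, for example from fitting models (1) and (2). The three subjects below are hypothetical numbers, not model output, and the function names are hypothetical.

```python
def expected_table(p_hat, q_hat):
    """P0_hat[j][l] = (1/n) * sum_i p_hat[i][j] * q_hat[i][l]."""
    n, s, t = len(p_hat), len(p_hat[0]), len(q_hat[0])
    return [[sum(p_hat[i][j] * q_hat[i][l] for i in range(n)) / n
             for l in range(t)] for j in range(s)]


def observed_table(y, x, s, t):
    """P_hat[j][l] = n_jl / n, with Y coded 1..s and X coded 1..t."""
    n = len(y)
    table = [[0.0] * t for _ in range(s)]
    for yi, xi in zip(y, x):
        table[yi - 1][xi - 1] += 1 / n
    return table


# Hypothetical fitted probabilities for 3 subjects (s = t = 2)
p_hat = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
q_hat = [[0.5, 0.5], [0.6, 0.4], [0.1, 0.9]]
P0 = expected_table(p_hat, q_hat)
P = observed_table([1, 2, 2], [1, 2, 2], 2, 2)
print(P0)
print(P)
```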

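Method 1: observed versus expected. Estimation of $P$ and $P_0$

For subject $i$, the fitted null distribution of $(Y, X)$ is the outer product of $(\hat p^1_i, \ldots, \hat p^4_i)$ (rows, $Y$) and $(\hat q^1_i, \hat q^2_i, \hat q^3_i)$ (columns, $X$), so that cell $(j, l)$ equals $\hat p^j_i \hat q^l_i$. Summing over subjects gives the expected and observed tables:

$$n\hat P_0 = \begin{pmatrix} \sum_i \hat p^1_i \hat q^1_i & \sum_i \hat p^1_i \hat q^2_i & \sum_i \hat p^1_i \hat q^3_i \\ \sum_i \hat p^2_i \hat q^1_i & \sum_i \hat p^2_i \hat q^2_i & \sum_i \hat p^2_i \hat q^3_i \\ \sum_i \hat p^3_i \hat q^1_i & \sum_i \hat p^3_i \hat q^2_i & \sum_i \hat p^3_i \hat q^3_i \\ \sum_i \hat p^4_i \hat q^1_i & \sum_i \hat p^4_i \hat q^2_i & \sum_i \hat p^4_i \hat q^3_i \end{pmatrix}, \qquad n\hat P = \begin{pmatrix} n_{11} & n_{12} & n_{13} \\ n_{21} & n_{22} & n_{23} \\ n_{31} & n_{32} & n_{33} \\ n_{41} & n_{42} & n_{43} \end{pmatrix}.$$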

Method 1: observed versus expected

We compare the observed joint distribution $P$ with the expected distribution $P_0$ under the null.
1. Summarize the two distributions separately by Goodman and Kruskal's gamma: $\Gamma_1 = \Gamma(\hat P)$ and $\Gamma_0 = \Gamma(\hat P_0)$.
2. Test 1: $T_1 = \Gamma_1 - \Gamma_0$.

Note: a direct goodness-of-fit approach, which calculates a statistic of the form $\sum_{j,l} (O - E)^2/E$, ignores the order information.
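The statistic $T_1$ is a difference of two gammas, each computed from a table of cell probabilities. The minimal sketch below takes $\hat P$ and $\hat P_0$ as given 2x2 tables. The numbers and function names are hypothetical.

```python
def gamma(pi):
    """Goodman and Kruskal's gamma of a table of cell probabilities (rows Y, columns X)."""
    c = d = 0.0
    for j1 in range(len(pi)):
        for j2 in range(j1 + 1, len(pi)):
            for l1 in range(len(pi[0])):
                for l2 in range(len(pi[0])):
                    w = pi[j1][l1] * pi[j2][l2]
                    if l1 < l2:
                        c += w          # concordance
                    elif l1 > l2:
                        d += w          # discordance
    return (c - d) / (c + d)


def t1_statistic(p_hat, p0_hat):
    """T1 = Gamma(P_hat) - Gamma(P0_hat)."""
    return gamma(p_hat) - gamma(p0_hat)


p_hat = [[0.3, 0.1], [0.1, 0.5]]    # hypothetical observed proportions
p0_hat = [[0.2, 0.2], [0.2, 0.4]]   # hypothetical expected proportions under the null
print(t1_statistic(p_hat, p0_hat))
```

Here $\Gamma_1 = (0.15 - 0.01)/0.16 = 0.875$ and $\Gamma_0 = (0.08 - 0.04)/0.12 = 1/3$. Assessing $T_1$ still needs a null reference distribution, such as the empirical or M-estimation p-values.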

Method 2: residual-based test statistics

Residual in linear regression: consider $Y = \beta_0 + \gamma_1 Z_1 + \cdots + \gamma_k Z_k$. For a subject with $Y = y$ and $Z = z$, we first obtain the fitted value $\hat y = E(Y \mid z)$ and then calculate the residual as $y - \hat y$.

In addition to the fitted value, a linear regression model also yields a distribution of possible outcome values, say $Y_{fit} \sim Y \mid z$. The residual is $y - \hat y = y - E(Y_{fit}) = E(y - Y_{fit})$. This is easy to understand from a Bayesian perspective.

Method 2: residual as a distribution
[Figure: density of $Y \mid z$ with the observed $y$ marked, and density of $y - Y_{fit}$]

Method 2: residuals for proportional odds models

In the model for $P(Y \mid Z)$, $\text{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y$, the fitted distribution $Y_{i,fit} \sim Y \mid z_i$ is multinomial with probabilities $\{\hat p^j_i\}$. We cannot calculate $y_i - Y_{i,fit}$, but we can evaluate whether $y_i$ is at a higher or lower level than $Y_{i,fit}$:
1. $p_{i,high} = P(y_i > Y_{i,fit}) = \sum_{j < y_i} \hat p^j_i$ (scored as $+1$);
2. $p_{i,low} = P(y_i < Y_{i,fit}) = \sum_{j > y_i} \hat p^j_i$ (scored as $-1$);
3. $P(y_i = Y_{i,fit}) = \hat p^{y_i}_i$ (scored as 0).

The residual is defined as the expected score, $Y_{i,res} = p_{i,high} - p_{i,low}$. For model (2), we similarly have $q_{i,high}$, $q_{i,low}$, and $X_{i,res} = q_{i,high} - q_{i,low}$.
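The residual only needs the fitted category probabilities and the observed level. A minimal sketch, assuming levels coded $1, \ldots, s$ and a hypothetical function name:

```python
def po_residual(probs, y):
    """Expected score p_high - p_low for observed level y (1-based).

    probs[j - 1] is the fitted P(Y = j | z) for j = 1..s."""
    p_high = sum(probs[: y - 1])   # fitted levels below y: y is higher, score +1
    p_low = sum(probs[y:])         # fitted levels above y: y is lower, score -1
    return p_high - p_low          # the tie probability probs[y - 1] scores 0


fitted = [0.1, 0.2, 0.3, 0.4]      # hypothetical fitted probabilities for one subject
print([po_residual(fitted, y) for y in (1, 2, 3, 4)])
```

The residual always lies in $[-1, 1]$. It equals $-1$ or $+1$ only when all fitted probability sits strictly above or strictly below the observed level.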

Method 2: residuals for proportional odds models
[Figure: the fitted distribution split into lower ($-1$), tie (0), and higher ($+1$) regions relative to the observed level, and the resulting residual]

Method 2: residual-based statistics

Test 2: $T_2 = \text{corr}(Y_{res}, X_{res})$.
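A sketch of $T_2$ follows. It assumes that "corr" is the Pearson correlation. The fitted probabilities and observed levels for four subjects are hypothetical, standing in for the output of models (1) and (2), and the function names are hypothetical.

```python
import math


def po_residual(probs, level):
    """p_high - p_low for an observed level (1-based) given fitted category probabilities."""
    return sum(probs[: level - 1]) - sum(probs[level:])


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


# Hypothetical fitted P(Y = j | z_i) and P(X = l | z_i) for 4 subjects
p_fit = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.3, 0.4, 0.3], [0.6, 0.3, 0.1]]
q_fit = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3], [0.5, 0.3, 0.2]]
y_obs, x_obs = [1, 3, 2, 2], [1, 3, 2, 3]

y_res = [po_residual(p, y) for p, y in zip(p_fit, y_obs)]
x_res = [po_residual(q, x) for q, x in zip(q_fit, x_obs)]
t2 = pearson(y_res, x_res)
print(y_res, x_res, t2)
```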

Method 3: $(Y_i, X_i)$ versus $(Y, X) \mid Z_i$

For each subject, compare the observed value of $(Y_i, X_i)$ with all possible values of $(Y, X) \mid Z_i \sim \{p^j_i q^l_i\}$ under the null. Consider drawing $(Y^*_i, X^*_i)$ randomly from $(Y, X) \mid Z_i$ and scoring:
- concordance, if $Y_i > Y^*_i$ and $X_i > X^*_i$ (or $Y_i < Y^*_i$ and $X_i < X^*_i$);
- discordance, if $Y_i > Y^*_i$ and $X_i < X^*_i$ (or $Y_i < Y^*_i$ and $X_i > X^*_i$);
- tie, otherwise.

Method 3: compare $(Y_i, X_i)$ with $(Y, X) \mid Z_i$

Take a subject observed at $y_i = 3$, $x_i = 2$. The cells of the null distribution $\{\hat p^j_i \hat q^l_i\}$ group into concordant (C), discordant (D), and tied regions:

$$\begin{array}{c|ccc} & q_{i,high} & \hat q^2_i & q_{i,low} \\ \hline p_{i,high} & C & \text{tie} & D \\ \hat p^3_i & \text{tie} & \text{tie} & \text{tie} \\ p_{i,low} & D & \text{tie} & C \end{array}$$

$\Pr(\text{concordance})$ is estimated as $\hat C_i = p_{i,high} q_{i,high} + p_{i,low} q_{i,low}$.
$\Pr(\text{discordance})$ is estimated as $\hat D_i = p_{i,high} q_{i,low} + p_{i,low} q_{i,high}$.
Under the null, $E(\hat C_i) = E(\hat D_i)$.

Test 3: $T_3 = \frac{1}{n}\sum_i (\hat C_i - \hat D_i)$.
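A direct sketch of $T_3$ from per-subject fitted probabilities follows. It reuses the same hypothetical four subjects as the residual example, and the function names are hypothetical. Algebraically, $\hat C_i - \hat D_i = (p_{i,high} - p_{i,low})(q_{i,high} - q_{i,low}) = Y_{i,res} X_{i,res}$, so $T_3$ is the average product of the Method 2 residuals. The last line checks this numerically.

```python
def high_low(probs, level):
    """(P(observed > fitted), P(observed < fitted)) for a 1-based observed level."""
    return sum(probs[: level - 1]), sum(probs[level:])


def t3_statistic(p_fit, q_fit, y_obs, x_obs):
    """T3 = (1/n) * sum_i (C_hat_i - D_hat_i)."""
    total = 0.0
    for p, q, y, x in zip(p_fit, q_fit, y_obs, x_obs):
        p_high, p_low = high_low(p, y)
        q_high, q_low = high_low(q, x)
        c_hat = p_high * q_high + p_low * q_low   # estimated Pr(concordance)
        d_hat = p_high * q_low + p_low * q_high   # estimated Pr(discordance)
        total += c_hat - d_hat
    return total / len(y_obs)


# Hypothetical fitted probabilities and observations for 4 subjects
p_fit = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.3, 0.4, 0.3], [0.6, 0.3, 0.1]]
q_fit = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3], [0.5, 0.3, 0.2]]
y_obs, x_obs = [1, 3, 2, 2], [1, 3, 2, 3]

t3 = t3_statistic(p_fit, q_fit, y_obs, x_obs)
res_product_mean = sum((high_low(p, y)[0] - high_low(p, y)[1]) * (high_low(q, x)[0] - high_low(q, x)[1])
                       for p, q, y, x in zip(p_fit, q_fit, y_obs, x_obs)) / len(y_obs)
print(t3, res_product_mean)
```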

Simulations: methods compared

We compared our approach with proportional odds models in which:
- $X$ is continuous, testing $H_0: \eta = 0$ in $\text{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y + \eta X$;
- $X$ is categorical, testing $H_0: \eta_2 = \cdots = \eta_t = 0$ in $\text{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y + \eta_2 I_{\{X=2\}} + \cdots + \eta_t I_{\{X=t\}}$;
- isotonic proportional odds regression is used;
- $X$ is transformed using restricted cubic splines with three pre-selected knots.

Simulation setup

Data simulation steps:
1. Generate $Z \sim N(0, 1)$.
2. Generate $X$ (5 levels) using model (2) with $\alpha^X = (-1, 0, 1, 2)$ and $\beta^X = 1$.
3. Generate $Y$ (4 levels) using the proportional odds model $\text{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y + \eta_1 I_{\{X=1\}} + \cdots + \eta_t I_{\{X=t\}}$ with $\alpha^Y = (-1, 0, 1)$ and $\beta^Y = 0.5$.

We considered 4 scenarios for $\eta = (\eta_1, \ldots, \eta_t)$:
1. $\eta = (0, 0, 0, 0, 0)$ (the null);
2. $\eta = (-0.4, -0.2, 0, 0.2, 0.4)$ (linear);
3. $\eta = (-0.30, 0.18, 0.20, 0.22, 0.24)$ (monotonic non-linear);
4. $\eta = (-0.2, 0, 0.2, 0, 0.2)$ (non-monotonic).

For each scenario, we simulated 10,000 data sets, each with 500 subjects.
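The three data-generation steps can be sketched as below. Several choices are assumptions not stated in the setup. The intercepts are taken as $(-1, 0, 1, 2)$ and $(-1, 0, 1)$, which must be increasing for valid cumulative logits. Sampling uses the inverse CDF. The function names and the seed handling are hypothetical. Only the null and linear scenarios are generated here.

```python
import math
import random


def expit(u):
    return 1.0 / (1.0 + math.exp(-u))


def draw_po(alphas, lin, rng):
    """Draw a level 1..len(alphas)+1 with P(V <= j) = expit(alpha_j + lin), by inverse CDF."""
    u = rng.random()
    for j, a in enumerate(alphas, start=1):
        if u < expit(a + lin):
            return j
    return len(alphas) + 1


def simulate(eta, n=500, seed=0):
    """One data set of (y, x, z) triples following the simulation steps above."""
    rng = random.Random(seed)
    alpha_x, beta_x = [-1.0, 0.0, 1.0, 2.0], 1.0   # model (2), X has 5 levels
    alpha_y, beta_y = [-1.0, 0.0, 1.0], 0.5        # Y has 4 levels
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)                                # step 1
        x = draw_po(alpha_x, z * beta_x, rng)              # step 2
        y = draw_po(alpha_y, z * beta_y + eta[x - 1], rng) # step 3: eta_x enters via I{X = x}
        data.append((y, x, z))
    return data


null_data = simulate([0, 0, 0, 0, 0])
linear_data = simulate([-0.4, -0.2, 0, 0.2, 0.4], seed=1)
print(null_data[:3])
```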

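Simulation results: type I error and power (%)
[Table: rows are the analysis methods (our method, $T_1$ and the other statistics, with empirical and asymptotic p-values; $X$ linear; $X$ categorical; isotonic; splines); columns are the simulation scenarios Null, Linear, Non-linear, and Non-monotonic]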

Anisometropic amblyopia: data analysis
[Figure: $\log_{10}(p)$ from our method compared with $\log_{10}(p)$ from OLS]

Limitations
1. We focus on hypothesis testing, not estimation.
2. We assume no interaction between $X$ and $Z$.
3. We make a proportional odds assumption for the model of $X$ on $Z$.
4. Our method does not help if one is interested in the effect of $Z$ on ordinal $Y$ after adjusting for ordinal $X$.

P-value via empirical distribution

Let $T$ be one of the three test statistics. We simulate replicate data sets under the null. Repeat the following $N_{emp}$ times:
1. For each subject $i = 1, \ldots, n$, generate one observation $(Y^*_i, X^*_i)$ from $\{\hat p^j_i \hat q^l_i\}$.
2. Carry out the entire estimation procedure on the newly generated replicate data set to obtain $T^*$.

The two-sided p-value is then computed as either $\#(|T^*| \ge |T|)/N_{emp}$ or $2\min\{\#(T^* \le T), \#(T^* \ge T)\}/N_{emp}$. This procedure is essentially a parametric bootstrap.
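Step 1 and the two p-value formulas are sketched below. Step 2 (refitting models (1) and (2) and recomputing $T^*$) is not implemented, so the replicate statistics passed to `empirical_p_values` are hypothetical numbers. Capping the second p-value at 1 is an added guard, not part of the formula above. The function names are hypothetical.

```python
import random


def draw_null_replicate(p_hat, q_hat, rng):
    """Step 1: for each subject i, draw (Y*, X*) from {p_hat[i][j] * q_hat[i][l]}.

    Because the null distribution is a product, Y* and X* are drawn independently."""
    replicate = []
    for p, q in zip(p_hat, q_hat):
        y_star = rng.choices(range(1, len(p) + 1), weights=p)[0]
        x_star = rng.choices(range(1, len(q) + 1), weights=q)[0]
        replicate.append((y_star, x_star))
    return replicate


def empirical_p_values(t_obs, t_null):
    """Both two-sided p-values from the N_emp replicate statistics in t_null."""
    n_emp = len(t_null)
    p_abs = sum(abs(t) >= abs(t_obs) for t in t_null) / n_emp
    lower = sum(t <= t_obs for t in t_null)
    upper = sum(t >= t_obs for t in t_null)
    p_tails = min(1.0, 2 * min(lower, upper) / n_emp)   # capped at 1 (added guard)
    return p_abs, p_tails


rng = random.Random(0)
print(draw_null_replicate([[0.2, 0.5, 0.3]], [[0.6, 0.4]], rng))
print(empirical_p_values(2.5, [-3, -2, -1, 0, 1, 2, 3, 4]))   # (0.375, 0.5)
```

The first formula suits statistics whose null distribution is roughly symmetric about 0. The second formula does not assume symmetry.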

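M-estimation theory

Consider a parameter vector $\theta$ of length $p$, whose estimate $\hat\theta$ is obtained by solving $\sum_i \Psi_i(\theta) = 0$. Here $\Psi_i(\theta) = \Psi(Y_i, X_i, Z_i; \theta)$ is a $p$-variate function that satisfies $E_\theta[\Psi_i(\theta)] = 0$. From M-estimation theory, if $\Psi$ is suitably smooth, then

$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, V(\theta)),$$

where $V(\theta) = A(\theta)^{-1} B(\theta) [A(\theta)^{-1}]^\top$, $A(\theta) = E\left[-\frac{\partial}{\partial\theta^\top}\Psi_i(\theta)\right]$, and $B(\theta) = E[\Psi_i(\theta)\Psi_i(\theta)^\top]$.

If $T = g(\hat\theta)$ is a smooth function of $\hat\theta$, then $\sqrt{n}[g(\hat\theta) - g(\theta)] \xrightarrow{d} N(0, \sigma^2)$, where $\sigma^2 = \left[\frac{\partial g(\theta)}{\partial\theta}\right]^\top V(\theta) \left[\frac{\partial g(\theta)}{\partial\theta}\right]$. If $g(\theta) = 0$ under the null, then the p-value is $2\Phi\left(-\frac{|T|}{\sigma/\sqrt{n}}\right)$.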
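The sandwich formula and the resulting p-value can be illustrated in the simplest scalar case. The sketch below assumes $\theta$ is a mean, with $\Psi_i(\theta) = y_i - \theta$ (so $A = 1$), and $g(\theta) = \theta$ with null value 0. The data are hypothetical, and sample averages replace the expectations in $A$ and $B$. The stacked $\Psi_i$ used for $T_1$ to $T_3$ would need the matrix version.

```python
import math
from statistics import NormalDist


def sandwich_variance_scalar(psi, dpsi, data, theta_hat):
    """V = A^{-1} B A^{-1} for a scalar M-estimator, with A, B replaced by sample averages."""
    n = len(data)
    a = -sum(dpsi(d, theta_hat) for d in data) / n     # A = E[-dPsi/dtheta]
    b = sum(psi(d, theta_hat) ** 2 for d in data) / n  # B = E[Psi^2]
    return b / a ** 2


def wald_p_value(t, sigma, n):
    """Two-sided p-value 2 * Phi(-|T| / (sigma / sqrt(n)))."""
    return 2 * NormalDist().cdf(-abs(t) / (sigma / math.sqrt(n)))


y = [0.3, -0.1, 0.8, 0.5, -0.2, 0.9, 0.4, 0.1]   # hypothetical data
theta_hat = sum(y) / len(y)                      # solves sum_i (y_i - theta) = 0
v = sandwich_variance_scalar(lambda yi, th: yi - th, lambda yi, th: -1.0, y, theta_hat)
p = wald_p_value(theta_hat, math.sqrt(v), len(y))   # T = g(theta_hat) = theta_hat
print(theta_hat, v, p)
```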

P-value via estimating equations

For all three statistics, $\theta = (\theta_1, \theta_2, \theta_3)$, where $\theta_1 = (\alpha^Y, \beta^Y)$, $\theta_2 = (\alpha^X, \beta^X)$, and $\theta_3$ is different for each statistic. The corresponding estimating function $\Psi_i(\theta)$ has the form

$$\Psi_i(\theta) = \begin{pmatrix} \frac{\partial}{\partial\theta_1} l_1(Y_i, Z_i; \theta_1) \\ \frac{\partial}{\partial\theta_2} l_2(X_i, Z_i; \theta_2) \\ \psi(Y_i, X_i, Z_i; \theta_3) \end{pmatrix},$$

where $l_1$ and $l_2$ are the log-likelihood functions of the proportional odds models (1) and (2), respectively. The first two components are score functions, and thus $E_\theta\left[\frac{\partial}{\partial\theta_1} l_1(Y_i, Z_i; \theta_1)\right] = 0$ and $E_\theta\left[\frac{\partial}{\partial\theta_2} l_2(X_i, Z_i; \theta_2)\right] = 0$. The function $\psi(Y_i, X_i, Z_i; \theta_3)$ is different for each statistic.

Outline
1. Introduction: examples; problems in regression; tau and gamma
2. Our approach: motivation; proportional odds models; notation and model fitting; Method 1: observed versus expected; Method 2: residual-based test statistics; Method 3: $(Y_i, X_i)$ versus $(Y, X) \mid Z_i$
3. Simulations
4. Example data analysis
