Loglinear models. STAT 526 Professor Olga Vitek


1 Loglinear models STAT 526 Professor Olga Vitek April 19,

2 Can Use Poisson Likelihood To Model Both Poisson and Multinomial Counts

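3 Recall: Poisson Distribution
Probability distribution: $Y$ = number of events in a fixed interval of space or time
$Y \sim \text{Poisson}(\lambda)$: $p(y) = e^{-\lambda} \lambda^{y} / y!$, $y = 0, 1, \ldots$; $E(Y) = \text{var}(Y) = \lambda$
Sum of independent Poisson r.v.: if $Y_1, Y_2, \ldots, Y_c$ are independent with $Y_i \sim \text{Poisson}(\lambda_i)$, then $\sum_{i=1}^{c} Y_i \sim \text{Poisson}\left(\sum_{i=1}^{c} \lambda_i\right)$
$c$ independent Poisson r.v., conditional on their total, are Multinomial:
$$P\Big(Y_1 = n_1, \ldots, Y_c = n_c \,\Big|\, \sum_i Y_i = n\Big) = \frac{P(Y_1 = n_1, \ldots, Y_c = n_c)}{P\left(\sum_i Y_i = n\right)} = \frac{\prod_i \left[\exp(-\lambda_i)\, \lambda_i^{n_i} / n_i!\right]}{\exp\left(-\sum_i \lambda_i\right) \left(\sum_i \lambda_i\right)^{n} / n!} = \frac{n!}{\prod_i n_i!} \prod_i \pi_i^{n_i}, \qquad \pi_i = \frac{\lambda_i}{\sum_i \lambda_i}$$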
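The identity above can be checked numerically. The following is a minimal sketch in base R; the rates and counts are made-up illustrative values, not taken from the lecture.

```r
# Hypothetical rates for c = 3 independent Poisson counts (illustrative only)
lambda <- c(2, 5, 3)
counts <- c(1, 4, 2)          # one possible outcome (n_1, n_2, n_3)
n <- sum(counts)

# Conditional probability of the counts given their total, from the Poisson model
cond_prob <- prod(dpois(counts, lambda)) / dpois(n, sum(lambda))

# Multinomial probability with pi_i = lambda_i / sum(lambda)
multi_prob <- dmultinom(counts, size = n, prob = lambda / sum(lambda))

# The two probabilities agree up to floating-point error
print(all.equal(cond_prob, multi_prob))
```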

4 Models for Joint Distributions of Unordered Categorical Variables

5 Joint Distributions of Categorical Variables
Convenient to model contingency tables
  two-way, but also more complex tables
  can express complex probabilistic relationships
Treat all categorical variables symmetrically
  no distinction between predictor and response
  analogous to correlation of continuous variables
Can be thought of as a model for a network of associations between categorical variables
  related to graphical models
Example: modeling functional networks of genes (e.g. activated/non-activated; substitutions of nucleotides)

6 2 R.V.: Independence
$Y_{ij} \sim \text{Poisson}(\lambda_{ij})$
Of interest: effect of row $i$ and column $j$ on $Y_{ij}$
General model (Faraway Ch. 4): $\log E\{Y_{ij}\} = \log \lambda_{ij} = \log n\pi_{ij}$
Assuming independence of rows and columns:
$$\log E\{Y_{ij}\} = \log n\,\pi_i\,\pi_j = \log n + \log \pi_i + \log \pi_j = \mu + \alpha_i + \beta_j, \quad \text{where } \sum_i e^{\alpha_i} = \sum_j e^{\beta_j} = 1$$
ML estimation with Poisson likelihood: $\hat\pi_{ij} = e^{\hat\alpha_i} e^{\hat\beta_j} = \hat\pi_i \hat\pi_j$; $\hat\lambda_{ij} = n\,\hat\pi_{ij}$
Total number of parameters: $1 + (I-1) + (J-1)$
Same $\hat\lambda_{ij}$ as in the $X^2$ test for independence
Can use $X^2$ and $G^2$ tests to test goodness of fit

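7 2 R.V.: Independence
Alternative parametrization (Agresti Ch. 8): $\log E\{Y_{ij}\} = \mu + \alpha_i + \beta_j$, $\alpha_I = \beta_J = 0$
  $\mu = \log E\{Y_{IJ}\}$
  $\alpha_i$ and $\beta_j$ are deviations of $\log E\{Y_{ij}\}$ from the reference cell $(I, J)$ due to row $i$ and column $j$
Parametrization in R: $\alpha_1 = \beta_1 = 0$ and $\mu = \log E\{Y_{11}\}$
  Will use this parametrization from now on.
ML estimation with Poisson likelihood:
$$\hat\lambda_{ij} = e^{\hat\mu + \hat\alpha_i + \hat\beta_j} = n_{i+}\, n_{+j} / n$$
$$\hat\pi_{ij} = \frac{\hat\lambda_{ij}}{\sum_i \sum_j \hat\lambda_{ij}} = \frac{e^{\hat\mu + \hat\alpha_i + \hat\beta_j}}{\sum_i \sum_j e^{\hat\mu + \hat\alpha_i + \hat\beta_j}} = \frac{e^{\hat\alpha_i}}{\sum_i e^{\hat\alpha_i}} \cdot \frac{e^{\hat\beta_j}}{\sum_j e^{\hat\beta_j}} = \hat\pi_i\, \hat\pi_j = n_{i+}\, n_{+j} / n^2$$
As in ANOVA, all parametrizations produce identical estimates of probabilities and counts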
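As a hedged illustration of the closed-form fitted counts under independence, the sketch below fits the Poisson log-linear model to a small made-up 2 x 3 table (the counts and the factor names `A` and `B` are hypothetical) and compares the result with $n_{i+} n_{+j} / n$.

```r
# Hypothetical 2 x 3 table of counts (made-up numbers, for illustration only)
tab <- matrix(c(20, 15, 30, 25, 10, 40), nrow = 2,
              dimnames = list(A = c("a1", "a2"), B = c("b1", "b2", "b3")))
d <- as.data.frame(as.table(tab))     # columns A, B, Freq (column-major order)

# Independence log-linear model; R uses alpha_1 = beta_1 = 0 by default
fit_ind <- glm(Freq ~ A + B, family = poisson, data = d)

# Closed form: n_{i+} n_{+j} / n
closed <- outer(rowSums(tab), colSums(tab)) / sum(tab)
fitted_mat <- matrix(fitted(fit_ind), nrow = 2)

print(all.equal(as.vector(fitted_mat), as.vector(closed), tolerance = 1e-6))

# exp(mu-hat) is the fitted count in the reference cell (1, 1)
print(c(exp(coef(fit_ind)[["(Intercept)"]]), closed[1, 1]))
```

Changing the reference level (for example with `relevel`) changes the coefficients but not the fitted counts, which is the sense in which all parametrizations agree.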

8 2 R.V.: Saturated model
Saturated model: $\log E\{Y_{ij}\} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$
$\mu = \log E\{Y_{11}\}$, $\alpha_1 = \beta_1 = (\alpha\beta)_{i1} = (\alpha\beta)_{1j} = 0$
Total number of parameters: $1 + (I-1) + (J-1) + (I-1)(J-1) = IJ$ (i.e. describes each cell perfectly)
ML estimation with Poisson likelihood
  Model diagnostics developed for Poisson regression (e.g. residuals) apply
Test for independence of rows and columns
  $H_0: (\alpha\beta)_{ij} = 0$ for all $i, j$ vs $H_a: (\alpha\beta)_{ij} \neq 0$ for some $i, j$
  LR ($G^2$) test with $(I-1)(J-1)$ df
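The LR test of independence can be sketched as follows, again on a made-up 2 x 3 table (counts and factor names are hypothetical). The residual deviance of the independence model should equal $G^2 = 2 \sum n_{ij} \log(n_{ij} / \hat\lambda_{ij})$, and the saturated model should have zero deviance.

```r
# Hypothetical 2 x 3 table of counts (made-up numbers, for illustration only)
tab <- matrix(c(20, 15, 30, 25, 10, 40), nrow = 2,
              dimnames = list(A = c("a1", "a2"), B = c("b1", "b2", "b3")))
d <- as.data.frame(as.table(tab))

fit_ind <- glm(Freq ~ A + B, family = poisson, data = d)   # independence
fit_sat <- glm(Freq ~ A * B, family = poisson, data = d)   # saturated

# G^2 computed directly from observed and expected counts
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
G2 <- 2 * sum(tab * log(tab / expected))

# Residual deviance of the independence model equals G^2 on (I-1)(J-1) df
print(c(G2 = G2, deviance = deviance(fit_ind), df = df.residual(fit_ind)))
p_value <- pchisq(G2, df = df.residual(fit_ind), lower.tail = FALSE)
print(p_value)

# The same test through the nested-model comparison
print(anova(fit_ind, fit_sat, test = "Chisq"))
```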

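9 3 R.V.: Mutual Indep.
3-way $I \times J \times K$ cross-classification of r.v. $X$, $Y$ and $Z$
Assume the count $Y_{ijk} \sim \text{Poisson}(E\{Y_{ijk}\})$
$X$, $Y$ and $Z$ are mutually independent if $\pi_{ijk} = \pi_i\, \pi_j\, \pi_k$
$$\log E\{Y_{ijk}\} = \log n + \log \pi_i + \log \pi_j + \log \pi_k$$
Total number of parameters: $1 + (I-1) + (J-1) + (K-1)$
The log-linear model is $\log E\{Y_{ijk}\} = \mu + \alpha_i + \beta_j + \gamma_k$
$\mu = \log E\{Y_{111}\}$, $\alpha_1 = \beta_1 = \gamma_1 = 0$
$$\hat\lambda_{ijk} = e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_k} = n_{i++}\, n_{+j+}\, n_{++k} / n^2$$
$$\hat\pi_{ijk} = \frac{e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_k}}{\sum_i \sum_j \sum_k e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_k}} = \frac{e^{\hat\alpha_i}}{\sum_i e^{\hat\alpha_i}} \cdot \frac{e^{\hat\beta_j}}{\sum_j e^{\hat\beta_j}} \cdot \frac{e^{\hat\gamma_k}}{\sum_k e^{\hat\gamma_k}} = \hat\pi_i\, \hat\pi_j\, \hat\pi_k = n_{i++}\, n_{+j+}\, n_{++k} / n^3$$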

10 Example: Female Smoking
A survey of women by age, and a follow-up study 20 years later.
> library(faraway)
> data(femsmoke)
> head(femsmoke)
(columns: y, smoker, dead, age; the printed rows did not survive extraction)
> ct3 <- xtabs(y ~ smoker + dead + age, femsmoke)
> ct3
(one smoker-by-dead panel per age group; only the first panel survived extraction, without its age label)
      dead
smoker yes no
   yes   2 53
   no    1 61

11 Example: Female Smoking
Pearson $X^2$ test of mutual independence
> summary(ct3)
Call: xtabs(formula = y ~ smoker + dead + age, data = femsmoke)
Number of cases in table: 1314
Number of factors: 3
Test for independence of all factors:
  Chisq = 790.6, df = 19, p-value = 2.140e-155
Log-linear model with mutual independence
> fit1 <- glm(y ~ smoker + dead + age, femsmoke, family = "poisson")
Coefficients (numeric estimates lost in extraction): (Intercept), smokerno, deadno and six age terms, each significant at the 0.01 level or smaller
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 27 degrees of freedom; residual deviance on 19 degrees of freedom (deviance and AIC values lost in extraction)

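12 3 R.V.: Joint Independence
Assume the count $Y_{ijk} \sim \text{Poisson}(E\{Y_{ijk}\})$
$X$ and $Y$ are dependent, but together they are independent of $Z$: $\pi_{ijk} = \pi_{ij}\, \pi_k$
$$\log E\{Y_{ijk}\} = \log n + \log \pi_{ij} + \log \pi_k$$
Total number of parameters: $1 + (IJ - 1) + (K - 1)$
The log-linear model is $\log E\{Y_{ijk}\} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k$
$\mu = \log E\{Y_{111}\}$, $\alpha_1 = \beta_1 = \gamma_1 = (\alpha\beta)_{1j} = (\alpha\beta)_{i1} = 0$
$$\hat\lambda_{ijk} = e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \widehat{(\alpha\beta)}_{ij} + \hat\gamma_k} = n_{ij+}\, n_{++k} / n$$
$$\hat\pi_{ijk} = \frac{e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \widehat{(\alpha\beta)}_{ij} + \hat\gamma_k}}{\sum_i \sum_j \sum_k e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \widehat{(\alpha\beta)}_{ij} + \hat\gamma_k}} = \frac{e^{\hat\alpha_i + \hat\beta_j + \widehat{(\alpha\beta)}_{ij}}}{\sum_i \sum_j e^{\hat\alpha_i + \hat\beta_j + \widehat{(\alpha\beta)}_{ij}}} \cdot \frac{e^{\hat\gamma_k}}{\sum_k e^{\hat\gamma_k}} = \hat\pi_{ij}\, \hat\pi_k = n_{ij+}\, n_{++k} / n^2$$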
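A minimal sketch of the joint-independence fit, assuming a made-up 2 x 2 x 3 table (the counts and the factor names `X`, `Y`, `Z` are hypothetical, not the smoking data). It checks the closed form $\hat\lambda_{ijk} = n_{ij+} n_{++k} / n$ and the parameter count.

```r
# Hypothetical 2 x 2 x 3 table of counts (made-up numbers, for illustration only)
arr <- array(c(12, 8, 20, 15, 9, 14, 25, 11, 7, 18, 16, 22), dim = c(2, 2, 3),
             dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2"),
                             Z = c("z1", "z2", "z3")))
d3 <- as.data.frame(as.table(arr))    # columns X, Y, Z, Freq

# (X, Y) jointly independent of Z
fit_joint <- glm(Freq ~ X * Y + Z, family = poisson, data = d3)

# Closed form: n_{ij+} n_{++k} / n
n    <- sum(arr)
n_ij <- apply(arr, c(1, 2), sum)
n_k  <- apply(arr, 3, sum)
closed <- outer(n_ij, n_k) / n        # array with the same 2 x 2 x 3 layout as arr

print(all.equal(as.vector(closed), unname(fitted(fit_joint)), tolerance = 1e-6))

# Number of parameters: 1 + (IJ - 1) + (K - 1) = 6, so residual df = 12 - 6
print(c(params = length(coef(fit_joint)), resid_df = df.residual(fit_joint)))
```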

13 Example: Female Smoking
Model joint independence of smoker and dead from age (i.e. smoker and dead are dependent, but jointly independent of age).
Only a minor improvement of model fit
> fit2 <- glm(y ~ smoker * dead + age, femsmoke, family = "poisson")
Coefficients (numeric estimates lost in extraction): (Intercept), smokerno, deadno, six age terms and smokerno:deadno, each significant at the 0.01 level or smaller
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 27 degrees of freedom; residual deviance on 18 degrees of freedom (deviance and AIC values lost in extraction)

14 3 R.V.: Conditional Indep.
$X$ and $Y$ are independent, given $Z = k$: $\pi_{ij|k} = \pi_{i|k}\, \pi_{j|k}$
  weaker than mutual or joint independence
The joint probability is then
$$\pi_{ijk} = \pi_{i|k}\, \pi_{j|k}\, \pi_k = \frac{\pi_{ik}}{\pi_k} \cdot \frac{\pi_{jk}}{\pi_k} \cdot \pi_k = \frac{\pi_{ik}\, \pi_{jk}}{\pi_k}$$
$$\log E\{Y_{ijk}\} = \log n + \log \pi_{ik} + \log \pi_{jk} - \log \pi_k$$
The log-linear model is $\log E\{Y_{ijk}\} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}$
$\mu = \log E\{Y_{111}\}$, $\alpha_1 = \beta_1 = \gamma_1 = (\alpha\gamma)_{1k} = (\alpha\gamma)_{i1} = (\beta\gamma)_{1k} = (\beta\gamma)_{j1} = 0$
$$\hat\pi_{ijk} = \frac{\hat\lambda_{ijk}}{\sum_i \sum_j \sum_k \hat\lambda_{ijk}} = \frac{\hat\pi_{ik}\, \hat\pi_{jk}}{\hat\pi_k} = \frac{n_{i+k}\, n_{+jk}}{n\; n_{++k}}$$
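The conditional-independence fit can be sketched the same way, assuming a made-up 2 x 2 x 3 table (hypothetical counts and factor names). Within each level of `Z` the fitted counts should be the independence fit of that slice, $n_{i+k} n_{+jk} / n_{++k}$.

```r
# Hypothetical 2 x 2 x 3 table of counts (made-up numbers, for illustration only)
arr <- array(c(12, 8, 20, 15, 9, 14, 25, 11, 7, 18, 16, 22), dim = c(2, 2, 3),
             dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2"),
                             Z = c("z1", "z2", "z3")))
d3 <- as.data.frame(as.table(arr))

# X and Y conditionally independent given Z
fit_cond <- glm(Freq ~ X * Z + Y * Z, family = poisson, data = d3)

# Closed form, one Z-slice at a time: n_{i+k} n_{+jk} / n_{++k}
n_ik <- apply(arr, c(1, 3), sum)
n_jk <- apply(arr, c(2, 3), sum)
n_k  <- apply(arr, 3, sum)
closed <- array(0, dim = dim(arr))
for (k in seq_len(dim(arr)[3])) {
  closed[, , k] <- outer(n_ik[, k], n_jk[, k]) / n_k[k]
}

print(all.equal(as.vector(closed), unname(fitted(fit_cond)), tolerance = 1e-6))
print(df.residual(fit_cond))          # K (I - 1)(J - 1) = 3 for this table
```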

15 Example: Female Smoking
Conditional indep. of smoker and dead, given age
> fit3 <- glm(y ~ smoker * age + dead * age, femsmoke, family = "poisson")
Coefficients (numeric estimates lost in extraction): (Intercept), smokerno, six age terms, deadno, six smokerno:age terms, and age:deadno terms for age25-34, age35-44, age45-54, age55-64, age65-74 and age75+
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 27 degrees of freedom; residual deviance on 7 degrees of freedom (deviance and AIC values lost in extraction)

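16 3 R.V.: Uniform (Homogeneous) Association
For each level of one variable, same association of the other two variables
$\pi_{ijk} = \psi_{ij}\, \phi_{jk}\, \omega_{ik}$, a product of pairwise terms; unlike in the previous models, these terms are not the two-way marginal probabilities
The log-linear model is $\log E\{Y_{ijk}\} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta)_{ij}$
$\mu = \log E\{Y_{111}\}$, $\alpha_1 = \beta_1 = \gamma_1 = (\alpha\gamma)_{1k} = (\alpha\gamma)_{i1} = (\beta\gamma)_{1k} = (\beta\gamma)_{j1} = (\alpha\beta)_{1j} = (\alpha\beta)_{i1} = 0$
Not a saturated model, since no 3-way interaction
$$\hat\pi_{ijk} = \frac{e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_k + \widehat{(\alpha\gamma)}_{ik} + \widehat{(\beta\gamma)}_{jk} + \widehat{(\alpha\beta)}_{ij}}}{\sum_i \sum_j \sum_k e^{\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_k + \widehat{(\alpha\gamma)}_{ik} + \widehat{(\beta\gamma)}_{jk} + \widehat{(\alpha\beta)}_{ij}}}$$
This ratio does not reduce to a closed-form function of the marginal counts; the estimates are computed iteratively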

17 Interpretation of Uniform Association
Constant odds ratios between the levels of two variables, for each level of the third variable
e.g. for $i = 1, 2$, $j = 1, 2$ and a given level $k$:
$$\log OR = \log \frac{\lambda_{11k}\, \lambda_{22k}}{\lambda_{12k}\, \lambda_{21k}} = (\alpha\beta)_{11} + (\alpha\beta)_{22} - (\alpha\beta)_{12} - (\alpha\beta)_{21}$$
  independent of $k$
No easy way to estimate $\hat\lambda_{ijk}$ and $\hat\pi_{ijk}$ based on cell counts
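The constant-odds-ratio property can be illustrated with a short sketch on a made-up 2 x 2 x 3 table (hypothetical counts and factor names). With R's reference-cell constraints, the common log odds ratio reduces to the single interaction coefficient $(\alpha\beta)_{22}$.

```r
# Hypothetical 2 x 2 x 3 table of counts (made-up numbers, for illustration only)
arr <- array(c(12, 8, 20, 15, 9, 14, 25, 11, 7, 18, 16, 22), dim = c(2, 2, 3),
             dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2"),
                             Z = c("z1", "z2", "z3")))
d3 <- as.data.frame(as.table(arr))

# Uniform association: all two-way interactions, no three-way interaction
fit_ua <- glm(Freq ~ (X + Y + Z)^2, family = poisson, data = d3)

# Fitted X-Y log odds ratio within each level of Z
fit_arr <- array(fitted(fit_ua), dim = dim(arr))
log_or <- apply(fit_arr, 3, function(m) log(m[1, 1] * m[2, 2] / (m[1, 2] * m[2, 1])))
print(log_or)                          # the same value for every level of Z

# With alpha_1 = beta_1 = 0 constraints the log OR is the X:Y coefficient
ab22 <- coef(fit_ua)[["Xx2:Yy2"]]
print(ab22)
```

The observed (unfitted) odds ratios generally differ across levels of `Z`; only the fitted ones are forced to agree.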

18 Example: Female Smoking
> fit4 <- glm(y ~ smoker + age + dead + smoker:age + smoker:dead + dead:age,
+             femsmoke, family = "poisson")
Coefficients (numeric estimates lost in extraction): (Intercept), smokerno, six age terms, deadno, six smokerno:age terms, smokerno:deadno (significant at the 0.05 level), and age:deadno terms for age25-34, age35-44, age45-54, age55-64, age65-74 and age75+
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 27 degrees of freedom; residual deviance on 6 degrees of freedom (deviance and AIC values lost in extraction)

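19 3 R.V.: ML Estimation
Joint Poisson probability of cell counts $Y_{ijk}$:
$$\prod_i \prod_j \prod_k \frac{e^{-\lambda_{ijk}}\, \lambda_{ijk}^{n_{ijk}}}{n_{ijk}!}$$
The log-likelihood:
$$l(\lambda) = \sum_i \sum_j \sum_k n_{ijk} \log \lambda_{ijk} - \sum_i \sum_j \sum_k \lambda_{ijk} + C$$
For the model with mutual independence, $\log \lambda_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k$:
$$l(\lambda) = n\mu + \sum_i n_{i++}\, \alpha_i + \sum_j n_{+j+}\, \beta_j + \sum_k n_{++k}\, \gamma_k - \sum_i \sum_j \sum_k e^{\mu + \alpha_i + \beta_j + \gamma_k} + C$$
$n_{i++}$, $n_{+j+}$, $n_{++k}$ are sufficient statistics
  parameters are estimated in these terms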
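One consequence of the sufficient statistics is that the fitted one-way margins reproduce the observed ones. The sketch below checks this for the mutual-independence model on a made-up 2 x 2 x 3 table (hypothetical counts and factor names), together with the closed form $n_{i++} n_{+j+} n_{++k} / n^2$.

```r
# Hypothetical 2 x 2 x 3 table of counts (made-up numbers, for illustration only)
arr <- array(c(12, 8, 20, 15, 9, 14, 25, 11, 7, 18, 16, 22), dim = c(2, 2, 3),
             dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2"),
                             Z = c("z1", "z2", "z3")))
d3 <- as.data.frame(as.table(arr))

fit_mut <- glm(Freq ~ X + Y + Z, family = poisson, data = d3)
fit_arr <- array(fitted(fit_mut), dim = dim(arr))

# Observed one-way margins: the sufficient statistics
n   <- sum(arr)
n_i <- apply(arr, 1, sum)
n_j <- apply(arr, 2, sum)
n_k <- apply(arr, 3, sum)

# The ML fit matches every one-way margin of the observed table
margin_gap <- max(abs(apply(fit_arr, 1, sum) - n_i),
                  abs(apply(fit_arr, 2, sum) - n_j),
                  abs(apply(fit_arr, 3, sum) - n_k))
print(margin_gap)

# Closed form: n_{i++} n_{+j+} n_{++k} / n^2
closed <- outer(outer(n_i, n_j), n_k) / n^2
print(all.equal(as.vector(closed), as.vector(fit_arr), tolerance = 1e-6))
```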

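20 Loglinear Models Summary
$Y_{ijk}$ = count in cell $(i, j, k)$; $Y_{ijk} \sim \text{Poisson}(\lambda_{ijk})$
Conditional on $n = \sum_{ijk} n_{ijk}$, $Y_{ijk} \sim \text{Multinomial}(\pi_{ijk})$
$\mu = \log E\{Y_{111}\}$ - reference cell
$\alpha_i, \beta_j, \gamma_k$ - deviations of $\log E\{Y_{ijk}\}$ from reference; $\alpha_1 = \beta_1 = \gamma_1 = 0$
Residual Df = $IJK$ - # model params

Model          $\log E\{Y_{ijk}\}$ =
Mut. Indep.    $\mu + \alpha_i + \beta_j + \gamma_k$
Joint Indep.   $\mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij}$
Cond. Indep.   $\mu + \alpha_i + \beta_j + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}$
Unif. Assoc.   $\mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}$
Saturated      $\mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$

Model          $\pi_{ijk}$ =                                        $\hat\lambda_{ijk}$ =
Mut. Indep.    $\pi_i\, \pi_j\, \pi_k$                              $n_{i++}\, n_{+j+}\, n_{++k} / n^2$
Joint Indep.   $\pi_{ij}\, \pi_k$                                   $n_{ij+}\, n_{++k} / n$
Cond. Indep.   $\pi_{ik}\, \pi_{jk} / \pi_k$                        $n_{i+k}\, n_{+jk} / n_{++k}$
Unif. Assoc.   product of pairwise terms (not marginal probs.)      Iterative
Saturated      $\pi_{ijk}$                                          $n_{ijk}$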

21 Models for Joint Distributions of Ordered Categorical Variables

22 2 R.V.: Linear-by-Linear Association
$Y_{ij} \sim \text{Poisson}(\lambda_{ij})$
Of interest: effect of row $i$ and column $j$ on $Y_{ij}$
Assign scores $u_i$ to rows, $u_1 \le u_2 \le \ldots \le u_I$
Assign scores $v_j$ to columns, $v_1 \le v_2 \le \ldots \le v_J$
Log-linear model: $\log E\{Y_{ij}\} = \log \lambda_{ij} = \log n\pi_{ij} = \log n + \alpha_i + \beta_j + \gamma u_i v_j$
The log-linear model: $\log E\{Y_{ij}\} = \mu + \alpha_i + \beta_j + \gamma u_i v_j$, where $\alpha_1 = \beta_1 = 0$
  $\mu = \log E\{Y_{11}\} - \gamma u_1 v_1$ ($= \log E\{Y_{11}\}$ when $u_1 = v_1$ are coded as 0)
  $\gamma$ quantifies (positive or negative) association
Check sensitivity of conclusions to score coding

23 Interpretation of Linear-by-Linear Association
Constant log-odds ratios for equally spaced scores
e.g. for adjacent entries in both rows and columns of the table:
$$\log OR = \log \frac{\lambda_{ij}\, \lambda_{i+1,\, j+1}}{\lambda_{i,\, j+1}\, \lambda_{i+1,\, j}} = \gamma\, (u_{i+1} - u_i)(v_{j+1} - v_j)$$
same for non-adjacent equally-spaced scores
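A minimal sketch of this interpretation, assuming a made-up 3 x 3 table with integer scores 1, 2, 3 for both rows and columns (counts and the names `A`, `B`, `u`, `v` are hypothetical). With unit-spaced scores every local log odds ratio of the fitted table should equal $\hat\gamma$.

```r
# Hypothetical 3 x 3 table of counts (made-up numbers, for illustration only)
tab <- matrix(c(30, 20, 10, 20, 25, 20, 8, 18, 32), nrow = 3,
              dimnames = list(A = c("a1", "a2", "a3"), B = c("b1", "b2", "b3")))
d <- as.data.frame(as.table(tab))
d$u <- as.integer(d$A)                 # row scores 1, 2, 3
d$v <- as.integer(d$B)                 # column scores 1, 2, 3

# Linear-by-linear association: one extra parameter gamma
fit_ll <- glm(Freq ~ A + B + I(u * v), family = poisson, data = d)
gamma_hat <- coef(fit_ll)[["I(u * v)"]]

# Local log odds ratios of the fitted table
m <- matrix(fitted(fit_ll), nrow = 3)
local_log_or <- function(i, j) {
  log(m[i, j] * m[i + 1, j + 1] / (m[i, j + 1] * m[i + 1, j]))
}
lors <- c(local_log_or(1, 1), local_log_or(1, 2),
          local_log_or(2, 1), local_log_or(2, 2))

print(gamma_hat)
print(lors)                            # all four equal gamma_hat
print(df.residual(fit_ll))             # (I - 1)(J - 1) - 1 = 3
```

Rescaling the scores (for example doubling `u`) rescales $\hat\gamma$ but leaves the fitted counts unchanged; unequally spaced scores change the fit, which is why the slides recommend checking sensitivity to the coding.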

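24 Example: Vote by Education
Voting preference in 1996 pres. election
> library(faraway)
> data(nes96)
> xtabs(~ PID + educ, nes96)
(7 x 7 table of counts; rows PID = strDem, weakDem, indDem, indind, indRep, weakRep, strRep; columns educ = MS, HSdrop, HS, Coll, CCdeg, BAdeg, MAdeg; the counts did not survive extraction)
Introduce scores $u_i$ and $v_j$
> # assumed step, not shown on the slide: counts in long format
> partyed <- as.data.frame.table(xtabs(~ PID + educ, nes96))
> partyed$oPID <- unclass(partyed$PID)
> partyed$oeduc <- unclass(partyed$educ)
> partyed
(columns: PID, educ, Freq, oPID, oeduc; first rows are strDem, weakDem, indDem, indind, indRep, weakRep at educ = MS; the numeric columns did not survive extraction)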

25 Example: Vote by Education
Fit additive model (i.e. independence)
Ignore potential order in the categories
> fit5 <- glm(Freq ~ PID + educ, partyed, family = poisson)
Coefficients (numeric estimates lost in extraction): (Intercept), PIDweakDem, PIDindDem, PIDindind, PIDindRep, PIDweakRep, PIDstrRep, educHSdrop, educHS, educColl, educCCdeg, educBAdeg, educMAdeg
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 48 degrees of freedom; residual deviance on 36 degrees of freedom (deviance and AIC values lost in extraction)

26 Example: Vote by Education
Incorporate the quantitative scores
> fit6 <- glm(Freq ~ PID + educ + I(oPID * oeduc), partyed, family = poisson)
Coefficients (numeric estimates lost in extraction): (Intercept), six PID terms, six educ terms, and I(oPID * oeduc), the last significant at the 0.01 level
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 48 degrees of freedom; residual deviance on 35 degrees of freedom (deviance and AIC values lost in extraction)

27 2 R.V.: Column-Effect Model
$Y_{ij} \sim \text{Poisson}(\lambda_{ij})$
Ordinal-by-nominal model (i.e. columns are not assigned scores)
Treat education as nominal, but political preference as ordinal with scores $u_i$, $u_1 \le \ldots \le u_I$
Log-linear model: $\log E\{Y_{ij}\} = \log \lambda_{ij} = \log n\pi_{ij} = \log n + \alpha_i + \beta_j + \gamma_j u_i$
The log-linear model: $\log E\{Y_{ij}\} = \mu + \alpha_i + \beta_j + \gamma_j u_i$, where $\alpha_1 = \beta_1 = \gamma_J = 0$
  $\mu = \log E\{Y_{11}\} - \gamma_1 u_1$ ($= \log E\{Y_{11}\}$ when $u_1$ is coded as 0)
  $\gamma_j$ - separate parameter of $u_i$ for each column
  $\hat\gamma_j$ are monotone and roughly equally spaced if the linear-by-linear model is appropriate

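28 Constraints on $\gamma_j$ in Column-Effect Model
An example of a $2 \times 3$ table:
$$\begin{bmatrix} Y_{11} & Y_{12} & Y_{13} \\ Y_{21} & Y_{22} & Y_{23} \end{bmatrix}$$
In matrix form, setting $u_1 = 1$ and $u_2 = 2$:
$$\log E \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ Y_{22} \\ Y_{23} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 2 & 0 \\ 1 & 1 & 1 & 0 & 0 & 2 \\ 1 & 1 & 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_2 \\ \beta_2 \\ \beta_3 \\ \gamma_1 \\ \gamma_2 \end{bmatrix}$$
$\alpha_1$ and $\beta_1$ are constrained to avoid multicollinearity with the intercept
$\gamma_3$ is constrained to avoid multicollinearity with the columns corresponding to $\mu$ and $\alpha_2$
the constraint on $\gamma_J$ ensures the same interpretation of the intercept?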
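The collinearity can be seen directly from R's design matrix. This is a sketch under the slide's setup (a 2 x 3 table with $u_1 = 1$, $u_2 = 2$); the variable names `row`, `col` and `u` are chosen here for illustration. R generates a $\gamma_j$ column for every level of the column factor, so one of them is redundant.

```r
# Cells in the order Y11, Y12, Y13, Y21, Y22, Y23 (col varies fastest)
d <- expand.grid(col = factor(1:3), row = factor(1:2))
d$u <- as.integer(d$row)               # row scores u_1 = 1, u_2 = 2

# Column-effect model: a separate slope on u for each column
X <- model.matrix(~ row + col + col:u, data = d)
print(colnames(X))                     # 7 columns, including 3 gamma columns
print(qr(X)$rank)                      # rank 6: one gamma is not estimable

# The gamma columns add up to u, and u = intercept + (row 2 indicator)
gcols <- grep(":u", colnames(X), fixed = TRUE)
gamma_sum <- rowSums(X[, gcols])
print(all(gamma_sum == X[, "(Intercept)"] + X[, "row2"]))
```

This is why `glm` reports one coefficient as not defined because of singularities when the model is specified this way.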

29 Example: Vote by Education
Column-Effect Model
> fit7 <- glm(Freq ~ PID + educ + educ:oPID, partyed, family = poisson)
Coefficients: (1 not defined because of singularities)
(numeric estimates lost in extraction): (Intercept), six PID terms, six educ terms, and educ:oPID terms for educMS, educHSdrop, educHS, educColl, educCCdeg, educBAdeg; educMAdeg:oPID is NA
Null deviance on 48 degrees of freedom; residual deviance on 30 degrees of freedom (deviance and AIC values lost in extraction)

30 Ordinal Models Summary
$Y_{ij}$ = count in row $i$, column $j$; $Y_{ij} \sim \text{Poisson}(\lambda_{ij})$
Conditional on $n = \sum_i \sum_j n_{ij}$, $Y_{ij} \sim \text{Multinomial}(\pi_{ij})$
Nominal Categories
  $\alpha_i, \beta_j$ - row and column effects, $\sum_{i=1}^{I} \alpha_i = \sum_{j=1}^{J} \beta_j = \sum_{i=1}^{I} (\alpha\beta)_{ij} = \sum_{j=1}^{J} (\alpha\beta)_{ij} = 0$
Ordinal Categories
  $u_i, v_j$ - continuous scores of rows/columns
  $\gamma$ or $\gamma_i, \gamma_j$ - parameters, $\sum_{i=1}^{I} \gamma_i = \sum_{j=1}^{J} \gamma_j = 0$

Model              $\log E\{Y_{ij}\}$ =                             Residual Df
Independence       $\mu + \alpha_i + \beta_j$                       $(I-1)(J-1)$
Linear-by-linear   $\mu + \alpha_i + \beta_j + \gamma\, u_i v_j$    $(I-1)(J-1) - 1$
Row-effect         $\mu + \alpha_i + \beta_j + \gamma_i v_j$        $(I-1)(J-2)$
Column-effect      $\mu + \alpha_i + \beta_j + \gamma_j u_i$        $(I-2)(J-1)$
Saturated          $\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$  $0$

31 Multinomial Response as Function of Predictors: Surrogate Log Linear Models

32 Example: Housing
Satisfaction of householders with housing (Venables and Ripley Sec. 7.3)
> library(MASS)   # provides the housing data and addterm()
> head(housing)
     Sat   Infl  Type Cont Freq
1    Low    Low Tower  Low   21
2 Medium    Low Tower  Low   21
3   High    Low Tower  Low   28
4    Low Medium Tower  Low   34
5 Medium Medium Tower  Low   22
6   High Medium Tower  Low   (value lost in extraction)
> xtabs(Freq ~ Infl + Type + Cont, data = housing)
(two Infl-by-Type panels, Cont = Low and Cont = High; rows Infl = Low, Medium, High; columns Type = Tower, Apartment, Atrium, Terrace; the counts did not survive extraction)

33 Null Model
Model Sat as multinomial response; Influence, Contact and Type as predictors
A different offset per covariate pattern
  Offset = 3-way interaction of all predictors
Same probabilities of response categories across all covariate patterns
$$\log E\{Y_{ijkl}\} = \mu_{ijk} + \delta_l = \left[ \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} \right] + \delta_l$$
Constraints on $\alpha, \beta, \gamma, \delta$ and on the interactions
> fit <- glm(Freq ~ Infl * Type * Cont + Sat, family = poisson, data = housing)
Coefficients (numeric estimates garbled in extraction); leading terms shown: (Intercept), InflMedium, InflHigh, TypeApartment, TypeAtrium, TypeTerrace, ...
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 71 degrees of freedom; residual deviance on 46 degrees of freedom (deviance values lost in extraction)
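What the null surrogate model implies can be sketched on a small made-up data set. Here a single hypothetical factor `pattern` stands in for the full `Infl * Type * Cont` interaction, and the counts are invented for illustration; they are not the housing data.

```r
# Hypothetical data: 3 covariate patterns x 3 response categories (made-up counts)
d <- expand.grid(resp = factor(c("Low", "Medium", "High"),
                               levels = c("Low", "Medium", "High")),
                 pattern = factor(c("p1", "p2", "p3")))
d$Freq <- c(20, 30, 50, 10, 25, 15, 40, 35, 25)

# Null surrogate model: one "offset" per covariate pattern plus response main effect
fit0 <- glm(Freq ~ pattern + resp, family = poisson, data = d)
d$fit <- fitted(fit0)

# Fitted response probabilities within each covariate pattern
fit_tab <- xtabs(fit ~ pattern + resp, data = d)
p_hat <- prop.table(fit_tab, margin = 1)
print(p_hat)                           # identical rows

# They equal the overall response proportions
overall <- xtabs(Freq ~ resp, data = d) / sum(d$Freq)
print(overall)

# The pattern terms reproduce the observed total of each covariate pattern
pattern_gap <- max(abs(rowSums(fit_tab) - xtabs(Freq ~ pattern, data = d)))
print(pattern_gap)
```

Adding a `resp:pattern`-type interaction for one predictor is what lets the response probabilities vary with that predictor.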

34 Additive Contribution of Individual Predictors
Testing whether Sat depends on each of the 3 predictors individually
A different offset per covariate pattern
The probabilities of response categories are affected by one predictor across all covariate patterns
$\log E\{Y_{ijkl}\} = \mu_{ijk} + \delta_l + (\alpha\delta)_{il}$, or
$\log E\{Y_{ijkl}\} = \mu_{ijk} + \delta_l + (\beta\delta)_{jl}$, or
$\log E\{Y_{ijkl}\} = \mu_{ijk} + \delta_l + (\gamma\delta)_{kl}$
Constraints on $\alpha, \beta, \gamma, \delta$ and on the interactions
> addterm(fit, ~ . + Sat:(Infl + Type + Cont), test = "Chisq")
Single term additions
Model: Freq ~ Infl * Type * Cont + Sat
(columns: Df, Deviance, AIC, LRT, Pr(Chi); rows: <none>, Infl:Sat with p < 2.2e-16, Type:Sat with p of order e-11, Cont:Sat; the remaining values did not survive extraction)
Infl: max reduction in resid. deviance & AIC

35 Additive Contributions of All Predictors
Add main effects of all predictors
A different offset per covariate pattern
Same effect of each predictor on probabilities of response categories, regardless of the value of the other predictors
Constraints on $\alpha, \beta, \gamma, \delta$ and on the interactions
$$\log E\{Y_{ijkl}\} = \mu_{ijk} + \delta_l + (\alpha\delta)_{il} + (\beta\delta)_{jl} + (\gamma\delta)_{kl}$$
> fit1 <- update(fit, . ~ . + Sat:(Infl + Type + Cont))
Residual deviance on 34 degrees of freedom (deviance value lost in extraction)
Add higher-order interactions to represent non-additive effects of predictors on Sat
$$\log E\{Y_{ijkl}\} = \mu_{ijk} + \delta_l + (\alpha\delta)_{il} + (\beta\delta)_{jl} + (\gamma\delta)_{kl} + (\alpha\beta\delta)_{ijl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl}$$
> addterm(fit1, ~ . + Sat:(Infl + Type + Cont)^2, test = "Chisq")
None significant
Next analysis steps: plot predicted probabilities and counts; analysis of residuals

36 Compare to Multinom. Reg.
Multinomial regression:
> library(nnet)
> fit.multinom <- multinom(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
Coefficients: rows Medium and High; columns (Intercept), InflMedium, InflHigh, TypeApartment, ... (values lost in extraction)
Residual Deviance: (value lost in extraction)
Different deviances due to different saturated models. In multinom the saturated model models subjects; in the surrogate log-linear model it models covariate patterns
Can compare the predicted probabilities
> p1 <- predict(fit.multinom, type = "probs")
Saturated model with multinom
> fit.saturated <- multinom(Sat ~ Infl * Type * Cont, weights = Freq, data = housing)
> anova(fit.multinom, fit.saturated)
LR test statistic = residual deviance in the surrogate log-linear model.

37 Models with Poisson Likelihood: Summary
Model E{response} as function of predictors
  Count response: Poisson regression
  Count response with overdispersion: Quasipoisson or Negative Binomial regression
  Multinomial response: Surrogate log-linear models
Multivariate associations of categorical variables in contingency tables
  Nominal random variables: Loglinear models
  Ordinal random variables: Linear-by-linear model; Row-effect and column-effect models
