STAT 526 Advanced Statistical Methodology
STAT 526 Advanced Statistical Methodology, Fall 2017
Lecture Note 7: Contingency Table
Outline

Introduction to Contingency Tables
Testing Independence in Two-Way Contingency Tables
Modeling Ordinal Associations
Correspondence Analysis
Models for Matched Pairs
Three-Way Contingency Tables

Dabao Zhang
Introduction to Contingency Tables

Contingency table: a table whose cells contain frequency counts of outcomes classified according to certain variables (Karl Pearson, 1904). Contingency tables are used to display relationships between categorical variables.

Two-way table: used to study the relationship between two categorical variables, e.g., X and Y. Suppose that X has I categories and Y has J categories. Classifying subjects on both variables gives $I \times J$ possible combinations, i.e., $I \times J$ cells in a rectangular table with I rows for the categories of X and J columns for the categories of Y. A contingency table with I rows and J columns is called an $I \times J$ (I-by-J) table.

Example: Cross-Classification of Smoking by Lung Cancer

            Lung Cancer
Smoking   Cases  Controls  Total
Yes         688       650   1338
No           21        59     80
Total       709       709   1418
Three-way table: used to study the relationships among three categorical variables, e.g., X, Y and Z. Suppose that X has I categories, Y has J categories, and Z has K categories. Classifying subjects on all possible combinations gives an $I \times J \times K$ contingency table.

Example: Alcohol, Cigarette, and Marijuana Use for High School Seniors

Alcohol Use   Cigarette Use   Marijuana Use
                              Yes    No
Yes           Yes
              No
No            Yes               3    43
              No
Testing Independence in Two-Way Contingency Tables

Multinomial Sampling

When the total sample size n is fixed but the row and column totals are not, a multinomial sampling model applies. Usually both X and Y are response variables, so the joint distribution is used to describe their association:

$$P(X = i, Y = j) = \pi_{ij}, \quad i = 1,\dots,I;\ j = 1,\dots,J.$$

Let $n_{ij}$ be the count in cell (i, j). The probability mass function of the cell counts, and the resulting log-likelihood, are

$$\frac{n!}{n_{11}! \cdots n_{IJ}!} \prod_{i=1}^{I} \prod_{j=1}^{J} \pi_{ij}^{n_{ij}} \quad\Longrightarrow\quad l = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij} \log(\pi_{ij}) + \text{constant}.$$

Independence of Categorical Variables

$$X \text{ and } Y \text{ are independent} \iff \pi_{ij} = \pi_{i+}\pi_{+j}, \quad i = 1,\dots,I;\ j = 1,\dots,J.$$

Marginal distributions: $P(X = i) = \pi_{i+}$ and $P(Y = j) = \pi_{+j}$, where the subscript + denotes the sum over that index.
Hypothesis test

$H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for all i and j
$H_a: \pi_{ij} \neq \pi_{i+}\pi_{+j}$ for some i and j

Under the full model, the MLE of $\pi_{ij}$ is $\hat\pi_{ij} = n_{ij}/n_{++}$.
Under the null model, the MLEs are $\hat\pi_{i+} = n_{i+}/n_{++}$ and $\hat\pi_{+j} = n_{+j}/n_{++}$.

The LRT (or deviance-based test) is

$$2 \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij} \log\frac{n_{ij}\, n_{++}}{n_{i+}\, n_{+j}} \overset{asy.}{\sim} \chi^2_{(I-1)(J-1)}, \quad \text{under } H_0.$$

Example: Cross-Classification of Smoking by Lung Cancer (Continued)

> y <- c(688,21,650,59);
> smoke <- gl(2,1,4,labels=c("yes","no")); #gl: generate a factor with given levels
> cancer <- gl(2,2,labels=c("cases","controls"));
> lcancer <- data.frame(y,cancer,smoke); lcancer;
    y   cancer smoke
1 688    cases   yes
2  21    cases    no
3 650 controls   yes
4  59 controls    no
> (lcct <- xtabs(y~smoke+cancer)); #xtabs: create a contingency table
     cancer
smoke cases controls
  yes   688      650
  no     21       59
> (fpi <- prop.table(xtabs(y~smoke+cancer))) # cell proportions under the full model
> spi <- prop.table(xtabs(y~smoke));
> cpi <- prop.table(xtabs(y~cancer));
> (npi <- outer(spi,cpi)) # cell proportions under independence
> pchisq(2*sum(lcct*log(fpi/npi)),1,lower.tail=FALSE)
[1] e-06

Conclusion?
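$2 \times 2$ Table

Independence of X and Y can be stated in terms of the odds ratio:

$$X \text{ and } Y \text{ are independent} \iff \theta = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = 1.$$

This is because, when $\theta = 1$ (and similarly for the other $\pi_{ij}$),

$$\pi_{12} = \pi_{+1}\pi_{12} + \pi_{+2}\pi_{12} = (\pi_{11}\pi_{12} + \pi_{21}\pi_{12}) + \pi_{+2}\pi_{12} = (\pi_{11}\pi_{12} + \pi_{11}\pi_{22}) + \pi_{+2}\pi_{12} = \pi_{+2}\pi_{11} + \pi_{+2}\pi_{12} = \pi_{1+}\pi_{+2}.$$

The MLE of the odds ratio is

$$\hat\theta = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}.$$

Asymptotically, $\log(\hat\theta) \sim N(\log(\theta), \hat\sigma^2)$, where

$$\hat\sigma^2 = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}.$$

When some $n_{ij} = 0$, $\hat\theta$ is not a good estimator. It is amended by adding 0.5 to each cell count:

$$\tilde\theta = \frac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}.$$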
Example: Cross-Classification of Smoking by Lung Cancer (Continued)

> (etheta <- lcct[1,1]*lcct[2,2]/(lcct[1,2]*lcct[2,1])) # sample odds ratio
> (sele <- sqrt(sum(1/lcct))) # standard error of the log odds ratio
> log(etheta)+sele*c(-1.96,1.96) # 95% CI for the log odds ratio

Conclusion?
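The 0.5 adjustment for zero cells is not exercised by the smoking data, so here is a minimal sketch on a hypothetical table with an empty cell. The table, the helper name `odds_ratio_ci`, and the choice to use the adjusted counts in the standard error as well are assumptions made for illustration, not part of the lecture.

```r
# Hypothetical 2x2 table with a zero cell (illustration only, not lecture data).
tab <- matrix(c(12, 0,
                 5, 9), nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("yes", "no"),
                              outcome  = c("case", "control")))

# Odds ratio with a Wald interval built on the log scale.
# `add` is the constant added to every cell (0 = plain MLE, 0.5 = amended estimator).
odds_ratio_ci <- function(n, add = 0, level = 0.95) {
  n     <- n + add
  theta <- (n[1, 1] * n[2, 2]) / (n[1, 2] * n[2, 1])
  se    <- sqrt(sum(1 / n))                  # SE of log(theta)
  z     <- qnorm(1 - (1 - level) / 2)
  list(estimate = theta,
       ci = exp(log(theta) + c(-1, 1) * z * se))
}

raw <- odds_ratio_ci(tab)              # n12 = 0, so the plain estimate is infinite
adj <- odds_ratio_ci(tab, add = 0.5)   # finite estimate and interval

print(raw$estimate)
print(adj)
```

With `add = 0` the function reproduces the `etheta` and `sele` computations of the example above; the adjusted version trades a small bias for a usable estimate when a cell is empty.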
Independent (or Product) Multinomial Sampling

When the row totals $n_{i+}$, $i = 1,\dots,I$, are fixed, an independent multinomial sampling model applies. Usually X is an explanatory variable, and observations on a response Y occur separately at each setting of X, so the conditional distribution is used to describe their association:

$$P(Y = j \mid X = i) = \pi_{j|i}, \quad i = 1,\dots,I;\ j = 1,\dots,J.$$

Let $n_{ij}$ be the count in cell (i, j). The counts $\{n_{ij}, j = 1,\dots,J\}$, which satisfy $\sum_{j=1}^{J} n_{ij} = n_{i+}$, follow the multinomial distribution

$$\frac{n_{i+}!}{n_{i1}! \cdots n_{iJ}!} \prod_{j=1}^{J} \pi_{j|i}^{n_{ij}}.$$

Independence of Categorical Variables

$$X \text{ and } Y \text{ are independent} \iff \pi_{j|1} = \cdots = \pi_{j|I}, \quad j = 1,\dots,J.$$

Independence is then often referred to as homogeneity of the conditional distributions.
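$$\pi_{ij} = \pi_{i+}\pi_{+j} \text{ for all } i \text{ and } j \iff \pi_{j|1} = \cdots = \pi_{j|I} \text{ for all } j$$

($\Rightarrow$) $\pi_{j|i} = \pi_{ij}/\pi_{i+} = (\pi_{i+}\pi_{+j})/\pi_{i+} = \pi_{+j}$, which does not depend on i.

($\Leftarrow$) Let $\pi_{j|i} = c_j$. Then $\pi_{+j} = \sum_{i=1}^{I} \pi_{ij} = \sum_{i=1}^{I} \pi_{i+} c_j = c_j$, so $\pi_{ij} = \pi_{i+}\pi_{j|i} = \pi_{i+}\pi_{+j}$.

Q: How to test the homogeneity of the conditional distributions?

          Column
Row       1                          ...   J                          Total
1         $\pi_{11}$ ($\pi_{1|1}$)   ...   $\pi_{1J}$ ($\pi_{J|1}$)   $\pi_{1+}$
...
I         $\pi_{I1}$ ($\pi_{1|I}$)   ...   $\pi_{IJ}$ ($\pi_{J|I}$)   $\pi_{I+}$
Total     $\pi_{+1}$                 ...   $\pi_{+J}$                 $\pi_{++}$

Consider the new notation $\pi_j(x) = P(Y = j \mid X = x)$. This suggests a model for multinomial responses!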
Example: Cross-Classification of Smoking by Lung Cancer (Continued)

> (mnlc <- matrix(y,nrow=2)) # rows: smoking yes/no; columns: cases, controls
     [,1] [,2]
[1,]  688  650
[2,]   21   59
> mnmod <- glm(mnlc~1,family=binomial);
> deviance(mnmod)
> 2*sum(lcct*log(fpi/npi)) # deviance in the multinomial sampling

Conclusion?
Poisson Sampling

Denote the count of cell (i, j) as $Y_{ij}$. A Poisson sampling model assumes that the $Y_{ij}$ follow independent Poisson distributions with rates $\{\mu_{ij}\}$:

$$Y_{ij} \overset{ind.}{\sim} \text{Poisson}(\mu_{ij}) \quad\Longrightarrow\quad \sum_{i=1}^{I}\sum_{j=1}^{J} Y_{ij} \sim \text{Poisson}\Big(\sum_{i=1}^{I}\sum_{j=1}^{J} \mu_{ij}\Big).$$

Denote $\sum_{i=1}^{I}\sum_{j=1}^{J} \mu_{ij} = \mu_{++}$.

Given $\sum_{i=1}^{I}\sum_{j=1}^{J} Y_{ij} = n_{++}$, $(Y_{11},\dots,Y_{ij},\dots,Y_{IJ})$ follows a multinomial distribution with $E[Y_{ij} \mid n_{++}] = n_{++}\pi_{ij}$, where $\pi_{ij} = \mu_{ij}/\mu_{++}$.

Independence of X and Y has the following form:

$$\log(\mu_{ij}) = \lambda + \alpha_i + \beta_j.$$

This model is called the loglinear model of independence for two-way contingency tables: the log expected frequency is an additive function of a row effect $\alpha_i$ and a column effect $\beta_j$.

An independence test is also a goodness-of-fit test of the above loglinear model.
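The claim that the three sampling schemes lead to the same independence test can be checked numerically. The following sketch, which reuses the smoking counts from the running example, computes the multinomial likelihood-ratio statistic directly and compares it with the residual deviances of the Poisson loglinear model and of the binomial (product-multinomial) model; the object names are chosen here for illustration.

```r
# Smoking by lung cancer counts from the running example.
y      <- c(688, 21, 650, 59)
smoke  <- gl(2, 1, 4, labels = c("yes", "no"))
cancer <- gl(2, 2, labels = c("cases", "controls"))
tab    <- xtabs(y ~ smoke + cancer)

# (1) Multinomial sampling: G^2 computed directly from observed and expected counts.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
g2 <- 2 * sum(tab * log(tab / expected))

# (2) Poisson sampling: residual deviance of the loglinear model of independence.
pois_dev <- deviance(glm(y ~ smoke + cancer, family = poisson))

# (3) Product-multinomial sampling: binomial model with one common case rate
#     (rows of the response matrix are smoking levels; columns are cases, controls).
binom_dev <- deviance(glm(matrix(y, nrow = 2) ~ 1, family = binomial))

print(c(multinomial = g2, poisson = pois_dev, binomial = binom_dev))
print(pchisq(g2, df = 1, lower.tail = FALSE))
```

All three numbers should agree up to the convergence tolerance of `glm`, which is why the goodness-of-fit test of the loglinear model can stand in for the independence test under any of these sampling schemes.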
Example: Cross-Classification of Smoking by Lung Cancer (Continued)

> pmod <- glm(y~smoke+cancer,family=poisson);
> summary(pmod)
...
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     6.506e e <2e-16 ***
smokeno         e e <2e-16 ***
cancercontrols  e e e
...
Null deviance: on 3 degrees of freedom
Residual deviance: on 1 degrees of freedom
AIC:

An alternative to the deviance-based test is Pearson's $X^2$ test:

$$X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(y_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}}.$$

> emu <- npi*sum(lcct); sum((lcct-emu)^2/emu)

Yates' continuity correction: replace $(y_{ij} - \hat\mu_{ij})^2$ by $(|y_{ij} - \hat\mu_{ij}| - 0.5)^2$, i.e., move each difference 0.5 toward zero before squaring.

The deviance-based test is preferred to Pearson's $X^2$ test.
Hypergeometric Sampling

When both row and column margins are fixed, the appropriate sampling distribution is the hypergeometric. This situation is less common in practice.

When X and Y are independent, $\{n_{ij}\}$, given the row and column margins, follows the hypergeometric distribution

$$\frac{\big(\prod_{i=1}^{I} n_{i+}!\big)\big(\prod_{j=1}^{J} n_{+j}!\big)}{n_{++}!\, \prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}.$$

An exact test of independence can be developed by defining an ordering of the tables.

For a $2 \times 2$ table, the hypergeometric distribution is

$$P(n_{11} = k) = \frac{\binom{n_{1+}}{k}\binom{n_{2+}}{n_{+1} - k}}{\binom{n_{++}}{n_{+1}}}, \quad \max(0, n_{1+} + n_{+1} - n_{++}) \le k \le \min(n_{1+}, n_{+1}).$$

Fisher's exact test: the p-value equals the total probability of all outcomes at least as extreme as the one observed.
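As a rough check of the $2 \times 2$ formula, the sketch below rebuilds the two-sided Fisher p-value for the smoking table from `dhyper`. It takes "at least as extreme" to mean "no more probable than the observed table", which is the ordering `fisher.test` uses for two-sided tests; the small relative tolerance guards against floating-point ties and is a choice made here.

```r
# Smoking by lung cancer table from the running example.
tab <- matrix(c(688, 650,
                 21,  59), nrow = 2, byrow = TRUE,
              dimnames = list(smoke  = c("yes", "no"),
                              cancer = c("cases", "controls")))

n1p <- sum(tab[1, ])   # n_{1+}
n2p <- sum(tab[2, ])   # n_{2+}
np1 <- sum(tab[, 1])   # n_{+1}

# Support of n11 given the margins, with its hypergeometric probabilities:
# P(n11 = k) = choose(n1p, k) * choose(n2p, np1 - k) / choose(n1p + n2p, np1).
k     <- max(0, n1p + np1 - sum(tab)):min(n1p, np1)
probs <- dhyper(k, m = n1p, n = n2p, k = np1)

# Two-sided p-value: total probability of tables no more likely than the observed one.
p_obs    <- dhyper(tab[1, 1], m = n1p, n = n2p, k = np1)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

print(c(manual = p_manual, fisher.test = fisher.test(tab)$p.value))
```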
Example: Cross-Classification of Smoking by Lung Cancer (Continued)

> fisher.test(lcct)

        Fisher's Exact Test for Count Data

data:  lcct
p-value = 1.476e-05
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
sample estimates:
odds ratio

> etheta*exp(sele*c(-1.96,1.96)) # CI based on asymptotic approximation
Modeling Ordinal Associations

Treating ordered categories as nominal categories ignores important information.

Example: US 1996 National Election Study (Continued)

Here we consider the association between party identification and level of education.

> data(nes96); xtabs(~PID+educ,nes96);
         educ
PID       MS HSdrop HS Coll CCdeg BAdeg MAdeg
  strDem
  weakDem
  indDem
  indind
  indRep
  weakRep
  strRep
> (partyed <- as.data.frame.table(xtabs(~PID+educ,nes96))) #convert to data.frame
       PID   educ Freq
1   strDem     MS
...
    strRep  MAdeg   25
> nomod <- glm(Freq~PID+educ,family=poisson,data=partyed);
> pchisq(deviance(nomod),df.residual(nomod),lower.tail=FALSE)

When both variables are treated as nominal, we have no evidence against independence.

> presid <- residuals(nomod,type="pearson");
> xtabs(presid~partyed$PID+partyed$educ);

Cross-classifications of ordinal variables often exhibit their greatest deviations from independence in the corner cells.

Sample counts are much larger than independence predicts when both responses are at the lowest order or both are at the highest order.

The counts are much smaller than the fitted values where one response is at the highest order and the other is at the lowest order.

The residuals table indicates lack of fit in the form of a positive trend: subjects with a higher level of education also tend to be stronger Republicans.
Linear-by-Linear Association in Two-Way Tables

Assign the following row scores and column scores, respectively:

$$u_1 \le u_2 \le \cdots \le u_I, \qquad v_1 \le v_2 \le \cdots \le v_J.$$

A simple model for two ordinal variables is the linear-by-linear association model ($L \times L$):

$$\log(\mu_{ij}) = \lambda + \alpha_i + \beta_j + \gamma u_i v_j.$$

The term $\gamma u_i v_j$ represents the deviation of $\log(\mu_{ij})$ from independence.

The deviation is linear in the Y scores at a fixed level of X, and linear in the X scores at a fixed level of Y, hence the name $L \times L$ model.

The model has its greatest departures from independence in the corners of the table.

$\gamma = 0$ implies independence of X and Y.
When $\gamma > 0$, Y tends to increase as X increases.
When $\gamma < 0$, Y tends to decrease as X increases.
Example: US 1996 National Election Study (Continued)

We assign evenly spaced scores, i.e., one to seven (you can also try other scores), to both PID and educ, and fit the $L \times L$ model.

> partyed$oPID <- unclass(partyed$PID); partyed$oeduc <- unclass(partyed$educ);
> lblmod <- glm(Freq~PID+educ+I(oPID*oeduc),family=poisson,data=partyed);
> summary(lblmod)
...
Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
...
I(oPID * oeduc)  **
...
Null deviance: on 48 degrees of freedom
Residual deviance: on 35 degrees of freedom
AIC:
> anova(nomod,lblmod,test="Chisq");
Analysis of Deviance Table
Model 1: Freq ~ PID + educ
Model 2: Freq ~ PID + educ + I(oPID * oeduc)
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
Interpretation of $\gamma$

Consider the log odds ratio for a subtable whose cells are adjacent in both rows and columns, e.g., cells (i, j), (i, j+1), (i+1, j), and (i+1, j+1):

$$\log\frac{\mu_{ij}\,\mu_{i+1,j+1}}{\mu_{i,j+1}\,\mu_{i+1,j}} = \gamma (u_{i+1} - u_i)(v_{j+1} - v_j).$$

This log odds ratio is stronger as $|\gamma|$ increases and for pairs of categories that are farther apart.

For evenly spaced scores, these odds ratios are all equal. For instance, when $\{u_i = i\}$ and $\{v_j = j\}$, we have the constant local odds ratios

$$\theta_{ij} = \frac{\pi_{ij}\,\pi_{i+1,j+1}}{\pi_{i,j+1}\,\pi_{i+1,j}} = e^{\gamma}.$$

The case of constant local odds ratios was called uniform association by Goodman (1979).

As a Baseline-Category Logit Model:

$$\log\frac{\pi_{j|i}}{\pi_{1|i}} = \log\frac{\mu_{ij}}{\mu_{i1}} = (\beta_j - \beta_1) + \gamma (v_j - v_1) u_i.$$

We may fit a baseline-category logit model and expect the coefficients of $\{u_i\}$ to be $\{\gamma (v_j - v_1)\}$.
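The uniform-association property can be illustrated with a small sketch. The $3 \times 3$ counts below are hypothetical (they are not the election data), and the product of scores is stored in a helper column `uv` purely for convenience; with unit-spaced scores every local odds ratio computed from the fitted means should equal $e^{\hat\gamma}$.

```r
# Hypothetical 3x3 table of counts for two ordinal variables (illustration only).
d <- expand.grid(x = factor(1:3), y = factor(1:3))   # x varies fastest
d$freq <- c(30, 20, 10,
            20, 25, 20,
            10, 20, 35)
d$u  <- as.integer(d$x)   # row scores u_i = i
d$v  <- as.integer(d$y)   # column scores v_j = j
d$uv <- d$u * d$v         # linear-by-linear term

ll <- glm(freq ~ x + y + uv, family = poisson, data = d)
gamma_hat <- unname(coef(ll)["uv"])

# Local odds ratios mu_ij * mu_{i+1,j+1} / (mu_{i,j+1} * mu_{i+1,j}) from fitted means.
d$fit <- fitted(ll)
mu <- xtabs(fit ~ x + y, data = d)
local_or <- (mu[-3, -3] * mu[-1, -1]) / (mu[-3, -1] * mu[-1, -3])

print(exp(gamma_hat))
print(local_or)
```

Replacing the integer scores with unevenly spaced ones would make the local odds ratios differ across subtables, following $\gamma (u_{i+1} - u_i)(v_{j+1} - v_j)$.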
Example: US 1996 National Election Study (Continued)

> library(nnet); # provides multinom
> nes96$oeduc <- unclass(nes96$educ);
> nes96.mn1 <- multinom(PID~oeduc,data=nes96); summary(nes96.mn1);
...
Coefficients:
        (Intercept) oeduc
weakDem
indDem
indind
indRep
weakRep
strRep
> nes96$oPID <- unclass(nes96$PID);
> nes96.mn2 <- multinom(educ~oPID,data=nes96); summary(nes96.mn2);
...
Coefficients:
       (Intercept) oPID
HSdrop
HS
Coll
CCdeg
BAdeg
MAdeg

Under an $L \times L$ model, we expect monotonically increasing (or decreasing) coefficients of oeduc and oPID.

While the coefficients of oPID are more or less increasing, the coefficients of oeduc are apparently not. We may therefore treat only PID as ordinal and educ as nominal.
Column Effects Model

The columns are not assigned scores, as Y is considered a nominal variable. The $\{\gamma_j\}$ are called the column effects.

$$\log(\mu_{ij}) = \lambda + \alpha_i + \beta_j + \gamma_j u_i$$

The zero-sum constraint is $\sum_{i=1}^{I} \alpha_i = \sum_{j=1}^{J} \beta_j = \sum_{j=1}^{J} \gamma_j = 0$.
The baseline constraint is $\alpha_1 = \beta_1 = \gamma_1 = 0$.

$\gamma_1 = \gamma_2 = \cdots = \gamma_J$ implies independence of X and Y.

As a Baseline-Category Logit Model:

$$\log\frac{\pi_{j|i}}{\pi_{1|i}} = \log\frac{\mu_{ij}}{\mu_{i1}} = (\beta_j - \beta_1) + (\gamma_j - \gamma_1) u_i.$$

A row effects model is effectively the same model with the roles of the variables reversed.
Example: US 1996 National Election Study (Continued)

> cmod <- glm(Freq~PID+educ+educ:oPID,family=poisson,data=partyed);
> mcoeff <- summary(cmod)$coeff;
> mcoeff[8:13,1] #beta_j-beta_1
educHSdrop educHS educColl educCCdeg educBAdeg educMAdeg
> mcoeff[15:19,1]-mcoeff[14,1] #gamma_j-gamma_1
educHSdrop:oPID educHS:oPID educColl:oPID educCCdeg:oPID educBAdeg:oPID
> -mcoeff[14,1] #gamma_J-gamma_1, as gamma_J=0 for the last (aliased) level

These values are similar to the coefficients in nes96.mn2.

> anova(nomod,cmod,test="Chisq")
Analysis of Deviance Table
Model 1: Freq ~ PID + educ
Model 2: Freq ~ PID + educ + educ:oPID
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)

The above comparison of cmod with the independence model nomod implies that the column effects model is preferred.

What about comparing lblmod and cmod?
Correspondence Analysis

Correspondence analysis is a graphical way to represent associations in two-way contingency tables. It is very helpful in understanding the dependence between a category of X and a category of Y.

The method is based on the matrix of Pearson residuals $R_{I \times J} = (r_{ij})_{I \times J}$, where $r_{ij}$ is the Pearson residual for cell (i, j).

Perform the singular value decomposition

$$R_{I \times J} = U_{I \times w} D_{w \times w} V_{J \times w}^{T} \quad\Longrightarrow\quad r_{ij} = \sum_{k=1}^{w} u_{ik}\, d_k\, v_{jk}, \qquad w = \min(I, J).$$

$U_{I \times w} = (u_{ij})_{I \times w}$ and $V_{J \times w} = (v_{ij})_{J \times w}$ have orthogonal column vectors, which are called the left and right singular vectors, respectively.

$D = \text{diag}\{d_1,\dots,d_w\}$ with $d_1 \ge d_2 \ge \cdots \ge d_w$, which are called the singular values.

$\sum_{i=1}^{w} d_i^2 = $ Pearson's $X^2$ is called the inertia.
Usually $d_1^2 + d_2^2$ accounts for most of $\sum_{i=1}^{w} d_i^2 = X^2$. Therefore $u_{i1} d_1 v_{j1} + u_{i2} d_2 v_{j2}$ accounts for most of the Pearson residual $r_{ij}$, i.e.,

$$r_{ij} \approx u_{i1} d_1 v_{j1} + u_{i2} d_2 v_{j2}.$$

Denote, for k = 1, 2,

$$U_k = \sqrt{d_k}\,(u_{1k},\dots,u_{Ik})^{T}, \qquad V_k = \sqrt{d_k}\,(v_{1k},\dots,v_{Jk})^{T},$$

so that $r_{ij} \approx U_{i1}V_{j1} + U_{i2}V_{j2}$.

The two-dimensional correspondence plot displays $U_2$ against $U_1$, and $V_2$ against $V_1$, on the same graph.

Plotting $U_2$ vs. $U_1$ shows the influence on the residuals of ignoring the row effect. A large $U_i$ indicates that the profile of row i is unusual.

Plotting $V_2$ vs. $V_1$ shows the influence on the residuals of ignoring the column effect. A large $V_i$ indicates that the profile of column i is unusual.

If a row level and a column level appear close together on the plot and far from the origin, there will be a large positive residual associated with this combination, indicating a strong positive association.

If a row level and a column level are situated diametrically apart on either side of the origin, we may expect a large negative residual, indicating a strong negative association.

If points representing two row levels or two column levels are close together, the two levels have a similar pattern of association. In some cases, one might consider combining the two levels.
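The relation between the singular values and Pearson's $X^2$ can be verified with a short sketch. It uses the built-in `HairEyeColor` table collapsed over sex, which is assumed here to contain the same 592 students as the hair and eye color example in these notes; the object names are illustrative.

```r
# Hair-by-eye counts for 592 students, collapsed over sex (built-in datasets package).
tab <- margin.table(HairEyeColor, c(1, 2))

test <- chisq.test(tab)
R    <- test$residuals          # Pearson residuals r_ij = (n_ij - mu_ij) / sqrt(mu_ij)
sv   <- svd(R)

inertia <- sum(sv$d^2)                      # should reproduce Pearson's X^2
share2  <- sum(sv$d[1:2]^2) / inertia       # share carried by the first two dimensions

# Rank-2 reconstruction of the residuals from the two leading singular triplets.
R2 <- sv$u[, 1:2] %*% diag(sv$d[1:2]) %*% t(sv$v[, 1:2])

print(c(inertia = inertia, X2 = unname(test$statistic), share2 = share2))
print(max(abs(R - R2)))   # largest residual left unexplained by two dimensions
```

When `share2` is close to one, the two-dimensional plot loses little information, which is the justification for plotting only the first two scaled singular vectors.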
Example: Hair and Eye Color

Data were collected from 592 students in an introductory statistics class by counting the numbers of students with each hair/eye color combination.

> library(faraway); data(haireye); (ct <- xtabs(y~hair+eye,haireye));
       eye
hair    green hazel blue brown
  BLACK
  BROWN
  RED
  BLOND
> modc <- glm(y~hair+eye,family=poisson,data=haireye);
> pchisq(modc$deviance,modc$df.resid,lower.tail=FALSE)

The above GOF test shows that hair and eye color are not independent.

> z <- xtabs(residuals(modc,type="pearson")~hair+eye,data=haireye);
> svdz <- svd(z,2,2);
> leftsv <- svdz$u %*% diag(sqrt(svdz$d[1:2]));
> rightsv <- svdz$v %*% diag(sqrt(svdz$d[1:2]));
> bd <- 1.1*max(abs(rightsv),abs(leftsv));
> plot(rbind(leftsv,rightsv),asp=1,xlim=c(-bd,bd),ylim=c(-bd,bd),xlab="SV1",ylab="SV2",type="n")
> abline(h=0,v=0);
> text(leftsv,dimnames(z)[[1]]); text(rightsv,dimnames(z)[[2]]);

[Figure: correspondence plot with SV1 on the horizontal axis and SV2 on the vertical axis, showing the hair colors BLACK, BROWN, RED, BLOND and the eye colors brown, hazel, green, blue]

BLOND is far from the origin, indicating that the distribution of eye colors within this group is not typical. In contrast, BROWN is close to the origin, indicating an eye color distribution that is close to the overall average.

blue and BLOND occur close together on the plot and far from the origin, indicating a strong association between blue eyes and blond hair. On the other hand, there are relatively fewer people with blond hair and brown eyes than would be expected under independence.

hazel and green are close together, indicating that people with hazel or green eyes have similar hair color distributions, so we might choose to combine these two categories.
Models for Matched Pairs

Matched-pairs data:
occur in studies that compare categorical responses for two samples when each observation in one sample pairs with an observation in the other;
arise from repeated measurement of subjects, such as longitudinal studies that observe subjects over time;
are summarized by a square two-way contingency table with the same row and column categories.

Example: Rating Performance of the Prime Minister

For a poll of a random sample of 1600 voting-age British citizens, 944 indicated approval of the Prime Minister's performance in office. Six months later, of these same 1600 people, 880 indicated approval.

A strong association exists between opinions six months apart, as the sample odds ratio is $(794 \times 570)/(150 \times 86) \approx 35.1$. (Q: confidence interval?)

               Second Survey
First Survey   Approve  Disapprove  Total
Approve            794         150    944
Disapprove          86         570    656
Total              880         720   1600
Example: Grading of Eye Pairs for Distance Vision

A sample of women is rated for the performance of distance vision in each eye.

> library(faraway); data(eyegrade);
> (ct<-xtabs(y~right+left,eyegrade))
        left
right    best second third worst
  best
  second
  third
  worst
> summary(ct)
Call: xtabs(formula = y ~ right + left, data = eyegrade)
Number of cases in table: 7477
Number of factors: 2
Test for independence of all factors:
        Chisq = 8097, df = 9, p-value = 0

It is not surprising to find strong evidence against independence. A more interesting hypothesis for matched-pairs data is whether $\pi_{ij} = \pi_{ji}$ for all i and j.
An $I \times I$ distribution $\{\pi_{ij}\}$ satisfies symmetry if

$$\pi_{ij} = \pi_{ji}, \quad i = 1,\dots,I;\ j = 1,\dots,J \ (J = I).$$

Under symmetry, $\pi_{i+} = \sum_{j=1}^{J} \pi_{ij} = \sum_{j=1}^{J} \pi_{ji} = \pi_{+i}$, which implies marginal homogeneity.

For I = 2, symmetry is equivalent to marginal homogeneity.
For I > 2, marginal homogeneity can occur without symmetry.

Symmetry as Logit Models

$$\log\frac{\pi_{ij}}{\pi_{ji}} = 0, \quad \text{for all } i < j.$$

The MLE of $\pi_{ij} = \pi_{ji}$ is $\hat\pi_{ij} = \hat\pi_{ji} = \dfrac{n_{ij} + n_{ji}}{2 n_{++}}$, and the LRT is

$$2 \sum_{i \ne j} n_{ij} \log\frac{2 n_{ij}}{n_{ij} + n_{ji}} \overset{asy.}{\sim} \chi^2_{I(I-1)/2}, \quad \text{under } H_0.$$
Symmetry as Loglinear Models

$$\log(\mu_{ij}) = \lambda + \alpha_i + \alpha_j + \gamma_{ij}, \qquad \gamma_{ij} = \gamma_{ji} \iff \mu_{ij} = \mu_{ji}.$$

Symmetry $\Longrightarrow \pi_{i+} = \pi_{+i}$.

The MLE of $\mu_{ij} = \mu_{ji}$ is $\hat\mu_{ij} = \hat\mu_{ji} = (n_{ij} + n_{ji})/2$, and the LRT is

$$2 \sum_{i \ne j} n_{ij} \log\frac{2 n_{ij}}{n_{ij} + n_{ji}} \overset{asy.}{\sim} \chi^2_{I(I-1)/2}, \quad \text{under } H_0.$$

This is equivalent to the goodness-of-fit test for a loglinear model with properly defined dummy variables!

Bowker's test of symmetry (Bowker, 1948):

$$X^2 = \sum_{i=1}^{I-1} \sum_{j=i+1}^{I} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}} \overset{asy.}{\sim} \chi^2_{I(I-1)/2}, \quad \text{under } H_0.$$

When I = J = 2, the above test is called McNemar's test (McNemar, 1947).
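For a $2 \times 2$ illustration of these symmetry tests, the sketch below uses the Prime Minister approval example. The off-diagonal counts 150 and 86 and the totals 1600, 944 and 880 are quoted in that example; the diagonal counts 794 and 570 are derived here from those totals. The comparison with `mcnemar.test(correct = FALSE)` is a sanity check added for this sketch.

```r
# Prime Minister approval: first survey (rows) by second survey (columns).
# Diagonal cells follow from the quoted totals: 944 - 150 = 794 and 1600 - 944 - 86 = 570.
tab <- matrix(c(794, 150,
                 86, 570), nrow = 2, byrow = TRUE,
              dimnames = list(first  = c("approve", "disapprove"),
                              second = c("approve", "disapprove")))

# Bowker's statistic: sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji).
upper  <- upper.tri(tab)
bowker <- sum((tab - t(tab))[upper]^2 / (tab + t(tab))[upper])
df     <- nrow(tab) * (nrow(tab) - 1) / 2

# Likelihood-ratio statistic for symmetry (diagonal cells contribute zero).
lrt <- 2 * sum(tab * log(2 * tab / (tab + t(tab))))

print(c(bowker = bowker, lrt = lrt, df = df))
print(pchisq(c(bowker, lrt), df, lower.tail = FALSE))
print(mcnemar.test(tab, correct = FALSE)$statistic)
```

The same lines work unchanged for a larger square table, where `df` becomes $I(I-1)/2$; the LRT line assumes no off-diagonal cell is zero.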
Example: Grading of Eye Pairs for Distance Vision (Continued)

> mct <- matrix(ct,nrow=4);
> 2*sum(mct*log(2*mct/(mct+t(mct)))) # LRT
> pchisq(2*sum(mct*log(2*mct/(mct+t(mct)))),6,lower.tail=FALSE)
> (symfac <- factor(apply(eyegrade[,2:3],1,function(x) paste(sort(x),collapse="-"))))
 best-best best-second best-third best-worst
 best-second second-second second-third second-worst
 best-third second-third third-third third-worst
 best-worst second-worst third-worst worst-worst
10 Levels: best-best best-second best-third best-worst second-second ... worst-worst
> mods <- glm(y ~ symfac, family=poisson, data=eyegrade);
> c(deviance(mods),df.residual(mods));
> pchisq(deviance(mods), df.residual(mods),lower.tail=FALSE); # GOF of loglinear model
> sum((mct-t(mct))^2/(mct+t(mct)))/2 # Bowker's test of symmetry
> pchisq(sum((mct-t(mct))^2/(mct+t(mct)))/2,6,lower.tail=FALSE)
Quasi-symmetry: allows the main-effect terms in the symmetry loglinear model to differ, to accommodate marginal heterogeneity:

$$\log(\mu_{ij}) = \lambda + \alpha_i + \beta_j + \gamma_{ij}, \qquad \gamma_{ij} = \gamma_{ji}.$$

$\pi_{ij} = ?$ (note that $\mu_{ij} \ne \mu_{ji}$)

Q: What is the LRT?

$$2(l_{full} - l_{null}) \overset{asy.}{\sim} \chi^2_{(I-1)(I-2)/2}, \quad \text{under } H_0.$$

This is equivalent to the goodness-of-fit test for a loglinear model with properly defined dummy variables!

Example: Grading of Eye Pairs for Distance Vision (Continued)

> modq <- glm(y ~ right+left+symfac, family=poisson, data=eyegrade);
> c(deviance(modq),df.residual(modq));
> pchisq(deviance(modq), df.residual(modq),lower.tail=FALSE); # GOF of loglinear model
> anova(mods,modq,test="Chisq");
Model 1: y ~ symfac
Model 2: y ~ right + left + symfac
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
A square contingency table satisfies quasi-independence when the variables are independent, given that the row and column outcomes differ:

$$\log(\mu_{ij}) = \lambda + \alpha_i + \beta_j + \delta_i 1_{\{i = j\}}.$$

The first three terms specify independence, and the $\{\delta_i\}$ permit the $\{\mu_{ii}\}$ to depart from this pattern and take arbitrary positive values.

Quasi-independence is the special case of quasi-symmetry in which the $\{\gamma_{ij}, i \ne j\}$ are identical. The two are equivalent when I = 3.

Q: What is the LRT ($I \ge 3$)?

$$2(l_{full} - l_{null}) \overset{asy.}{\sim} \chi^2_{(I-1)^2 - I}, \quad \text{under } H_0.$$

This is equivalent to the goodness-of-fit test for a loglinear model with properly defined dummy variables!

Example: Grading of Eye Pairs for Distance Vision (Continued)

> modqi <- glm(y ~ right+left, family=poisson, subset=-c(1,6,11,16), data=eyegrade); # drop the diagonal cells
> c(deviance(modqi),df.residual(modqi));
> pchisq(deviance(modqi), df.residual(modqi),lower.tail=FALSE); # GOF of loglinear model
[1] e-41
Three-Way Contingency Tables

Example: Mortality Due to Smoking in Women

A survey of one in six residents of Whickham, near Newcastle, England, was made, and twenty years later these data were recorded in a follow-up study. Only women who are current smokers or who have never smoked are included.

> library(faraway); data(femsmoke);
> cbind(femsmoke[1:14,], index=15:28,femsmoke[15:28,])
   y smoker dead age index y smoker dead age
1  2    yes  yes
...
        no   no 75+
Simpson's Paradox

> (ct <- xtabs(y~smoker+dead,femsmoke))
      dead
smoker yes no
   yes
   no
> fisher.test(ct)
...
p-value =
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
sample estimates:
odds ratio

Can we conclude that smoking has a beneficial effect on longevity?

> ct3 <- xtabs(y~smoker+dead+age,femsmoke)
> apply(ct3,3,function(x){tr<-fisher.test(x); tr$estimate})

All odds ratios are greater than one, with the exception of one age group.

But how do we test independence in a $2 \times 2$ table across K strata?
$I \times J \times K$ Table: the three categorical variables, e.g., X, Y and Z, have I, J and K categories, respectively.

Multinomial sampling: assumes a multinomial distribution with cell probabilities $\{\pi_{ijk}\}$, where $\sum_i \sum_j \sum_k \pi_{ijk} = 1$.

Poisson sampling: assumes that each cell count $n_{ijk}$ follows a Poisson distribution with rate $\mu_{ijk}$. So $n_{+++} \sim \text{Poisson}(\mu_{+++})$ with $\sum_i \sum_j \sum_k \mu_{ijk} = \mu_{+++}$, and

$$\mu_{ijk} = \mu_{+++}\pi_{ijk}.$$

Mutual Independence

X, Y and Z are mutually independent when, for all i, j and k,

$$\pi_{ijk} = \pi_{i++}\pi_{+j+}\pi_{++k}.$$

Mutual independence has the loglinear form

$$\log\mu_{ijk} = \lambda + \alpha_i + \beta_j + \gamma_k.$$

Tests of mutual independence:
Pearson's $\chi^2$ test
GOF test for the loglinear model
Example: Mortality Due to Smoking in Women (Continued)

> summary(ct3) # Pearson's chi-sq test
Call: xtabs(formula = y ~ smoker + dead + age, data = femsmoke)
Number of cases in table: 1314
Number of factors: 3
Test for independence of all factors:
        Chisq = 790.6, df = 19, p-value = 2.140e-155
> modi <- glm(y~smoker+dead+age,family=poisson,data=femsmoke);
> c(deviance(modi),df.residual(modi))
> pchisq(deviance(modi),df.residual(modi),lower.tail=FALSE)
[1] e-143

Conclusion?
Joint Independence

Z is jointly independent of X and Y when, for all i, j and k,

$$\pi_{ijk} = \pi_{ij+}\pi_{++k}.$$

Joint independence has the loglinear form

$$\log\mu_{ijk} = \lambda + \alpha_{ij} + \beta_k.$$

Mutual independence implies joint independence of any one variable from the others.

Tests of joint independence:
Pearson's $\chi^2$ test (after combining the levels of X and Y)
GOF test for the loglinear model
Example: Mortality Due to Smoking in Women (Continued)

We want to investigate whether age is jointly independent of smoking and life status.

> femsmoke$sdead <- factor(apply(femsmoke[,2:3],1, function(x) paste(x,collapse="-")))
> (ct2 <- xtabs(y~sdead+age,femsmoke))
         age
sdead
  no-no
  no-yes
  yes-no
  yes-yes
> summary(ct2)
Call: xtabs(formula = y ~ sdead + age, data = femsmoke)
Number of cases in table: 1314
Number of factors: 2
Test for independence of all factors:
        Chisq = 734.7, df = 18, p-value = 2.455e-144
> modj <- glm(y~smoker*dead+age,family=poisson,data=femsmoke)
> c(deviance(modj),df.residual(modj))
> pchisq(deviance(modj),df.residual(modj),lower.tail=FALSE)
[1] e-142

Conclusion?
Conditional Independence

X and Y are conditionally independent given Z when, for all i, j and k,

$$\pi_{ij|k} = \pi_{i+|k}\,\pi_{+j|k} \iff \pi_{ijk} = \pi_{i+k}\,\pi_{+jk}/\pi_{++k},$$

where $\pi_{ij|k} = P(X = i, Y = j \mid Z = k)$, $\pi_{i+|k} = P(X = i \mid Z = k)$ and $\pi_{+j|k} = P(Y = j \mid Z = k)$.

Conditional independence has the loglinear form

$$\log\mu_{ijk} = \lambda + \alpha_{ik} + \beta_{jk}.$$

It is a weaker condition than mutual or joint independence.

Tests of conditional independence:

$2 \times 2 \times K$ tables: Cochran-Mantel-Haenszel test

$$CMH = \frac{\big(\big|\sum_k n_{11k} - \sum_k \mu_{11k}\big| - 0.5\big)^2}{\sum_k \sigma^2_{11k}} \overset{asy.}{\sim} \chi^2_1, \quad \text{under } H_0,$$

with $\mu_{11k} = n_{1+k}\, n_{+1k}/n_{++k}$ and $\sigma^2_{11k} = n_{1+k}\, n_{2+k}\, n_{+1k}\, n_{+2k}/[n^2_{++k}(n_{++k} - 1)]$.

Alternatively, test which of the three possible two-way interactions does not appear in the loglinear model.
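The CMH statistic can be assembled by hand from the stratum margins. The sketch below does so for the built-in `UCBAdmissions` table (a $2 \times 2 \times 6$ table that is not part of the lecture data and is used only because it ships with R) and compares the result with `mantelhaen.test`, whose default also applies the 0.5 continuity correction.

```r
# Built-in 2 x 2 x 6 table (Admit x Gender x Dept); illustration only.
x <- UCBAdmissions

n11 <- x[1, 1, ]                                  # n_{11k}
n1p <- apply(x, 3, function(s) sum(s[1, ]))       # n_{1+k}
n2p <- apply(x, 3, function(s) sum(s[2, ]))       # n_{2+k}
np1 <- apply(x, 3, function(s) sum(s[, 1]))       # n_{+1k}
np2 <- apply(x, 3, function(s) sum(s[, 2]))       # n_{+2k}
npp <- apply(x, 3, sum)                           # n_{++k}

# Null mean and variance of n_{11k} within each stratum.
mu11  <- n1p * np1 / npp
var11 <- n1p * n2p * np1 * np2 / (npp^2 * (npp - 1))

# Continuity-corrected CMH statistic and its chi-square(1) p-value.
cmh <- (abs(sum(n11) - sum(mu11)) - 0.5)^2 / sum(var11)
p   <- pchisq(cmh, df = 1, lower.tail = FALSE)

print(c(CMH = cmh, p.value = p))
print(mantelhaen.test(x)$statistic)
```

Because the statistic pools $n_{11k} - \mu_{11k}$ across strata before squaring, it has most power when the conditional association points in the same direction in every stratum.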
Example: Mortality Due to Smoking in Women (Continued)

We want to investigate whether smoking and life status are independent given age.

> mantelhaen.test(ct3,exact=TRUE)

        Exact conditional test of independence in 2 x 2 x k tables

data:  ct3
S = 139, p-value =
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
sample estimates:
common odds ratio

> modc <- glm(y~smoker*age+dead*age,family=poisson,data=femsmoke)
> c(deviance(modc),df.residual(modc))
> pchisq(deviance(modc),df.residual(modc),lower.tail=FALSE)

Conclusion?

Caution: the GOF test may not work well when some cell counts are small!
Homogeneous Association

Homogeneous association implies that the conditional relationship between any pair of variables, given the third one, is the same at each level of the third variable.

It is also known as the no three-factor interaction model or the no second-order interaction model.

The loglinear model of homogeneous association is

$$\log\mu_{ijk} = \lambda + \alpha_{ij} + \beta_{jk} + \gamma_{ik}.$$

At a fixed level k of Z, consider the conditional local odds ratios

$$\theta_{ij|k} = \frac{\pi_{ij|k}\,\pi_{i+1,j+1|k}}{\pi_{i,j+1|k}\,\pi_{i+1,j|k}} = \frac{\pi_{ijk}\,\pi_{i+1,j+1,k}}{\pi_{i,j+1,k}\,\pi_{i+1,j,k}} = \frac{\mu_{ijk}\,\mu_{i+1,j+1,k}}{\mu_{i,j+1,k}\,\mu_{i+1,j,k}} = \frac{\exp\{\alpha_{ij} + \alpha_{i+1,j+1}\}}{\exp\{\alpha_{i+1,j} + \alpha_{i,j+1}\}},$$

which does not depend on k. A similar conclusion holds when fixing the level of X or Y.

> modh <- glm(y~(smoker+age+dead)^2,family=poisson,data=femsmoke);
> ctf <- xtabs(fitted(modh)~smoker+dead+age,femsmoke)
> apply(ctf,3,function(x) (x[1,1]*x[2,2])/(x[1,2]*x[2,1]) )
> anova(modc,modh,test="Chisq")

Conclusion?
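The constancy of the conditional odds ratios under homogeneous association can be seen in a self-contained sketch. It again uses the built-in `UCBAdmissions` table rather than the lecture's `femsmoke` data (an assumption made so the code runs without extra packages), fits the model with all two-way interactions, and compares fitted and observed Admit-by-Gender odds ratios within each department.

```r
d <- as.data.frame(UCBAdmissions)   # columns: Admit, Gender, Dept, Freq

# Loglinear model with all two-way interactions and no three-way interaction.
modh <- glm(Freq ~ (Admit + Gender + Dept)^2, family = poisson, data = d)
d$fit <- fitted(modh)
fit3  <- xtabs(fit ~ Admit + Gender + Dept, data = d)

# Conditional Admit-by-Gender odds ratio within each department.
odds_ratio  <- function(s) s[1, 1] * s[2, 2] / (s[1, 2] * s[2, 1])
fitted_or   <- apply(fit3, 3, odds_ratio)
observed_or <- apply(UCBAdmissions, 3, odds_ratio)

print(round(rbind(observed = observed_or, fitted = fitted_or), 3))

# Goodness of fit of the homogeneous association model.
print(c(deviance = deviance(modh), df = df.residual(modh)))
print(pchisq(deviance(modh), df.residual(modh), lower.tail = FALSE))
```

The fitted odds ratios are identical across departments by construction, so the goodness-of-fit test is in effect a test of whether the observed odds ratios differ across strata by more than chance.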
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationBIOS 625 Fall 2015 Homework Set 3 Solutions
BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's
More informationij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as
page1 Loglinear Models Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These
More information3 Way Tables Edpsy/Psych/Soc 589
3 Way Tables Edpsy/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University of Illinois Spring 2017
More informationTwo Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00
Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages