Categorical Variables and Contingency Tables: Description and Inference
STAT 526, Professor Olga Vitek, March 3, 2011
Reading: Agresti Ch. 1, 2 and 3; Faraway Ch. 4

Univariate Binomial and Multinomial Measurements
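
Binomial Distribution
Probability distribution: $Y_1, Y_2, \ldots, Y_n \overset{iid}{\sim} \text{Bernoulli}(\pi)$, and $Y = \sum_{i=1}^{n} Y_i \sim \text{Binomial}(n, \pi)$
$$p(y) = \binom{n}{y} \pi^{y} (1-\pi)^{n-y}, \qquad \mu = E(Y) = n\pi, \quad \sigma^2 = \mathrm{var}(Y) = n\pi(1-\pi)$$
Log-likelihood (up to an additive constant): $L(\pi) = y\log(\pi) + (n-y)\log(1-\pi)$
Maximum Likelihood Estimator: $\hat\pi = y/n$, with $E(\hat\pi) = \pi$ and $SE(\hat\pi) = \sqrt{\pi(1-\pi)/n}$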
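
Large-sample tests for $\pi$
For a known $\pi_0$, test $H_0: \pi = \pi_0$ vs $H_a: \pi \neq \pi_0$
Wald test:
$$z_W = \frac{\hat\pi - \pi_0}{\widehat{SE}} = \frac{\hat\pi - \pi_0}{\sqrt{\hat\pi(1-\hat\pi)/n}} \overset{H_0,\,approx}{\sim} N(0, 1)$$
Likelihood ratio test:
$$z_L = 2(L_1 - L_0) = 2\left( y\log\frac{\hat\pi}{\pi_0} + (n-y)\log\frac{1-\hat\pi}{1-\pi_0} \right) \overset{H_0,\,approx}{\sim} \chi^2_1$$
Score test:
$$z_S = \frac{\hat\pi - \pi_0}{SE_0} = \frac{\hat\pi - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \overset{H_0,\,approx}{\sim} N(0, 1)$$
The score statistic is closer to $N(0, 1)$ than the Wald statistic.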
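The three statistics above can be compared side by side on a small example. This is a minimal sketch in R; the values of `y`, `n` and `pi0` are hypothetical and do not come from the slides.

```r
# Hypothetical data (an assumption, not from the slides): y successes in n trials
y <- 12; n <- 20; pi0 <- 0.5
pi_hat <- y / n

# Wald statistic: standard error evaluated at the MLE
z_wald <- (pi_hat - pi0) / sqrt(pi_hat * (1 - pi_hat) / n)

# Score statistic: standard error evaluated under H0
z_score <- (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Likelihood ratio statistic: twice the log-likelihood difference
lr <- 2 * (y * log(pi_hat / pi0) + (n - y) * log((1 - pi_hat) / (1 - pi0)))

# Two-sided p-values from the N(0, 1) and chi-squared(1) reference distributions
p_values <- c(
  wald  = 2 * pnorm(-abs(z_wald)),
  score = 2 * pnorm(-abs(z_score)),
  lr    = pchisq(lr, df = 1, lower.tail = FALSE)
)
print(p_values)
```

The squared score statistic should agree with `prop.test(y, n, p = pi0, correct = FALSE)`, which makes a convenient cross-check.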
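
Large-sample CI for $\pi$
Based on the Wald test statistic:
$$\hat\pi \pm z_{\alpha/2}\sqrt{\frac{\hat\pi(1-\hat\pi)}{n}}$$
Performs poorly unless $n$ is large.
Based on the score test statistic:
$$\hat\pi\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n + z_{\alpha/2}^2}\left[\hat\pi(1-\hat\pi)\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\cdot\frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right)\right]}$$
Performs better than Wald.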
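To see how the two intervals differ in practice, the formulas can be coded directly. The sketch below uses hypothetical counts (`y`, `n` are assumptions, not from the slides) and compares the score interval with the one reported by `prop.test()` without continuity correction.

```r
# Hypothetical data (an assumption, not from the slides)
y <- 12; n <- 20; alpha <- 0.05
pi_hat <- y / n
z <- qnorm(1 - alpha / 2)

# Wald interval
wald_ci <- pi_hat + c(-1, 1) * z * sqrt(pi_hat * (1 - pi_hat) / n)

# Score interval: weighted centre and adjusted half-width
w_n <- n / (n + z^2)       # weight on the sample proportion
w_z <- z^2 / (n + z^2)     # weight on 1/2
centre <- pi_hat * w_n + 0.5 * w_z
half   <- z * sqrt((pi_hat * (1 - pi_hat) * w_n + 0.25 * w_z) / (n + z^2))
score_ci <- centre + c(-1, 1) * half

# prop.test() without continuity correction reports a score-based interval
builtin_ci <- as.vector(prop.test(y, n, correct = FALSE)$conf.int)
print(rbind(wald = wald_ci, score = score_ci, builtin = builtin_ci))
```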
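
Multinomial Distribution
Probability distribution: for each observation $i$, $(Y_{i1}, \ldots, Y_{ic})$ with $Y_{ij} = 1$ if observation $i$ is in category $j$, and 0 otherwise
$n_j = \sum_{i=1}^{n} Y_{ij}$, $(n_1, \ldots, n_c) \sim \text{Multinomial}(n; \pi_1, \ldots, \pi_c)$, $n = \sum_{j=1}^{c} n_j$
$$p(n_1, n_2, \ldots, n_{c-1}) = \left(\frac{n!}{n_1!\, n_2! \cdots n_c!}\right) \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}$$
$E(n_j) = n\pi_j$, $\mathrm{var}(n_j) = n\pi_j(1-\pi_j)$, $\mathrm{cov}(n_j, n_k) = -n\pi_j\pi_k$
Log-likelihood: $L(\pi) = \sum_{j=1}^{c} n_j \log \pi_j$
Maximum Likelihood Estimator: $\hat\pi_j = n_j/n$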
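
Large-Sample Test for $(\pi_1, \ldots, \pi_c)$
For known $(\pi_{10}, \pi_{20}, \ldots, \pi_{c0})$, test $H_0: \pi_j = \pi_{j0}$ for all $j$ vs $H_a: \pi_j \neq \pi_{j0}$ for some $j$
Pearson test:
$$X^2 = \sum_{j=1}^{c} \frac{(O_j - E_{j0})^2}{E_{j0}} = \sum_{j=1}^{c} \frac{(n_j - n\pi_{j0})^2}{n\pi_{j0}} \overset{H_0,\,approx}{\sim} \chi^2_{c-1}$$
E.g. in genetics: test theories of trait inheritance
Likelihood ratio test:
$$G^2 = 2(L_1 - L_0) = 2\sum_{j=1}^{c} n_j \log\left(\frac{n_j}{n\pi_{j0}}\right) \overset{H_0,\,approx}{\sim} \chi^2_{c-1}$$
The two tests are asymptotically equivalent when $H_0$ is true. For $n/c < 5$, $X^2$ converges faster.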
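As a sketch of the genetics use case mentioned above, the code below computes both statistics for a 9:3:3:1 inheritance hypothesis. The counts `n_j` are illustrative assumptions, not data from the slides.

```r
# Illustrative counts (an assumption, not from the slides) and a 9:3:3:1 null hypothesis
n_j <- c(315, 108, 101, 32)
pi0 <- c(9, 3, 3, 1) / 16
n <- sum(n_j)
expected <- n * pi0

X2 <- sum((n_j - expected)^2 / expected)     # Pearson statistic
G2 <- 2 * sum(n_j * log(n_j / expected))     # likelihood ratio statistic
df <- length(n_j) - 1                        # c - 1 degrees of freedom

p_values <- pchisq(c(X2, G2), df = df, lower.tail = FALSE)
print(c(X2 = X2, G2 = G2))
print(p_values)
```

The Pearson statistic can be cross-checked against `chisq.test(n_j, p = pi0)`.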
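
Poisson Distribution
Probability distribution: $Y$ = number of events in a fixed interval of space/time, $Y \sim \text{Poisson}(\mu)$
$$p(y) = \frac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, \ldots; \qquad E(Y) = \mathrm{var}(Y) = \mu$$
If $Y_1, Y_2, \ldots, Y_c \overset{ind}{\sim} \text{Poisson}(\mu_i)$, then $\sum_{i=1}^{c} Y_i \sim \text{Poisson}\left(\sum_{i=1}^{c} \mu_i\right)$
$c$ independent Poisson random variables, conditional on their total, are Multinomial:
$$P\Big(Y_1 = n_1, \ldots, Y_c = n_c \,\Big|\, \sum_i Y_i = n\Big) = \frac{P(Y_1 = n_1, \ldots, Y_c = n_c)}{P(\sum_i Y_i = n)} = \frac{\prod_i \left[\exp(-\mu_i)\mu_i^{n_i}/n_i!\right]}{\exp(-\sum_i \mu_i)\left(\sum_i \mu_i\right)^{n}/n!} = \frac{n!}{\prod_i n_i!}\prod_i \pi_i^{n_i}, \qquad \pi_i = \frac{\mu_i}{\sum_i \mu_i}$$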
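The Poisson-to-Multinomial identity can be checked numerically. In this minimal sketch the means `mu` and counts `n_i` are hypothetical values chosen only for illustration.

```r
# Hypothetical Poisson means and observed counts (assumptions, not from the slides)
mu  <- c(2, 5, 3)
n_i <- c(1, 4, 2)
n   <- sum(n_i)

# Joint Poisson probability divided by the probability of the total
joint <- prod(dpois(n_i, mu))
total <- dpois(n, sum(mu))
conditional <- joint / total

# Multinomial probability with pi_i = mu_i / sum(mu)
multinomial <- dmultinom(n_i, size = n, prob = mu / sum(mu))
print(c(conditional = conditional, multinomial = multinomial))
```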

2-Way Contingency Tables

Contingency Tables
Contingency Table = Classification Table: frequency of outcomes
Two-Way Table: frequency of outcomes of two categorical variables
$I \times J$ table: a table with $I$ rows and $J$ columns
Contingency tables can arise from several sampling schemes
Inference depends on the sampling scheme
Example: Lung Cancer. Rows: Smoking (Yes, No, Total); columns: Cases, Controls, Total (cell counts not available in this transcription)

Joint Distribution and Independence
Underlying probability distribution of $X$ (smoking) and $Y$ (cancer)
Joint distribution: $\pi_{ij}$, probability of cell $(i, j)$
Marginal distribution: $\pi_{i+} = \sum_{j=1}^{J} \pi_{ij}$, probability of row $i$; $\pi_{+j} = \sum_{i=1}^{I} \pi_{ij}$, probability of column $j$
Conditional distribution: $\pi_{j|i} = \pi_{ij}/\pi_{i+}$, distribution of $j$ given $i$
Independence: $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$
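A short sketch of these definitions in R follows. The joint distribution `joint` is a hypothetical 2 x 3 table constructed so that independence holds; it is not taken from the slides.

```r
# Hypothetical 2 x 3 joint distribution pi_ij (an assumption, not from the slides)
joint <- matrix(c(0.10, 0.20, 0.10,
                  0.15, 0.30, 0.15),
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("1", "2"), Y = c("1", "2", "3")))

row_marg <- rowSums(joint)                  # pi_{i+}
col_marg <- colSums(joint)                  # pi_{+j}
cond     <- prop.table(joint, margin = 1)   # pi_{j|i}: each row sums to 1

# Independence holds when the joint equals the outer product of the margins
independent <- isTRUE(all.equal(joint, outer(row_marg, col_marg),
                                check.attributes = FALSE))
print(cond)
print(independent)
```

Under independence every row of `cond` equals the column margin, which is the homogeneity property used later for product multinomial sampling.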

Multinomial Sampling
The total sample size $n$ is fixed, but the row and column totals are not
$X$ and $Y$ are treated equally: $P(X = i, Y = j) = \pi_{ij}$, $i = 1, \ldots, I$; $j = 1, \ldots, J$
Describe associations with joint distributions
Back to the case of the Multinomial distribution. Likelihood and log-likelihood:
$$\text{Likelihood} = \frac{n!}{n_{11}! \cdots n_{IJ}!}\prod_{i=1}^{I}\prod_{j=1}^{J} \pi_{ij}^{n_{ij}}, \qquad L = \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\log(\pi_{ij}) + \text{constant}$$

Multinomial Sampling: Testing for Independence
Hypotheses:
$H_0$ (reduced model): $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$
$H_a$ (full model): $\pi_{ij} \neq \pi_{i+}\pi_{+j}$ for some $i$ and $j$
Pearson $\chi^2$ test:
$$X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \overset{H_0,\,approx}{\sim} \chi^2_{(I-1)(J-1)}$$
$O_{ij} = n_{ij}$, $E_{ij} = n\hat\pi_{i+}\hat\pi_{+j} = n_{i+}n_{+j}/n$
Df $= (I-1)(J-1) = (IJ-1) - (I-1) - (J-1)$
Likelihood ratio test:
Full model: $\hat\pi_{ij} = n_{ij}/n_{++}$
Reduced model: $\hat\pi_{i+} = n_{i+}/n_{++}$; $\hat\pi_{+j} = n_{+j}/n_{++}$
$$G^2 = 2(L_1 - L_0) = 2\sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\log\frac{n_{ij}n_{++}}{n_{i+}n_{+j}} \overset{H_0,\,approx}{\sim} \chi^2_{(I-1)(J-1)}$$
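Both statistics are easy to compute by hand. The sketch below uses the degree-by-belief counts from the case study in these slides, entered directly as a matrix so that the block is self-contained.

```r
# Counts from the case study in these slides (degree x religious belief)
ov <- matrix(c(178, 138, 108,
               570, 648, 442,
               138, 252, 252),
             nrow = 3, byrow = TRUE,
             dimnames = list(degree = c("1-<HS", "2-HS", "3-BS/grad"),
                             belief = c("1-Fundam", "2-Moder", "3-Liber")))

# Expected counts under independence: n_{i+} n_{+j} / n
E <- outer(rowSums(ov), colSums(ov)) / sum(ov)

X2 <- sum((ov - E)^2 / E)              # Pearson statistic
G2 <- 2 * sum(ov * log(ov / E))        # likelihood ratio statistic
df <- (nrow(ov) - 1) * (ncol(ov) - 1)  # (I - 1)(J - 1)

print(c(X2 = X2, G2 = G2, df = df))
print(pchisq(c(X2, G2), df = df, lower.tail = FALSE))
```

The Pearson value should match the `Chisq = 69.16` reported by `summary(ov)` in the case study.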

Independent (or Product) Multinomial Sampling
The row totals $n_{i+}$, $i = 1, \ldots, I$, are fixed
E.g., $X$ is an explanatory variable, and response $Y$ occurs separately at each setting of $X$
View categorical response as a function of a categorical predictor
Describe associations in terms of conditional distributions: $P(Y = j \mid X = i) = \pi_{j|i}$, $i = 1, \ldots, I$; $j = 1, \ldots, J$
For a fixed $i$, $\{n_{ij}, j = 1, \ldots, J\}$ follow a multinomial distribution
$$f(n_{i1}, \ldots, n_{iJ}) = \frac{n_{i+}!}{n_{i1}! \cdots n_{iJ}!}\prod_{j=1}^{J} \pi_{j|i}^{n_{ij}}$$

Compare Proportions
Independent Multinomial Sampling
$H_0: \pi_1 = \pi_2$ vs $H_a: \pi_1 \neq \pi_2$
ML estimate of the difference: $\hat\pi_1 - \hat\pi_2 = \dfrac{y_1}{n_1} - \dfrac{y_2}{n_2}$
$$SE(\hat\pi_1 - \hat\pi_2) = \left[\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right]^{1/2}$$
Wald Confidence Interval: $\hat\pi_1 - \hat\pi_2 \pm z_{\alpha/2}\,\widehat{SE}(\hat\pi_1 - \hat\pi_2)$
Replace $\pi$ with $\hat\pi$ to estimate SE
Usually too narrow
Better methods (e.g. delta method) exist
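A minimal sketch of the Wald interval for a difference of proportions follows. It assumes the two groups are the first two degree levels of the case-study table, with "Fundamentalist" counted as a success and "Moderate" as a failure; that framing is an illustration, not something the slide specifies.

```r
# Counts taken from the first two rows and columns of the case-study table
y <- c(178, 570)                 # "successes" in each group
n <- c(178 + 138, 570 + 648)     # group sizes
p <- y / n

diff_hat <- p[1] - p[2]
se_hat   <- sqrt(sum(p * (1 - p) / n))   # plug-in standard error
z        <- qnorm(0.975)
wald_ci  <- diff_hat + c(-1, 1) * z * se_hat

print(c(estimate = diff_hat, lower = wald_ci[1], upper = wald_ci[2]))
```

The same interval is reported by `prop.test(y, n, correct = FALSE)`.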

Testing for Independence of Rows and Columns
Independent Multinomial Sampling
Independence in this context is often called homogeneity of the conditional distributions
$X$ and $Y$ are independent $\iff \pi_{j|1} = \cdots = \pi_{j|I}$, for all $j$
Can interpret the independence in terms of the product of marginal probabilities:
$$\pi_{ij} = \pi_{i+}\pi_{+j} \text{ for all } i \text{ and } j \iff \pi_{j|1} = \cdots = \pi_{j|I} \text{ for all } j$$
($\Rightarrow$) $\pi_{j|i} = \pi_{ij}/\pi_{i+} = (\pi_{i+}\pi_{+j})/\pi_{i+} = \pi_{+j}$
($\Leftarrow$) Let $\pi_{j|i} = a_j$; then $\pi_{+j} = \sum_{i=1}^{I} \pi_{ij} = \sum_{i=1}^{I} \pi_{i+}a_j = a_j$, so $\pi_{ij} = \pi_{i+}a_j = \pi_{i+}\pi_{+j}$

Testing for Independence of Rows and Columns
Test the homogeneity of conditional distributions

    Row     Column 1                 ...   Column J                 Total
    1       π_11  (π_{1|1})          ...   π_1J  (π_{J|1})          π_1+
    ...
    I       π_I1  (π_{1|I})          ...   π_IJ  (π_{J|I})          π_I+
    Total   π_+1                     ...   π_+J                     π_++

Consider the new notation: $\pi_j(x) = P(Y = j \mid X = x)$
Although the interpretation is different, use the same Pearson $X^2$ test and the LR test

Test for Independence: Odds Ratio
Odds Ratio:
$$\theta = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = \frac{P(Y=1 \mid X=1)/P(Y=2 \mid X=1)}{P(Y=1 \mid X=2)/P(Y=2 \mid X=2)} = \frac{P(X=1 \mid Y=1)/P(X=2 \mid Y=1)}{P(X=1 \mid Y=2)/P(X=2 \mid Y=2)}$$
Equally valid for prospective (conditional on $X$), retrospective (conditional on $Y$) and cross-sectional (multinomial) sampling designs
MLE: $\hat\theta = \dfrac{n_{11}/n_{12}}{n_{21}/n_{22}} = \dfrac{n_{11}n_{22}}{n_{12}n_{21}}$
When some $n_{ij} = 0$, $\hat\theta$ is not a good estimator. It is improved by adding 0.5 to each cell count:
$$\tilde\theta = \frac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}$$
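
Test for Independence: Odds Ratio
$X$ and $Y$ are independent $\iff \theta = \dfrac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = 1$
To check, substitute $\pi_{ij} = \pi_{i+}\pi_{+j}$ in the formula above
Asymptotically, $\log\hat\theta \sim N(\log(\theta), \hat\sigma^2)$, where $\hat\sigma^2 = \dfrac{1}{n_{11}} + \dfrac{1}{n_{12}} + \dfrac{1}{n_{21}} + \dfrac{1}{n_{22}}$
Large-sample CI for $\log\theta$: $\log\hat\theta \pm z_{\alpha/2}\,\widehat{SE}(\log\hat\theta) = [L, U]$
Large-sample CI for $\theta$: $[e^{L}, e^{U}]$. Usually too wide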
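The estimator and its large-sample interval can be sketched as follows, using the 2 x 2 sub-table formed by the first two rows and columns of the case-study counts. Choosing that sub-table is an assumption made for illustration.

```r
# 2 x 2 sub-table from the case study (rows: degree, columns: belief)
tab <- matrix(c(178, 138,
                570, 648), nrow = 2, byrow = TRUE)

theta_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])  # sample odds ratio
se_log    <- sqrt(sum(1 / tab))                                 # SE of log odds ratio
z         <- qnorm(0.975)

log_ci <- log(theta_hat) + c(-1, 1) * z * se_log   # [L, U] on the log scale
ci     <- exp(log_ci)                              # [e^L, e^U] for theta

# Adjusted estimator: add 0.5 to every cell, useful when a cell count is zero
adj <- tab + 0.5
theta_tilde <- (adj[1, 1] * adj[2, 2]) / (adj[1, 2] * adj[2, 1])

print(c(theta_hat = theta_hat, lower = ci[1], upper = ci[2], theta_tilde = theta_tilde))
```

Note that this Wald-type interval differs from the conditional interval that `fisher.test()` reports, so the two are not expected to agree exactly.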

Poisson Sampling
Observe a process over a period of time, and record the number of occurrences
No fixed quantities
Denote by $Y_{ij}$ the count of cell $(i, j)$. Poisson sampling assumes each $Y_{ij} \overset{ind}{\sim} \text{Poisson}(\mu_{ij})$, so that
$$\sum_{i=1}^{I}\sum_{j=1}^{J} Y_{ij} \sim \text{Poisson}\left(\sum_{i=1}^{I}\sum_{j=1}^{J} \mu_{ij}\right)$$
Hypothesis of independence of $X$ and $Y$ has the form $\log(\mu_{ij}) = \lambda + \alpha_i + \beta_j$
This is the log-linear model of independence for two-way contingency tables
Under independence, $\log(\mu_{ij})$ is an additive function of a row effect $\alpha_i$ and a column effect $\beta_j$
Since we don't have a replicate table, the model with the interaction is saturated
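
Poisson Sampling
An additive model $\log \mu_{ij} = \lambda + \alpha_i + \beta_j$ for the expected counts implies independence of the cell probabilities:
$$\pi_{ij} = \frac{E(\text{count})}{\text{sum of all } E(\text{count})} = \frac{e^{\lambda + \alpha_i + \beta_j}}{e^{\lambda}\left(\sum_i e^{\alpha_i}\right)\left(\sum_j e^{\beta_j}\right)} = \pi_{i+}\pi_{+j},$$
where $\pi_{i+} = e^{\alpha_i}/\sum_i e^{\alpha_i} = \sum_j \pi_{ij}$ and $\pi_{+j} = e^{\beta_j}/\sum_j e^{\beta_j} = \sum_i \pi_{ij}$.
Test for independence: Pearson $X^2$ or LR test as before (more on this later)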
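One way to see the connection between the additive Poisson model and the independence test is to fit it with `glm()`. The sketch below uses the case-study counts; treating them as Poisson counts is an assumption made here for illustration.

```r
# Case-study counts in long format, as in these slides
X <- data.frame(
  y = c(178, 138, 108, 570, 648, 442, 138, 252, 252),
  belief = rep(c("1-Fundam", "2-Moder", "3-Liber"), 3),
  degree = rep(c("1-<HS", "2-HS", "3-BS/grad"), each = 3)
)

# Log-linear model of independence: log(mu_ij) = lambda + alpha_i + beta_j
fit <- glm(y ~ degree + belief, family = poisson, data = X)

# Observed table and expected counts n_{i+} n_{+j} / n
ov <- matrix(X$y, nrow = 3, byrow = TRUE)     # rows: degree, columns: belief
E  <- outer(rowSums(ov), colSums(ov)) / sum(ov)
fitted_tab <- matrix(fitted(fit), nrow = 3, byrow = TRUE)

# The residual deviance of the additive model equals the LR statistic G^2
G2 <- 2 * sum(ov * log(ov / E))
print(c(deviance = deviance(fit), G2 = G2, df = df.residual(fit)))
```

The fitted means reproduce the usual expected counts, and the residual degrees of freedom equal $(I-1)(J-1)$.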

Hypergeometric Sampling
Both row and column margins are fixed.
When $X$ and $Y$ are independent, the table $\{n_{ij}\}$, given the row and column margins, follows a hypergeometric distribution
$$\frac{\left(\prod_{i=1}^{I} n_{i+}!\right)\left(\prod_{j=1}^{J} n_{+j}!\right)}{n_{++}!\,\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}$$
The distribution is parameter free
For a $2 \times 2$ table:
$$P(n_{11} = k) = \frac{\binom{n_{1+}}{k}\binom{n_{2+}}{n_{+1} - k}}{\binom{n_{++}}{n_{+1}}}, \qquad \max(0, n_{1+} + n_{+1} - n) \le k \le \min(n_{1+}, n_{+1})$$
Fisher's exact test: p-value = total probability of all outcomes at least as extreme as the one observed. Takes discrete values for small samples
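The hypergeometric probabilities and the two-sided p-value can be reproduced with `dhyper()`. The 2 x 2 table below is an illustrative assumption (small counts with both margins equal to 4), not data from the slides, and "at least as extreme" is taken to mean "no more probable than the observed table".

```r
# Illustrative 2 x 2 table with both margins fixed (an assumption, not from the slides)
tab <- matrix(c(3, 1,
                1, 3), nrow = 2, byrow = TRUE)
n1 <- sum(tab[1, ]); n2 <- sum(tab[2, ]); c1 <- sum(tab[, 1])

# Support of n11 and its hypergeometric probabilities under independence
k     <- max(0, n1 + c1 - sum(tab)):min(n1, c1)
probs <- dhyper(k, m = n1, n = n2, k = c1)

# Two-sided p-value: total probability of tables no more likely than the observed one
p_obs    <- dhyper(tab[1, 1], m = n1, n = n2, k = c1)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

print(c(manual = p_manual, fisher = fisher.test(tab)$p.value))
```

With only five possible tables the p-value can take only a handful of values, which is the discreteness mentioned above.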

Case study: Agresti p.80

    # read the data
    X <- data.frame(
      y = c(178, 138, 108, 570, 648, 442, 138, 252, 252),
      belief = rep(c("1-Fundam", "2-Moder", "3-Liber"), 3),
      degree = rep(c("1-<HS", "2-HS", "3-BS/grad"), 1, each = 3)
    )

    # a table of observed values (ov)
    ov <- xtabs(y ~ degree + belief, data = X)
    > ov
               belief
    degree      1-Fundam 2-Moder 3-Liber
      1-<HS          178     138     108
      2-HS           570     648     442
      3-BS/grad      138     252     252

    # export the table into latex
    library(xtable)
    xtable(ov)
    \begin{table}[ht]
    \begin{center}
    \begin{tabular}{rrrr}
    \hline
     & 1-Fundam & 2-Moder & 3-Liber \\
    \hline
    1-$<$HS & & & \\
    2-HS & & & \\

Data visualization

    # dotchart
    dotchart(t(ov), xlab = "Observed counts")

[Figure: dot chart of the observed counts, grouped by degree (1-<HS, 2-HS, 3-BS/grad) with one point per belief category (1-Fundam, 2-Moder, 3-Liber); x-axis: Observed counts]

Data visualization

    # mosaic plot
    mosaicplot(ov, color = TRUE)

[Figure: mosaic plot of ov; x-axis: degree (1-<HS, 2-HS, 3-BS/grad); y-axis: belief (1-Fundam, 2-Moder, 3-Liber)]

2 x 2 table: Compare Proportions
Independent multinomial sampling: restrictions on the rows
Compare proportions of columns, given rows
prop.test also implements the Pearson $X^2$ test with Yates correction for small samples (from each O-E, subtract 0.5 if positive, and add 0.5 if negative)

    > prop.test(ov[1:2,1:2])
    2-sample test for equality of proportions with continuity correction
    data: ov[1:2, 1:2]
    X-squared = , df = 1, p-value =
    alternative hypothesis: two.sided
    95 percent confidence interval:
    sample estimates:
    prop 1 prop 2

    #-----double-check the proportions
    > 178/(178+138)
    [1] 0.5632911
    > 570/(570+648)
    [1] 0.4679803

2 x 2 table: Hypergeometric Sampling
Hypergeometric test: conditional on both margins
Compare distributions of counts within the 4 cells
$H_0$ is specified in terms of OR = 1
Produces CI for the OR

    > fisher.test(ov[1:2,1:2])
    Fisher's Exact Test for Count Data
    data: ov[1:2, 1:2]
    p-value =
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
    sample estimates:
    odds ratio

I x J table: Pearson $X^2$
(Independent) multinomial sampling: restrictions on a margin, or on the total
$H_0$ in terms of independence of rows and columns

    > summary(ov)
    Call: xtabs(formula = y ~ degree + belief, data = X)
    Number of cases in table: 2726
    Number of factors: 2
    Test for independence of all factors:
      Chisq = 69.16, df = 4, p-value = 3.42e-14

Pearson residuals: $e_{ij} = \dfrac{n_{ij} - \hat\mu_{ij}}{\hat\mu_{ij}^{1/2}}$, i.e. divide the residual by $\widehat{SE}(n_{ij})$ in Poisson sampling
Standardized Pearson residuals: $e_{ij} = \dfrac{n_{ij} - \hat\mu_{ij}}{\sqrt{\hat\mu_{ij}(1 - p_{i+})(1 - p_{+j})}}$, i.e. divide the residual by $\widehat{SE}(\text{residual})$ in Poisson sampling

Visualizing the association

    # --- Compute Pearson and standardized Pearson residuals ---
    # expected counts under independence
    e <- apply(ov, 1, sum) %*% t(apply(ov, 2, sum)) / sum(ov)
    pearsonresid <- (ov - e) / sqrt(e)
    # 1 - row proportions and 1 - column proportions
    prow <- 1 - apply(ov, 1, sum) / sum(ov)
    pcol <- 1 - apply(ov, 2, sum) / sum(ov)
    standpearsonresid <- pearsonresid / sqrt(prow %*% t(pcol))
    dotchart(t(standpearsonresid))
    abline(v = c(-2, 2))

[Figure: dot chart of the standardized Pearson residuals, grouped by degree (1-<HS, 2-HS, 3-BS/grad) with one point per belief category, and vertical reference lines at -2 and 2; x-axis: Standardized Pearson Residuals]
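As a cross-check on the manual computation above, base R's `chisq.test()` returns both kinds of residuals. The sketch below rebuilds the case-study table as a plain matrix so it runs on its own; the cutoff of 2 mirrors the reference lines drawn in the plot.

```r
# Case-study counts (degree x belief)
ov <- matrix(c(178, 138, 108,
               570, 648, 442,
               138, 252, 252),
             nrow = 3, byrow = TRUE,
             dimnames = list(degree = c("1-<HS", "2-HS", "3-BS/grad"),
                             belief = c("1-Fundam", "2-Moder", "3-Liber")))

fit <- chisq.test(ov)
E   <- fit$expected                      # n_{i+} n_{+j} / n

# Manual Pearson and standardized Pearson residuals
pearson <- (ov - E) / sqrt(E)
p_row <- rowSums(ov) / sum(ov)
p_col <- colSums(ov) / sum(ov)
standardized <- pearson / sqrt(outer(1 - p_row, 1 - p_col))

# Cells whose standardized residual exceeds 2 in absolute value
flagged <- abs(standardized) > 2
print(round(standardized, 2))
print(flagged)
```

`fit$residuals` and `fit$stdres` hold the same two quantities, so they can be compared against `pearson` and `standardized`.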
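
Ordered Categories
Ordered categories have more information
Assign scores to categories. Rows: $(u_1, \ldots, u_I)$, e.g. $(1, \ldots, I)$. Cols: $(v_1, \ldots, v_J)$, e.g. $(1, \ldots, J)$
$H_0: \mathrm{cor}(u, v) = 0$ vs $H_a: \mathrm{cor}(u, v) \neq 0$
Study the linear trend:
$$r = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} (u_i - \bar u)(v_j - \bar v)\,n_{ij}}{\sqrt{\left[\sum_{i=1}^{I}\sum_{j=1}^{J} (u_i - \bar u)^2 n_{ij}\right]\left[\sum_{i=1}^{I}\sum_{j=1}^{J} (v_j - \bar v)^2 n_{ij}\right]}}$$
$$\bar u = \sum_{i=1}^{I}\sum_{j=1}^{J} u_i n_{ij}/n; \qquad \bar v = \sum_{i=1}^{I}\sum_{j=1}^{J} v_j n_{ij}/n; \qquad M^2 = (n - 1)r^2 \overset{H_0}{\sim} \chi^2_1$$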

Case Study: Ordered Categories

    # existing implementation
    > library(coin)
    > lbl_test(as.table(ov))
    Asymptotic Linear-by-Linear Association Test
    data: belief (ordered) by degree (1-<HS < 2-HS < 3-BS/grad)
    chi-squared = , df = 1, p-value = 6.939e-14

    # manually
    # centred row scores u and column scores v
    u <- as.vector(scale(1:3, center = sum(c(1:3) * ov) / sum(ov), scale = FALSE))
    v <- as.vector(scale(1:3, center = sum(t(ov) * c(1:3)) / sum(ov), scale = FALSE))
    # correlation between scores, weighted by the cell counts
    r <- sum(u %*% t(v) * ov) / sqrt(sum(u^2 * ov) * sum(t(ov) * v^2))
    M2 <- (sum(ov) - 1) * r^2
    > 1 - pchisq(M2, 1, lower.tail = TRUE)

2x2 pairs: Matched Pairs
Repeated measurements on the same subjects: ask the same people the same question twice
Goal: compare proportions
Absence of association cannot be interpreted as independence
Example (Agresti Ch. 10.1): approval of the President's performance, one month apart, for the same sample of Americans.

                     2nd Survey
    1st Survey    Approve  Disapprove
    Approve           794         150
    Disapprove         86         570

$H_0$: Marginal homogeneity, $\pi_{1+} = \pi_{+1}$
$\delta = \pi_{+1} - \pi_{1+} = (\pi_{11} + \pi_{21}) - (\pi_{11} + \pi_{12}) = \pi_{21} - \pi_{12}$
Equivalent to testing table symmetry
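
Large-sample test and CI
$\hat\delta = p_{+1} - p_{1+} = p_{2+} - p_{+2}$
$$\mathrm{var}(\hat\delta) = \left[\pi_{1+}(1 - \pi_{1+}) + \pi_{+1}(1 - \pi_{+1}) - 2(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})\right]/n$$
Smaller variance than in independent samples, therefore a more efficient design
Estimated variance: $\widehat{\mathrm{var}}(\hat\delta) = \left[(p_{12} + p_{21}) - (p_{12} - p_{21})^2\right]/n$
CI: $\hat\delta \pm z_{\alpha/2}\,\widehat{SE}(\hat\delta)$
Wald test: $z = \dfrac{\hat\delta}{\widehat{SE}(\hat\delta)} = \dfrac{n_{21} - n_{12}}{(n_{21} + n_{12})^{1/2}}$, with $z^2 \overset{H_0}{\sim} \chi^2_1$ (called the McNemar test)
Only depends on counts outside of the diagonal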

President Approval Example

    # Read the data
    Performance <- matrix(c(794, 86, 150, 570), nrow = 2,
      dimnames = list("1st Survey" = c("Approve", "Disapprove"),
                      "2nd Survey" = c("Approve", "Disapprove")))
    > Performance
                2nd Survey
    1st Survey   Approve Disapprove
      Approve        794        150
      Disapprove      86        570

    # Test
    > mcnemar.test(Performance)
    McNemar's Chi-squared test with continuity correction
    data: Performance
    McNemar's chi-squared = , df = 1, p-value = 4.115e-05

Significant change (in fact, drop) in approval ratings
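The large-sample formulas for matched pairs can be applied by hand to the same table. This is a minimal sketch; it uses the uncorrected McNemar statistic, so the value differs slightly from the continuity-corrected output of `mcnemar.test(Performance)`.

```r
# Approval table from the example above
Performance <- matrix(c(794, 86, 150, 570), nrow = 2,
  dimnames = list("1st Survey" = c("Approve", "Disapprove"),
                  "2nd Survey" = c("Approve", "Disapprove")))

n   <- sum(Performance)
n12 <- Performance[1, 2]; n21 <- Performance[2, 1]
p12 <- n12 / n; p21 <- n21 / n

# Estimated change in approval: p_{+1} - p_{1+} = p21 - p12
delta_hat <- p21 - p12
se_hat    <- sqrt(((p12 + p21) - (p12 - p21)^2) / n)
ci        <- delta_hat + c(-1, 1) * qnorm(0.975) * se_hat

# McNemar statistic without continuity correction
z       <- (n21 - n12) / sqrt(n21 + n12)
p_value <- pchisq(z^2, df = 1, lower.tail = FALSE)

print(c(delta_hat = delta_hat, lower = ci[1], upper = ci[2]))
print(c(z = z, z2 = z^2, p_value = p_value))
```

A negative `delta_hat` with an interval that excludes zero corresponds to the drop in approval noted above.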
Categorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationn y π y (1 π) n y +ylogπ +(n y)log(1 π).
Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives
More informationSTAT 705: Analysis of Contingency Tables
STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationLecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University
Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationSTAT 526 Advanced Statistical Methodology
STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 7 Contingency Table 0-0 Outline Introduction to Contingency Tables Testing Independence in Two-Way Contingency Tables Modeling Ordinal Associations
More informationST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios
ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories
More informationThe Multinomial Model
The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient
More informationReview of One-way Tables and SAS
Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409
More informationST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence
ST3241 Categorical Data Analysis I Two-way Contingency Tables Odds Ratio and Tests of Independence 1 Inference For Odds Ratio (p. 24) For small to moderate sample size, the distribution of sample odds
More informationLoglinear models. STAT 526 Professor Olga Vitek
Loglinear models STAT 526 Professor Olga Vitek April 19, 2011 8 Can Use Poisson Likelihood To Model Both Poisson and Multinomial Counts 8-1 Recall: Poisson Distribution Probability distribution: Y - number
More informationGood Confidence Intervals for Categorical Data Analyses. Alan Agresti
Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline
More informationSections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal
More informationLog-linear Models for Contingency Tables
Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More informationChapter 2: Describing Contingency Tables - I
: Describing Contingency Tables - I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]
More informationSTAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationThe material for categorical data follows Agresti closely.
Exam 2 is Wednesday March 8 4 sheets of notes The material for categorical data follows Agresti closely A categorical variable is one for which the measurement scale consists of a set of categories Categorical
More informationSTAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon
More informationCorrespondence Analysis
Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationLecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
More informationST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationSTAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).
STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationSolutions for Examination Categorical Data Analysis, March 21, 2013
STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.
More information3 Way Tables Edpsy/Psych/Soc 589
3 Way Tables Edpsy/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University of Illinois Spring 2017
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationGoodness of Fit Goodness of fit - 2 classes
Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence
More informationMSH3 Generalized linear model
Contents MSH3 Generalized linear model 7 Log-Linear Model 231 7.1 Equivalence between GOF measures........... 231 7.2 Sampling distribution................... 234 7.3 Interpreting Log-Linear models..............
More information2.6.3 Generalized likelihood ratio tests
26 HYPOTHESIS TESTING 113 263 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : θ Θ against H 1 : θ Θ\Θ It can be used
More information1 Comparing two binomials
BST 140.652 Review notes 1 Comparing two binomials 1. Let X Binomial(n 1,p 1 ) and ˆp 1 = X/n 1 2. Let Y Binomial(n 2,p 2 ) and ˆp 2 = Y/n 2 3. We also use the following notation: n 11 = X n 12 = n 1 X
More informationInference for Binomial Parameters
Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for
More informationCategorical data analysis Chapter 5
Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases
More informationAnalysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not
More informationij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as
page1 Loglinear Models Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More information2 Describing Contingency Tables
2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random
More informationChapter 11: Analysis of matched pairs
Chapter 11: Analysis of matched pairs Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 42 Chapter 11: Models for Matched Pairs Example: Prime
More information13.1 Categorical Data and the Multinomial Experiment
Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationHomework 1 Solutions
36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports
More informationBinary Response: Logistic Regression. STAT 526 Professor Olga Vitek
Binary Response: Logistic Regression STAT 526 Professor Olga Vitek March 29, 2011 4 Model Specification and Interpretation 4-1 Probability Distribution of a Binary Outcome Y In many situations, the response
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationSession 3 The proportional odds model and the Mann-Whitney test
Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationOne-Way Tables and Goodness of Fit
Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the
More information2.3 Analysis of Categorical Data
90 CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING 2.3 Analysis of Categorical Data 2.3.1 The Multinomial Probability Distribution A mulinomial random variable is a generalization of the binomial rv. It results
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationReview of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models
Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses
More informationTopic 21 Goodness of Fit
Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known
More informationModeling and inference for an ordinal effect size measure
STATISTICS IN MEDICINE Statist Med 2007; 00:1 15 Modeling and inference for an ordinal effect size measure Euijung Ryu, and Alan Agresti Department of Statistics, University of Florida, Gainesville, FL
More informationˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.
Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the
More informationML Testing (Likelihood Ratio Testing) for non-gaussian models
ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationCohen s s Kappa and Log-linear Models
Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance
Linear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
Goodness of Fit Tests
Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38 Table of Contents 1 Goodness of Fit Chi Squared Test 2 Tests of
ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests equals a constant times the number who attend the game.
EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Problem 3.18 (page 96 of Agresti). (a) Assume Y is a Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests
simple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression (Dose - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
Analysis of data in square contingency tables
Analysis of data in square contingency tables Iva Pecáková Let's suppose two dependent samples: the response of the nth subject in the second sample relates to the response of the nth subject in the first
Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P(D | H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017-18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
Single-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
Central Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let $X_1, X_2, \dots$ be a sequence of independent random variables, each having mean $\mu$ and variance $\sigma^2$. Then the distribution of the partial sum $S_n = \sum_{i=1}^{n} X_i$ becomes approximately
Answer Key for STAT 200B HW No. 7
Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;
Lecture 25: Models for Matched Pairs
Lecture 25: Models for Matched Pairs Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture
Ordinal Variables in 2 way Tables
Ordinal Variables in 2 way Tables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology © Board of Trustees, University of Illinois Fall 2018 C.J. Anderson (Illinois) Ordinal Variables
STAT 526 Spring Final Exam. Thursday May 5, 2011
STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
Chapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise approaches
Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model
Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21
Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 × 2 tables In describing a relationship
Inferences for Proportions and Count Data
Inferences for Proportions and Count Data Corresponds to Chapter 9 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT), with some slides by Ramón V. León (University of Tennessee) 1 Inference
BIOS 625 Fall 2015 Homework Set 3 Solutions
BIOS 625 Fall 2015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson's Paradox occurs. Death Penalty Victim's
Stat 5421 Lecture Notes Simple Chi-Square Tests for Contingency Tables Charles J. Geyer March 12, 2016
Stat 5421 Lecture Notes Simple Chi-Square Tests for Contingency Tables Charles J. Geyer March 12, 2016 1 One-Way Contingency Table The data set read in by the R function read.table below simulates 6000
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey
WORKSHOP 3 Measuring Association
WORKSHOP 3 Measuring Association Concepts Analysing Categorical Data o Testing of Proportions o Contingency Tables & Tests o Odds Ratios Linear Association Measures o Correlation o Simple Linear Regression
Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let $P$ be a probability measure on a collection of sets $\mathcal{A}$. (a) For each $n \in \mathbb{N}$, let $H_n$ be a set in $\mathcal{A}$ such that $H_n \subset H_{n+1}$. Show that $P(H_n)$ monotonically converges to $P(\cup_{k=1}^{\infty} H_k)$ as $n \to \infty$. (b) For each n
Contingency Tables Part One 1
Contingency Tables Part One 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 32 Suggested Reading: Chapter 2 Read Sections 2.1-2.4 You are not responsible for Section 2.5 2 / 32 Overview
Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval
Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being
Review of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
Chapter 4: Generalized Linear Models-II
Chapter 4: Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
Confidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
Lecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann
Statistics of Contingency Tables - Extension to I x J stat 557 Heike Hofmann Outline Testing Independence Local Odds Ratios Concordance & Discordance Intro to GLMs Simpson's paradox Simpson's paradox:
STAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon January 28, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 35 Table of contents Inference for Proportions 1 Inference for Proportions Mahinda Samarakoon
A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright © 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response