Categorical Variables and Contingency Tables: Description and Inference

1 Categorical Variables and Contingency Tables: Description and Inference
STAT 526, Professor Olga Vitek, March 3, 2011
Reading: Agresti Ch. 1, 2 and 3; Faraway Ch. 4

2 Univariate Binomial and Multinomial Measurements

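3 Binomial Distribution
Probability distribution: $Y_1, Y_2, \ldots, Y_n \overset{iid}{\sim} \text{Bernoulli}(\pi)$, $Y = \sum_{i=1}^{n} Y_i \sim \text{Binomial}(n, \pi)$
$p(y) = \binom{n}{y} \pi^{y} (1-\pi)^{n-y}$
$\mu = E(Y) = n\pi$, $\sigma^2 = \text{var}(Y) = n\pi(1-\pi)$
Log-likelihood (up to an additive constant): $L(\pi) = y\log(\pi) + (n-y)\log(1-\pi)$
Maximum Likelihood Estimator: $\hat\pi = y/n$, with $E(\hat\pi) = \pi$ and $SE(\hat\pi) = \sqrt{\pi(1-\pi)/n}$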

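4 Large-sample tests for π
For a known $\pi_0$, test $H_0: \pi = \pi_0$ vs $H_a: \pi \neq \pi_0$
Wald test: $z_W = \frac{\hat\pi - \pi_0}{SE} = \frac{\hat\pi - \pi_0}{\sqrt{\hat\pi(1-\hat\pi)/n}}$, approximately $N(0, 1)$ under $H_0$
Likelihood ratio test: $z_L = 2(L_1 - L_0) = 2\left( y\log\frac{\hat\pi}{\pi_0} + (n-y)\log\frac{1-\hat\pi}{1-\pi_0} \right)$, approximately $\chi^2_1$ under $H_0$
Score test: $z_S = \frac{\hat\pi - \pi_0}{SE_0} = \frac{\hat\pi - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$, approximately $N(0, 1)$ under $H_0$
The score statistic is closer to $N(0, 1)$ than the Wald statistic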
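The three statistics above can be computed side by side. The following is a minimal sketch in R; the values of `y`, `n` and `pi0` are made up for illustration and do not come from the lecture.

```r
# Hypothetical data: y successes in n trials, null value pi0 (illustrative only)
y <- 12; n <- 40; pi0 <- 0.5
pi_hat <- y / n

# Wald: standard error evaluated at the estimate
z_wald <- (pi_hat - pi0) / sqrt(pi_hat * (1 - pi_hat) / n)
# Score: standard error evaluated at the null value
z_score <- (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
# Likelihood ratio: twice the difference in maximized log-likelihoods
lr <- 2 * (y * log(pi_hat / pi0) + (n - y) * log((1 - pi_hat) / (1 - pi0)))

# Two-sided p-values from the large-sample reference distributions
p_values <- c(
  wald  = 2 * pnorm(-abs(z_wald)),
  score = 2 * pnorm(-abs(z_score)),
  lr    = pchisq(lr, df = 1, lower.tail = FALSE)
)
print(p_values)
```

The squared score statistic should agree with the chi-squared statistic reported by `prop.test(y, n, p = pi0, correct = FALSE)`.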

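5 Large-sample CI for π
Based on the Wald test statistic: $\hat\pi \pm z_{\alpha/2}\sqrt{\hat\pi(1-\hat\pi)/n}$
Performs poorly unless n is large
Based on the score test statistic:
$\hat\pi\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n + z_{\alpha/2}^2}\left[\hat\pi(1-\hat\pi)\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\cdot\frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right)\right]}$
Performs better than Wald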
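To make the two intervals concrete, here is a small R sketch that evaluates both formulas. The counts are hypothetical, and the variable names (`w`, `centre`, `half`) are introduced only for this illustration.

```r
# Hypothetical data (illustrative only)
y <- 12; n <- 40; conf <- 0.95
z <- qnorm(1 - (1 - conf) / 2)
pi_hat <- y / n

# Wald interval: symmetric around pi_hat
wald <- pi_hat + c(-1, 1) * z * sqrt(pi_hat * (1 - pi_hat) / n)

# Score interval: the centre is a weighted average of pi_hat and 1/2
w <- n / (n + z^2)                       # weight on pi_hat; 1 - w = z^2 / (n + z^2)
centre <- pi_hat * w + 0.5 * (1 - w)
half <- z * sqrt((pi_hat * (1 - pi_hat) * w + 0.25 * (1 - w)) / (n + z^2))
score <- centre + c(-1, 1) * half

print(rbind(wald = wald, score = score))
```

For a single proportion, `prop.test(y, n, correct = FALSE)$conf.int` should return the same score interval, which gives a quick cross-check.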

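6 Multinomial Distribution
Probability distribution: for trial $i$, $(Y_{i1}, \ldots, Y_{ic})$ with $Y_{ij} = 1$ if the trial falls in category $j$, and 0 otherwise
$n_j = \sum_{i=1}^{n} Y_{ij}$, $(n_1, \ldots, n_c) \sim \text{Multinomial}(n; \pi_1, \ldots, \pi_c)$, $n = \sum_{j=1}^{c} n_j$
$p(n_1, n_2, \ldots, n_{c-1}) = \left(\frac{n!}{n_1!\, n_2! \cdots n_c!}\right) \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}$
$E(n_j) = n\pi_j$, $\text{var}(n_j) = n\pi_j(1-\pi_j)$, $\text{cov}(n_j, n_k) = -n\pi_j\pi_k$
Log-likelihood: $L(\pi) = \sum_{j=1}^{c} n_j \log\pi_j$
Maximum Likelihood Estimator: $\hat\pi_j = n_j/n$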

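7 Large-Sample Test for $(\pi_1, \ldots, \pi_c)$
For known $(\pi_{10}, \pi_{20}, \ldots, \pi_{c0})$, test $H_0: \pi_j = \pi_{j0}$ for all $j$ vs $H_a: \pi_j \neq \pi_{j0}$ for some $j$
Pearson test: $X^2 = \sum_{j=1}^{c} \frac{(O_j - E_{j0})^2}{E_{j0}} = \sum_{j=1}^{c} \frac{(n_j - n\pi_{j0})^2}{n\pi_{j0}}$, approximately $\chi^2_{c-1}$ under $H_0$
E.g. in genetics: test theories of trait inheritance
Likelihood ratio test: $G^2 = 2(L_1 - L_0) = 2\sum_{j=1}^{c} n_j \log\left(\frac{n_j}{n\pi_{j0}}\right)$, approximately $\chi^2_{c-1}$ under $H_0$
$X^2$ and $G^2$ are asymptotically equivalent when $H_0$ is true. For $n/c < 5$, $X^2$ converges to its $\chi^2$ limit faster than $G^2$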
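As a worked illustration of the two goodness-of-fit statistics, the sketch below tests a 9:3:3:1 inheritance ratio. The counts are illustrative and are not taken from the lecture.

```r
# Illustrative counts for four phenotype classes (not from the lecture)
counts <- c(315, 108, 101, 32)
pi0 <- c(9, 3, 3, 1) / 16          # null probabilities under a 9:3:3:1 ratio

n <- sum(counts)
expected <- n * pi0

X2 <- sum((counts - expected)^2 / expected)     # Pearson statistic
G2 <- 2 * sum(counts * log(counts / expected))  # likelihood ratio statistic
df <- length(counts) - 1

print(c(X2 = X2, G2 = G2,
        p_X2 = pchisq(X2, df, lower.tail = FALSE),
        p_G2 = pchisq(G2, df, lower.tail = FALSE)))
```

`chisq.test(counts, p = pi0)` should report the same Pearson statistic; base R has no direct one-call equivalent for $G^2$, so it is computed by hand here.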

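8 Poisson Distribution
Probability distribution: $Y$ = number of events in a fixed interval of space/time
$Y \sim \text{Poisson}(\mu)$, $p(y) = \frac{e^{-\mu}\mu^{y}}{y!}$, $y = 0, 1, \ldots$; $E(Y) = \text{var}(Y) = \mu$
$Y_1, Y_2, \ldots, Y_c \overset{ind}{\sim} \text{Poisson}(\mu_i)$ implies $\sum_{i=1}^{c} Y_i \sim \text{Poisson}\left(\sum_{i=1}^{c} \mu_i\right)$
$c$ independent Poisson r.v. conditional on their total are Multinomial:
$P\left(Y_1 = n_1, \ldots, Y_c = n_c \mid \sum_i Y_i = n\right) = \frac{P(Y_1 = n_1, \ldots, Y_c = n_c)}{P\left(\sum_i Y_i = n\right)} = \frac{\prod_i \left[\exp(-\mu_i)\mu_i^{n_i}/n_i!\right]}{\exp\left(-\sum_i \mu_i\right)\left(\sum_i \mu_i\right)^{n}/n!} = \frac{n!}{\prod_i n_i!}\prod_i \pi_i^{n_i}$, where $\pi_i = \mu_i / \sum_i \mu_i$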
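The conditioning identity above can be checked numerically. This is a small sketch with made-up Poisson means and counts; only base R density functions are used.

```r
# Hypothetical Poisson means and one hypothetical outcome (illustrative only)
mu <- c(2, 5, 3)
counts <- c(1, 4, 2)
n <- sum(counts)

# P(Y_1 = n_1, ..., Y_c = n_c) for independent Poisson counts
joint <- prod(dpois(counts, mu))
# P(sum of Y_i = n), using the fact that the total is Poisson(sum(mu))
total <- dpois(n, sum(mu))
conditional <- joint / total

# Multinomial probability with pi_i = mu_i / sum(mu)
multinomial <- dmultinom(counts, size = n, prob = mu / sum(mu))

print(c(conditional = conditional, multinomial = multinomial))
```

The two printed values should coincide up to floating-point error.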

9 2-Way Contingency Tables

10 Contingency Tables
Contingency Table = Classification Table: frequency of outcomes
Two-Way Table: frequency of outcomes of two categorical variables
$I \times J$ table: a table with $I$ rows and $J$ columns
Contingency tables can arise from several sampling schemes
Inference depends on the sampling scheme
Example: Lung Cancer. Rows: Smoking (Yes, No, Total). Columns: Cases, Controls, Total.

11 Joint Distribution and Independence
Underlying probability distribution of $X$ (smoking) and $Y$ (cancer)
Joint distribution: $\pi_{ij}$, probability of cell $(i, j)$
Marginal distribution: $\pi_{i+} = \sum_{j=1}^{J} \pi_{ij}$, probability of row $i$; $\pi_{+j} = \sum_{i=1}^{I} \pi_{ij}$, probability of column $j$
Conditional distribution: $\pi_{j|i} = \pi_{ij}/\pi_{i+}$, distribution of $j$ given $i$
Independence: $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$

12 Multinomial Sampling
The total sample size $n$ is fixed, but the row and column totals are not
$X$ and $Y$ are treated equally
$P(X = i, Y = j) = \pi_{ij}$, $i = 1, \ldots, I$; $j = 1, \ldots, J$
Describe associations with joint distributions
This brings us back to the case of the Multinomial distribution
Likelihood: $L = \frac{n!}{n_{11}! \cdots n_{IJ}!} \prod_{i=1}^{I}\prod_{j=1}^{J} \pi_{ij}^{n_{ij}}$
Log-likelihood: $\sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\log(\pi_{ij}) + \text{constant}$

13 Multinomial Sampling: Testing for Independence
Hypotheses:
$H_0$: reduced model, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$
$H_a$: full model, $\pi_{ij} \neq \pi_{i+}\pi_{+j}$ for some $i$ and $j$
Pearson $\chi^2$ test: $X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, approximately $\chi^2_{(I-1)(J-1)}$ under $H_0$
$O_{ij} = n_{ij}$, $E_{ij} = n\hat\pi_{i+}\hat\pi_{+j} = n_{i+}n_{+j}/n$
Df $= (I-1)(J-1) = (IJ-1) - (I-1) - (J-1)$
Likelihood Ratio test:
Full model: $\hat\pi_{ij} = n_{ij}/n_{++}$
Reduced model: $\hat\pi_{i+} = n_{i+}/n_{++}$; $\hat\pi_{+j} = n_{+j}/n_{++}$
$G^2 = 2(L_1 - L_0) = 2\sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\log\frac{n_{ij}\,n_{++}}{n_{i+}\,n_{+j}}$, approximately $\chi^2_{(I-1)(J-1)}$ under $H_0$
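The sketch below evaluates $X^2$ and $G^2$ directly from these formulas, using the degree-by-belief counts from the case study later in these notes. The object names are chosen for this illustration only.

```r
# Degree-by-belief counts from the case study in these notes
ov <- matrix(c(178, 138, 108,
               570, 648, 442,
               138, 252, 252),
             nrow = 3, byrow = TRUE,
             dimnames = list(degree = c("1-<HS", "2-HS", "3-BS/grad"),
                             belief = c("1-Fundam", "2-Moder", "3-Liber")))

n <- sum(ov)
# Expected counts under independence: n_{i+} * n_{+j} / n
E <- outer(rowSums(ov), colSums(ov)) / n

X2 <- sum((ov - E)^2 / E)          # Pearson statistic
G2 <- 2 * sum(ov * log(ov / E))    # likelihood ratio statistic
df <- (nrow(ov) - 1) * (ncol(ov) - 1)

print(c(X2 = X2, G2 = G2, df = df,
        p_X2 = pchisq(X2, df, lower.tail = FALSE),
        p_G2 = pchisq(G2, df, lower.tail = FALSE)))
```

The Pearson statistic should match `chisq.test(ov)` and the value `Chisq = 69.16` shown in the `summary(ov)` output of the case study.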

14 Independent (or Product) Multinomial Sampling
The row totals $n_{i+}$, $i = 1, \ldots, I$, are fixed
E.g., $X$ is an explanatory variable, and the response $Y$ occurs separately at each setting of $X$
View the categorical response as a function of the categorical predictor
Describe associations in terms of conditional distributions
$P(Y = j \mid X = i) = \pi_{j|i}$, $i = 1, \ldots, I$; $j = 1, \ldots, J$
For a fixed $i$, $\{n_{ij}, j = 1, \ldots, J\}$ follow a multinomial distribution:
$f(n_{i1}, \ldots, n_{iJ}) = \frac{n_{i+}!}{n_{i1}! \cdots n_{iJ}!} \prod_{j=1}^{J} \pi_{j|i}^{n_{ij}}$

15 Compare Proportions
Independent Multinomial Sampling
$H_0: \pi_1 = \pi_2$ vs $H_a: \pi_1 \neq \pi_2$
ML estimate of the difference: $\hat\pi_1 - \hat\pi_2 = \frac{y_1}{n_1} - \frac{y_2}{n_2}$
$SE(\hat\pi_1 - \hat\pi_2) = \left[\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right]^{1/2}$
Wald Confidence Interval: $\hat\pi_1 - \hat\pi_2 \pm z_{\alpha/2}\,\widehat{SE}(\hat\pi_1 - \hat\pi_2)$
Replace $\pi$ with $\hat\pi$ to estimate the SE
The Wald interval is usually too narrow
Better methods (e.g. delta method) exist

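16 Testing for Independence of Rows and Columns
Independent Multinomial Sampling
Independence in this context is often called homogeneity of the conditional distributions
$X$ and $Y$ are independent $\Leftrightarrow$ $\pi_{j|1} = \cdots = \pi_{j|I}$ for all $j$
Can interpret the independence in terms of the product of marginal probabilities:
$\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$ $\Leftrightarrow$ $\pi_{j|1} = \cdots = \pi_{j|I}$ for all $j$
($\Rightarrow$) $\pi_{j|i} = \pi_{ij}/\pi_{i+} = (\pi_{i+}\pi_{+j})/\pi_{i+} = \pi_{+j}$, which does not depend on $i$
($\Leftarrow$) Let $\pi_{j|i} = a_j$; then $\pi_{+j} = \sum_{i=1}^{I}\pi_{ij} = \sum_{i=1}^{I}\pi_{i+}a_j = a_j$, so $\pi_{ij} = \pi_{i+}\pi_{j|i} = \pi_{i+}\pi_{+j}$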

17 Testing for Independence of Rows and Columns
Test the homogeneity of conditional distributions
    Row      Column 1                    ...   Column J                    Total
    1        $\pi_{11}$ ($\pi_{1|1}$)    ...   $\pi_{1J}$ ($\pi_{J|1}$)    $\pi_{1+}$
    ...
    I        $\pi_{I1}$ ($\pi_{1|I}$)    ...   $\pi_{IJ}$ ($\pi_{J|I}$)    $\pi_{I+}$
    Total    $\pi_{+1}$                  ...   $\pi_{+J}$                  $\pi_{++}$
Consider the new notation: $\pi_j(x) = P(Y = j \mid X = x)$
Although the interpretation is different, use the same Pearson $X^2$ test and the LR test

18 Test for Independence: Odds Ratio
Odds Ratio:
$\theta = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = \frac{P(Y=1 \mid X=1)/P(Y=2 \mid X=1)}{P(Y=1 \mid X=2)/P(Y=2 \mid X=2)} = \frac{P(X=1 \mid Y=1)/P(X=2 \mid Y=1)}{P(X=1 \mid Y=2)/P(X=2 \mid Y=2)}$
Equally valid for prospective (conditional on $X$), retrospective (conditional on $Y$) and cross-sectional (multinomial) sampling designs
MLE: $\hat\theta = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}$
When some $n_{ij} = 0$, $\hat\theta$ is not a good estimator. It is improved by adding 0.5 to each cell count:
$\tilde\theta = \frac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}$

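19 Test for Independence: Odds Ratio
$X$ and $Y$ are independent $\Leftrightarrow$ $\theta = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = 1$
To check, substitute $\pi_{ij} = \pi_{i+}\pi_{+j}$ in the formula above
Asymptotically, $\log\hat\theta \sim N(\log\theta, \hat\sigma^2)$, where $\hat\sigma^2 = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$
Large-sample CI for $\log\theta$: $\log\hat\theta \pm z_{\alpha/2}\,\widehat{SE}(\log\hat\theta) = [L, U]$
Large-sample CI for $\theta$: $[e^{L}, e^{U}]$. Usually too wide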
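A short R sketch of the sample odds ratio and its large-sample interval follows. It uses the first two rows and columns of the degree-by-belief table from the case study in these notes; the variable names are illustrative.

```r
# 2 x 2 sub-table (degree <HS, HS by belief Fundam, Moder) from the case study
tab <- matrix(c(178, 138,
                570, 648), nrow = 2, byrow = TRUE)

# Sample odds ratio: cross-product ratio of the cell counts
theta_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Estimated standard error of log(theta_hat): sqrt of summed reciprocal counts
se_log <- sqrt(sum(1 / tab))

# 95% Wald interval on the log scale, then back-transformed
ci_log <- log(theta_hat) + c(-1, 1) * qnorm(0.975) * se_log
ci <- exp(ci_log)

print(c(theta_hat = theta_hat, lower = ci[1], upper = ci[2]))
```

Note that `fisher.test` reports a conditional ML estimate of the odds ratio, so its estimate and interval will generally differ slightly from the unconditional values computed here.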

20 Poisson Sampling
Observe a process over a period of time, and record the number of occurrences
No fixed quantities
Denote by $Y_{ij}$ the count of cell $(i, j)$
Poisson sampling assumes each $Y_{ij} \overset{ind}{\sim} \text{Poisson}(\pi_{ij})$, so that $\sum_{i=1}^{I}\sum_{j=1}^{J} Y_{ij} \sim \text{Poisson}\left(\sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij}\right)$ (here $\pi_{ij}$ is the expected count of cell $(i, j)$, also written $\mu_{ij}$)
The hypothesis of independence of $X$ and $Y$ has the form $\log(\pi_{ij}) = \lambda + \alpha_i + \beta_j$
This is the log-linear model of independence for two-way contingency tables
Under independence, $\log(\mu_{ij})$ is an additive function of a row effect $\alpha_i$ and a column effect $\beta_j$
Since we don't have a replicate table, the model with the interaction is saturated

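21 Poisson Sampling
An additive model $\log\pi_{ij} = \mu + \alpha_i + \beta_j$ for the expected counts implies the independence of the margins. The cell probability is
$\frac{E(\text{count}_{ij})}{\text{sum of all } E(\text{count})} = \frac{e^{\mu + \alpha_i + \beta_j}}{e^{\mu}\left(\sum_i e^{\alpha_i}\right)\left(\sum_j e^{\beta_j}\right)} = \pi_{i+}\pi_{+j}$,
where $\pi_{i+} = e^{\alpha_i}/\sum_i e^{\alpha_i}$ and $\pi_{+j} = e^{\beta_j}/\sum_j e^{\beta_j}$ are the row and column sums of the cell probabilities
Test for independence: Pearson $X^2$ or LR test as before (more on this later)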
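One way to see the connection in practice is to fit the additive log-linear model with `glm` and compare it with the closed-form independence fit. This is a sketch using the degree-by-belief counts from the case study in these notes; the object names `d`, `fit`, `tab` and `E` are illustrative.

```r
# Degree-by-belief counts from the case study, in long format
d <- data.frame(
  y = c(178, 138, 108, 570, 648, 442, 138, 252, 252),
  belief = rep(c("1-Fundam", "2-Moder", "3-Liber"), 3),
  degree = rep(c("1-<HS", "2-HS", "3-BS/grad"), each = 3)
)

# Additive (independence) log-linear model for the expected counts
fit <- glm(y ~ degree + belief, family = poisson, data = d)
mu_hat <- fitted(fit)

# Closed-form expected counts under independence: n_{i+} * n_{+j} / n
tab <- xtabs(y ~ degree + belief, data = d)
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# The residual deviance of the Poisson fit is the G^2 statistic
G2 <- deviance(fit)
print(c(G2 = G2, df = df.residual(fit)))
```

The fitted values should reproduce $n_{i+}n_{+j}/n$, and the residual deviance should equal the likelihood ratio statistic for independence on $(I-1)(J-1) = 4$ degrees of freedom.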

22 Hypergeometric Sampling
Both row and column margins are fixed
When $X$ and $Y$ are independent, given the row and column margins, $\{n_{ij}\}$ follows the hypergeometric distribution
$\frac{\left(\prod_{i=1}^{I} n_{i+}!\right)\left(\prod_{j=1}^{J} n_{+j}!\right)}{n_{++}!\,\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}!}$
The distribution is parameter free
For a $2 \times 2$ table: $P(n_{11} = k) = \frac{\binom{n_{1+}}{k}\binom{n_{2+}}{n_{+1} - k}}{\binom{n_{++}}{n_{+1}}}$, $\max(0, n_{1+} + n_{+1} - n) \le k \le \min(n_{1+}, n_{+1})$
Fisher's exact test: p-value = total probability of all outcomes at least as extreme as the one observed
The p-value takes discrete values for small samples
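The $2 \times 2$ formula maps directly onto `dhyper`. The sketch below computes a two-sided exact p-value by summing the probabilities of all tables that are no more probable than the observed one, which is the convention assumed here for "at least as extreme". The table is hypothetical.

```r
# Hypothetical small 2 x 2 table (illustrative only)
tab <- matrix(c(3, 1,
                1, 3), nrow = 2, byrow = TRUE)

n1p <- sum(tab[1, ])   # first row total
n2p <- sum(tab[2, ])   # second row total
np1 <- sum(tab[, 1])   # first column total
k_obs <- tab[1, 1]

# Support of n11 given both margins
support <- max(0, n1p + np1 - sum(tab)):min(n1p, np1)

# P(n11 = k) = choose(n1p, k) * choose(n2p, np1 - k) / choose(n1p + n2p, np1)
probs <- dhyper(support, m = n1p, n = n2p, k = np1)

# Two-sided p-value: tables with probability <= that of the observed table
p_obs <- probs[support == k_obs]
p_value <- sum(probs[probs <= p_obs * (1 + 1e-7)])
print(p_value)
```

The small relative tolerance guards against floating-point ties; the result should agree with `fisher.test(tab)$p.value`.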

23 Case study: Agresti p.80
    # read the data
    X <- data.frame(y = c(178, 138, 108, 570, 648, 442, 138, 252, 252),
                    belief = rep(c("1-Fundam", "2-Moder", "3-Liber"), 3),
                    degree = rep(c("1-<HS", "2-HS", "3-BS/grad"), 1, each = 3))
    # a table of observed values (ov)
    ov <- xtabs(y ~ degree + belief, data = X)
    > ov
               belief
    degree      1-Fundam 2-Moder 3-Liber
      1-<HS          178     138     108
      2-HS           570     648     442
      3-BS/grad      138     252     252
    # export the table into latex
    library(xtable)
    xtable(ov)
    \begin{table}[ht]
    \begin{center}
    \begin{tabular}{rrrr}
    \hline
     & 1-Fundam & 2-Moder & 3-Liber \\
    \hline
    1-$<$HS & & & \\
    2-HS & & & \\

24 Data visualization
    # dotchart
    dotchart(t(ov), xlab = "Observed counts")
[Figure: dot chart of the observed counts, grouped by degree (1-<HS, 2-HS, 3-BS/grad) with one point per belief category (1-Fundam, 2-Moder, 3-Liber); x-axis: Observed counts]

25 Data visualization
    # mosaic plot
    mosaicplot(ov, color = TRUE)
[Figure: mosaic plot of ov; x-axis: degree (1-<HS, 2-HS, 3-BS/grad); y-axis: belief (1-Fundam, 2-Moder, 3-Liber)]

26 2 x 2 table: Compare Proportions
Independent multinomial sampling: restrictions on the rows
Compare proportions of columns, given rows
prop.test also implements the Pearson $X^2$ test with Yates correction for small samples (from each O-E, subtract 0.5 if positive, and add 0.5 if negative)
    > prop.test(ov[1:2, 1:2])
    2-sample test for equality of proportions with continuity correction
    data: ov[1:2, 1:2]
    X-squared = , df = 1, p-value =
    alternative hypothesis: two.sided
    95 percent confidence interval:
    sample estimates:
    prop 1 prop 2
    # ----- double-check the proportions
    > 178/(178+138)
    [1] 0.5632911
    > 570/(570+648)
    [1] 0.4679803

27 2 x 2 table: Hypergeometric Sampling
Conditional on both margins
Hypergeometric test: compare distributions of counts within the 4 cells
$H_0$ is specified in terms of OR = 1
Produces a CI for the OR
    > fisher.test(ov[1:2, 1:2])
    Fisher's Exact Test for Count Data
    data: ov[1:2, 1:2]
    p-value =
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
    sample estimates:
    odds ratio

28 I x J table: Pearson $X^2$
(Independent) multinomial sampling: restrictions on a margin, or on the total
$H_0$ in terms of independence of rows and columns
    > summary(ov)
    Call: xtabs(formula = y ~ degree + belief, data = X)
    Number of cases in table: 2726
    Number of factors: 2
    Test for independence of all factors:
    Chisq = 69.16, df = 4, p-value = 3.42e-14
Pearson residuals: $e_{ij} = \frac{n_{ij} - \hat\mu_{ij}}{\hat\mu_{ij}^{1/2}}$
Divide the raw residual by $\widehat{SE}(n_{ij})$ in Poisson sampling
Standardized Pearson residuals: $e_{ij} = \frac{n_{ij} - \hat\mu_{ij}}{\sqrt{\hat\mu_{ij}(1 - p_{i+})(1 - p_{+j})}}$
Divide the raw residual by $\widehat{SE}(n_{ij} - \hat\mu_{ij})$, the estimated standard error of the residual

29 Visualizing the association
    # -- Compute Pearson and standardized Pearson residuals --
    # expected counts under independence
    e <- apply(ov, 1, sum) %*% t(apply(ov, 2, sum)) / sum(ov)
    pearsonresid <- (ov - e) / sqrt(e)
    # 1 - p_{i+} and 1 - p_{+j}
    prow <- 1 - apply(ov, 1, sum) / sum(ov)
    pcol <- 1 - apply(ov, 2, sum) / sum(ov)
    standpearsonresid <- pearsonresid / sqrt(prow %*% t(pcol))
    dotchart(t(standpearsonresid))
    abline(v = c(-2, 2))
[Figure: dot chart of the standardized Pearson residuals, grouped by degree (1-<HS, 2-HS, 3-BS/grad) with one point per belief category, with vertical reference lines at -2 and 2; x-axis: Standardized Pearson Residuals]

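30 Ordered Categories
Ordered categories have more information
Assign scores to categories
Rows: $(u_1, \ldots, u_I)$, e.g. $(1, \ldots, I)$
Cols: $(v_1, \ldots, v_J)$, e.g. $(1, \ldots, J)$
$H_0: \text{cor}(u, v) = 0$ vs $H_a: \text{cor}(u, v) \neq 0$
Study the linear trend:
$r = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} (u_i - \bar u)(v_j - \bar v)\,n_{ij}}{\sqrt{\left[\sum_{i=1}^{I}\sum_{j=1}^{J} (u_i - \bar u)^2 n_{ij}\right]\left[\sum_{i=1}^{I}\sum_{j=1}^{J} (v_j - \bar v)^2 n_{ij}\right]}}$
$\bar u = \sum_{i=1}^{I}\sum_{j=1}^{J} u_i n_{ij}/n$; $\bar v = \sum_{i=1}^{I}\sum_{j=1}^{J} v_j n_{ij}/n$
$M^2 = (n-1)r^2$, approximately $\chi^2_1$ under $H_0$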

31 Case Study: Ordered Categories
    # existing implementation
    > library(coin)
    > lbl_test(as.table(ov))
    Asymptotic Linear-by-Linear Association Test
    data: belief (ordered) by degree (1-<HS < 2-HS < 3-BS/grad)
    chi-squared = , df = 1, p-value = 6.939e-14
    # manually: centered row scores u and column scores v
    u <- as.vector(scale(1:3, center = sum(c(1:3) * ov) / sum(ov), scale = FALSE))
    v <- as.vector(scale(1:3, center = sum(t(ov) * c(1:3)) / sum(ov), scale = FALSE))
    # weighted correlation between the scores
    r <- sum(u %*% t(v) * ov) / sqrt(sum(u^2 * ov) * sum(t(ov) * v^2))
    M2 <- (sum(ov) - 1) * r^2
    # p-value of the linear trend test
    > 1 - pchisq(M2, 1, lower.tail = TRUE)

32 2x2 pairs: Matched Pairs
Repeated measurements on the same subjects
Ask the same people the same question twice
Goal: compare proportions
Absence of association cannot be interpreted as independence
Example (Agresti Ch. 10.1): approval of the President's performance, one month apart, for the same sample of Americans
                 2nd Survey
    1st Survey   Approve  Disapprove
    Approve          794         150
    Disapprove        86         570
$H_0$: marginal homogeneity, $\pi_{1+} = \pi_{+1}$
$\delta = \pi_{+1} - \pi_{1+} = (\pi_{11} + \pi_{21}) - (\pi_{11} + \pi_{12}) = \pi_{21} - \pi_{12}$
Equivalent to testing table symmetry

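33 Large-sample test and CI
CI
$\hat\delta = p_{+1} - p_{1+} = p_{2+} - p_{+2}$
$\text{var}(\hat\delta) = \left[\pi_{1+}(1 - \pi_{1+}) + \pi_{+1}(1 - \pi_{+1}) - 2(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})\right]/n$
With positive association, the variance is smaller than in independent samples, therefore this is a more efficient design
$\widehat{\text{var}}(\hat\delta) = \left[(p_{12} + p_{21}) - (p_{12} - p_{21})^2\right]/n$
CI: $\hat\delta \pm z_{\alpha/2}\,\widehat{SE}(\hat\delta)$
Wald Test
$z = \frac{\hat\delta}{\widehat{SE}(\hat\delta)} = \frac{n_{21} - n_{12}}{(n_{21} + n_{12})^{1/2}}$, with the standard error evaluated under $H_0$
$z^2$ is approximately $\chi^2_1$ under $H_0$ (called the McNemar test)
Only depends on the counts outside of the diagonal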
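The estimate, interval and test statistic above can be computed by hand from the off-diagonal counts. This sketch uses the approval counts from the example in these notes; variable names are chosen for illustration.

```r
# Approval counts: rows = 1st survey, columns = 2nd survey (Approve, Disapprove)
tab <- matrix(c(794, 86, 150, 570), nrow = 2)
n <- sum(tab)
p <- tab / n

# delta_hat = p_{+1} - p_{1+} = p_21 - p_12
delta_hat <- p[2, 1] - p[1, 2]

# Estimated standard error and 95% Wald interval for delta
se_hat <- sqrt(((p[1, 2] + p[2, 1]) - (p[1, 2] - p[2, 1])^2) / n)
ci <- delta_hat + c(-1, 1) * qnorm(0.975) * se_hat

# McNemar statistic: only the off-diagonal counts enter
z <- (tab[2, 1] - tab[1, 2]) / sqrt(tab[2, 1] + tab[1, 2])
p_value <- pchisq(z^2, df = 1, lower.tail = FALSE)

print(c(delta_hat = delta_hat, lower = ci[1], upper = ci[2],
        z2 = z^2, p_value = p_value))
```

Here $z^2$ carries no continuity correction, so it should match `mcnemar.test(tab, correct = FALSE)` rather than the default corrected version.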

34 President Approval Example
    # Read the data
    Performance <- matrix(c(794, 86, 150, 570), nrow = 2,
                          dimnames = list("1st Survey" = c("Approve", "Disapprove"),
                                          "2nd Survey" = c("Approve", "Disapprove")))
    > Performance
                2nd Survey
    1st Survey   Approve Disapprove
      Approve        794        150
      Disapprove      86        570
    # Test
    > mcnemar.test(Performance)
    McNemar's Chi-squared test with continuity correction
    data: Performance
    McNemar's chi-squared = 16.818, df = 1, p-value = 4.115e-05
Significant change (in fact, drop) in approval ratings


More information

Topic 21 Goodness of Fit

Topic 21 Goodness of Fit Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known

More information

Modeling and inference for an ordinal effect size measure

Modeling and inference for an ordinal effect size measure STATISTICS IN MEDICINE Statist Med 2007; 00:1 15 Modeling and inference for an ordinal effect size measure Euijung Ryu, and Alan Agresti Department of Statistics, University of Florida, Gainesville, FL

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Goodness of Fit Tests

Goodness of Fit Tests Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38 Table of Contents 1 Goodness of Fit Chi Squared Test 2 Tests of

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Analysis of data in square contingency tables

Analysis of data in square contingency tables Analysis of data in square contingency tables Iva Pecáková Let s suppose two dependent samples: the response of the nth subject in the second sample relates to the response of the nth subject in the first

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Answer Key for STAT 200B HW No. 7

Answer Key for STAT 200B HW No. 7 Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;

More information

Lecture 25: Models for Matched Pairs

Lecture 25: Models for Matched Pairs Lecture 25: Models for Matched Pairs Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture

More information

Ordinal Variables in 2 way Tables

Ordinal Variables in 2 way Tables Ordinal Variables in 2 way Tables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 C.J. Anderson (Illinois) Ordinal Variables

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21 Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship

More information

Inferences for Proportions and Count Data

Inferences for Proportions and Count Data Inferences for Proportions and Count Data Corresponds to Chapter 9 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT), with some slides by Ramón V. León (University of Tennessee) 1 Inference

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Stat 5421 Lecture Notes Simple Chi-Square Tests for Contingency Tables Charles J. Geyer March 12, 2016

Stat 5421 Lecture Notes Simple Chi-Square Tests for Contingency Tables Charles J. Geyer March 12, 2016 Stat 5421 Lecture Notes Simple Chi-Square Tests for Contingency Tables Charles J. Geyer March 12, 2016 1 One-Way Contingency Table The data set read in by the R function read.table below simulates 6000

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

WORKSHOP 3 Measuring Association

WORKSHOP 3 Measuring Association WORKSHOP 3 Measuring Association Concepts Analysing Categorical Data o Testing of Proportions o Contingency Tables & Tests o Odds Ratios Linear Association Measures o Correlation o Simple Linear Regression

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Contingency Tables Part One 1

Contingency Tables Part One 1 Contingency Tables Part One 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 32 Suggested Reading: Chapter 2 Read Sections 2.1-2.4 You are not responsible for Section 2.5 2 / 32 Overview

More information

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann Statistics of Contingency Tables - Extension to I x J stat 557 Heike Hofmann Outline Testing Independence Local Odds Ratios Concordance & Discordance Intro to GLMs Simpson s paradox Simpson s paradox:

More information

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Aalysis Mahida Samarakoo Jauary 28, 2016 Mahida Samarakoo STAC51: Categorical data Aalysis 1 / 35 Table of cotets Iferece for Proportios 1 Iferece for Proportios Mahida Samarakoo

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information