Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence


Chi-Square Tests

In this course, we use chi-square tests in two different ways:

1. The chi-square test for goodness-of-fit is used to determine whether an observed frequency distribution differs significantly from an expected frequency distribution.
2. The chi-square test for independence is used to determine whether two categorical (nominal or ordinal) variables exhibit a significant relationship.

The null hypothesis for both versions of the chi-square test is expressed in terms of expected frequencies. The first step in computing the chi-square statistic for either version of the test is to determine the expected frequencies. The second step, computing the chi-square statistic, is the same for both versions of the test.

The Chi-Square Test for Goodness-of-Fit

The chi-square test for goodness-of-fit uses frequency data from a sample to test hypotheses about the shape or proportions of a population. Each individual in the sample is classified into one category on the scale of measurement. The data, called observed frequencies and denoted $k$ or $f_O$, simply count how many individuals from the sample are in each category.

The null hypothesis specifies the proportion of the population that should be in each category. The proportions from the null hypothesis are used to compute expected frequencies that describe how the sample would appear if it were in perfect agreement with the null hypothesis.

For the goodness-of-fit test, the expected frequency for each category is the mean of a binomial count: $f_E = E[k] = np$, where $k = f_O$ is the count of observed items in a category, $p$ is the proportion from the null hypothesis, and $n$ is the size of the sample.
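Here is a minimal Python sketch of the rule $f_E = np$. It assumes the null-hypothesis proportions are supplied as a list that sums to 1. The function name `expected_frequencies` is hypothetical. The sample sizes and proportions come from the two examples that follow.

```python
def expected_frequencies(n, proportions):
    """Return f_E = n * p for each category (goodness-of-fit test)."""
    total = sum(proportions)
    # The H0 proportions must describe the whole population.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions must sum to 1, got {total}")
    return [n * p for p in proportions]


# Uniform H0: three equally likely symbols, n = 75.
print(expected_frequencies(75, [1 / 3, 1 / 3, 1 / 3]))

# Non-uniform H0: M&M's color proportions, n = 53.
mm_props = [0.3, 0.2, 0.1, 0.1, 0.1, 0.2]  # brown, red, blue, orange, green, yellow
print([round(f, 1) for f in expected_frequencies(53, mm_props)])
```

The expected frequencies always add up to n, because the proportions add up to 1.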

Example 1: Uniform Expected Frequencies

Number of symbols thrown in a game of rock/paper/scissors:

Symbol      Rock   Paper   Scissors   Total
Observed      30      21         24      75
Expected      25      25         25      75

For 3 possible categories, if the player is throwing symbols randomly, then we should expect n/3 occurrences per category. For n = 75, we would expect 25 occurrences per category.

Example 2: Non-uniform Expected Frequencies

Official M&M's color distribution:

Color    p
Brown    0.3
Red      0.2
Blue     0.1
Orange   0.1
Green    0.1
Yellow   0.2

Number of M&M's of each color in a sample bag:

Color      Brown    Red   Blue   Orange   Green   Yellow   Total
Observed      14      6      7        4       9       13      53
Expected    15.9   10.6    5.3      5.3     5.3     10.6      53

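Binomial vs. Multinomial Distribution

Binomial PMF:

$$P(k) = \frac{n!}{(n-k)!\,k!}\,p^{k}(1-p)^{n-k}$$

Multinomial PMF:

$$P(k_1, \ldots, k_c) = \frac{n!}{k_1! \cdots k_c!}\,p_1^{k_1} \cdots p_c^{k_c}$$

Here $c$ is the number of categories, $k_1 + \cdots + k_c = n$, and $p_1 + \cdots + p_c = 1$.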
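The two PMFs can be compared directly in a short Python sketch that uses only the standard library. The function names are illustrative and do not come from the course materials. With $c = 2$ categories, the multinomial PMF should reduce to the binomial PMF, and the last lines check this numerically.

```python
from math import comb, factorial, prod


def binomial_pmf(k, n, p):
    """P(k) = n! / ((n - k)! k!) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def multinomial_pmf(counts, probs):
    """P(k_1, ..., k_c) = n! / (k_1! ... k_c!) * p_1^k_1 ... p_c^k_c."""
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)  # stays an integer at every step
    return coef * prod(p**k for k, p in zip(counts, probs))


# Two categories: (3 successes, 2 failures) with p = 0.6.
print(binomial_pmf(3, 5, 0.6))
print(multinomial_pmf([3, 2], [0.6, 0.4]))
```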

Computing the Chi-Square Statistic

$$\chi^2(df) = \sum \frac{(f_O - f_E)^2}{f_E}$$

For the goodness-of-fit test: $df = (\text{number of cells}) - 1$.
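The formula translates directly into code. Below is a minimal Python version; the helper names `chi_square_statistic` and `goodness_of_fit_df` are hypothetical. It is applied to the rock/paper/scissors counts of Example 1, with the uniform expected frequency of 25 per symbol.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_O - f_E)^2 / f_E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def goodness_of_fit_df(n_cells):
    """Degrees of freedom for the goodness-of-fit test: cells - 1."""
    return n_cells - 1


# Rock/paper/scissors counts (n = 75) under a uniform H0.
observed = [30, 21, 24]
expected = [25, 25, 25]
chi2 = chi_square_statistic(observed, expected)
df = goodness_of_fit_df(len(observed))
print(f"chi2({df}) = {chi2:.2f}")
```

For these counts the statistic is (25 + 16 + 1)/25 = 1.68, with df = 2.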

Chi-Square Distribution

[Figure: chi-square density curves, dchisq(x, k), for several degrees of freedom, including k = 1, 4, and 8.]

[Figure: chi-square distribution with α = 0.05 and critical value $\chi^2_{0.05} = 11.07$.]

[Figure: observed $\chi^2 = 6.21$ compared with the critical value $\chi^2_{0.05} = 11.07$ at α = 0.05; $p = P(\chi^2 > \chi^2_{\text{observed}}) = 0.2863$.]
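The critical value and p-value can also be computed numerically instead of being read off a table or plot. Below is a dependency-free Python sketch. It assumes that the power series for the regularized lower incomplete gamma function, which gives the chi-square CDF, is accurate enough for moderate x. In practice, a library routine such as R's `pchisq`/`qchisq` would normally be used instead. The helpers `chi2_cdf`, `chi2_sf`, and `chi2_critical` are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P(chi^2_df <= x), from the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x), i.e. the p-value."""
    return 1.0 - chi2_cdf(x, df)


def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Critical value c with P(chi^2_df > c) = alpha, found by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # upper tail still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


critical = chi2_critical(0.05, 5)
p_value = chi2_sf(6.21, 5)
print(f"critical value = {critical:.2f}, p = {p_value:.4f}")
```

Bisection is used because the upper-tail probability decreases monotonically in x, so no derivative is needed.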

[Figure: observed $\chi^2 = 6.21$ compared with the critical value $\chi^2_{0.05} = 11.03$ at α = 0.05; $p = P(\chi^2 > \chi^2_{\text{observed}}) = 0.2959$.]


Example 2 (continued)

Using the observed and expected frequencies for the sample bag of 53 M&M's, with $df = 6 - 1 = 5$:

$$\chi^2(5) = \sum \frac{(f_O - f_E)^2}{f_E} = \frac{(14 - 15.9)^2}{15.9} + \frac{(6 - 10.6)^2}{10.6} + \frac{(7 - 5.3)^2}{5.3} + \frac{(4 - 5.3)^2}{5.3} + \frac{(9 - 5.3)^2}{5.3} + \frac{(13 - 10.6)^2}{10.6} = 6.21$$

Since 6.21 < 11.07, we retain $H_0$. The frequency distribution of colors in our bag is not significantly different from the distribution expected from M&M's published proportions.
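The same calculation can be sketched in Python so that each cell's contribution to $\chi^2$ is visible. The critical value 11.07 is hard-coded from the α = 0.05 slide for df = 5 and is not computed here. The variable names are illustrative.

```python
# One sample bag of M&M's and the published H0 proportions.
colors = ["brown", "red", "blue", "orange", "green", "yellow"]
observed = [14, 6, 7, 4, 9, 13]
props = [0.3, 0.2, 0.1, 0.1, 0.1, 0.2]

n = sum(observed)                      # 53 candies
expected = [n * p for p in props]      # f_E = n p
contributions = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]

for color, fo, fe, c in zip(colors, observed, expected, contributions):
    print(f"{color:>6}: f_O = {fo:2d}, f_E = {fe:5.1f}, contribution = {c:.3f}")

chi2 = sum(contributions)
df = len(observed) - 1
critical = 11.07                       # chi^2_0.05 for df = 5, taken from the slide
decision = "reject H0" if chi2 > critical else "retain H0"
print(f"chi2({df}) = {chi2:.2f} -> {decision}")
```

The per-cell view shows that most of the statistic comes from the red and green cells, yet the total still falls well short of the critical value.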

The Chi-Square Test for Independence

The second chi-square test, the chi-square test for independence, can be used and interpreted in two different ways:

1. Testing hypotheses about the relationship between two variables in a population.
   $H_0$: There is no relationship between factor A and factor B.
   (This interpretation is analogous to correlation.)
2. Testing hypotheses about differences between proportions for two or more populations.
   $H_0$: There is no difference between the distribution of factor A under different levels of factor B.
   (This interpretation is analogous to interaction in factorial ANOVAs.)

The data for a chi-square test for independence are usually organized in a matrix, with the categories for one variable defining the rows and the categories of the second variable defining the columns. These matrices are usually called contingency tables.

Frequency of successes and relapses (outcome) for anorexic patients treated with Prozac or a placebo (treatment):

Treatment   Success   Relapse   Total
Drug             13        36      49
Placebo          14        30      44
Total            27        66      93

The data, called observed frequencies, simply show how many individuals from the sample fall into each cell (i.e., each combination of factor levels) of the matrix. The null hypothesis for this test states that there is no relationship between the two variables; in other words, the two variables are independent.

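Computing Expected Frequencies

For the goodness-of-fit test, the expected frequency is computed as $f_E = np$. For the test for independence, the expected frequency for each cell in the matrix is computed as

$$f_E = n\,p_R\,p_C = n \cdot \frac{f_R}{n} \cdot \frac{f_C}{n} = \frac{f_R\,f_C}{n}$$

where $f_R$ and $f_C$ are the row and column totals for the cell, and $p_R = f_R/n$ and $p_C = f_C/n$ are the corresponding marginal proportions.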
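A minimal Python sketch of $f_E = f_R f_C / n$ for an arbitrary contingency table, stored as a list of rows, is shown below. The helper name `expected_table` is hypothetical. The Prozac counts are used only to exercise the code.

```python
def expected_table(observed):
    """Return f_E = f_R * f_C / n for every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


# Prozac example: rows = drug, placebo; columns = success, relapse.
prozac = [[13, 36],
          [14, 30]]
for row in expected_table(prozac):
    print([round(f, 2) for f in row])
```

By construction, the expected table has the same row and column totals as the observed table.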

To understand the intuition behind this, consider the joint and marginal probabilities underlying the frequency distribution:

Treatment   Success               Relapse               Total
Drug        P(success, drug)      P(relapse, drug)      P(drug)
Placebo     P(success, placebo)   P(relapse, placebo)   P(placebo)
Total       P(success)            P(relapse)            1

Remember from earlier in the semester that if two factors (treatment and outcome) are independent, then

$$P(\text{outcome}, \text{treatment}) = P(\text{outcome})\,P(\text{treatment})$$

Multiplying this joint probability by $n$ gives the frequency expected in that cell under $H_0$, which is where $f_E = n\,p_R\,p_C$ comes from.

Computing the Chi-Square Statistic

The calculation of chi-square is the same for all chi-square tests:

$$\chi^2 = \sum \frac{(f_O - f_E)^2}{f_E}$$

However, the computation of the degrees of freedom differs. For the goodness-of-fit test, $df = (\text{number of cells}) - 1$. For the test of independence, $df = (\text{number of rows} - 1)(\text{number of columns} - 1)$.
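Combining the expected-frequency rule with the shared $\chi^2$ formula and the independence degrees of freedom gives a small function. This is a Python sketch; the name `chi_square_independence` is hypothetical. It is applied to the Prozac table only to exercise the code, since the slides do not report a test result for that table.

```python
def chi_square_independence(observed):
    """Return (chi2, df) for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # f_E = f_R f_C / n
            chi2 += (fo - fe) ** 2 / fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


prozac = [[13, 36],   # drug: success, relapse
          [14, 30]]   # placebo: success, relapse
chi2, df = chi_square_independence(prozac)
print(f"chi2({df}) = {chi2:.2f}")
```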

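Full Example

Event frequencies (death sentence by race of the defendant):

Race        Yes     No   Total
Black        95    425     520
Nonblack     19    128     147
Total       114    553     667

Event probabilities (f/n):

Race         Yes      No   Total
Black      0.142   0.637   0.780
Nonblack   0.028   0.192   0.220
Total      0.171   0.829   1.000

Are race of the defendant and application of the death sentence independent? If they were, the joint probability would equal the product of the marginal probabilities:

$$P(\text{black}, \text{death}) = 0.142$$

$$P(\text{black})\,P(\text{death}) = 0.780 \times 0.171 = 0.133$$

The two values differ, and the chi-square test asks whether the difference is larger than we would expect by chance.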
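The comparison of the joint probability with the product of the marginals can be reproduced from the raw counts. Below is a minimal Python sketch; the dictionary layout and the helper `marginal` are illustrative choices, not from the course.

```python
# Death-sentence counts keyed by (race, sentence).
counts = {("black", "yes"): 95, ("black", "no"): 425,
          ("nonblack", "yes"): 19, ("nonblack", "no"): 128}
n = sum(counts.values())  # 667 defendants


def marginal(level, position):
    """Marginal probability of a race (position 0) or a sentence (position 1)."""
    return sum(c for key, c in counts.items() if key[position] == level) / n


p_joint = counts[("black", "yes")] / n
p_product = marginal("black", 0) * marginal("yes", 1)
print(f"P(black, death)   = {p_joint:.3f}")
print(f"P(black) P(death) = {p_product:.3f}")
```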

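Full Example: Expected Event Frequencies

Each expected frequency is computed from the observed row and column totals, $f_E = f_R f_C / n$:

$$f_E(\text{black}, \text{yes}) = \frac{520 \times 114}{667} = 88.88 \qquad f_E(\text{black}, \text{no}) = \frac{520 \times 553}{667} = 431.12$$

$$f_E(\text{nonblack}, \text{yes}) = \frac{147 \times 114}{667} = 25.12 \qquad f_E(\text{nonblack}, \text{no}) = \frac{147 \times 553}{667} = 121.88$$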

Expected event frequencies:

Race          Yes       No   Total
Black       88.88   431.12     520
Nonblack    25.12   121.88     147
Total         114      553     667

With $df = (2 - 1)(2 - 1) = 1$:

$$\chi^2(1) = \sum \frac{(f_O - f_E)^2}{f_E} = \frac{(95 - 88.88)^2}{88.88} + \frac{(425 - 431.12)^2}{431.12} + \frac{(19 - 25.12)^2}{25.12} + \frac{(128 - 121.88)^2}{121.88} = 0.42 + 0.09 + 1.49 + 0.31 = 2.31$$

$P(\chi^2(1) > 2.31) = 0.13$; retain $H_0$.

These data do not indicate a significant relationship between race and sentencing in death penalty trials. Or: these data are not sufficient to conclude that black defendants are sentenced at a different rate than nonblack defendants in death penalty trials.
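The whole test can be checked end to end in a short Python sketch. For df = 1, a chi-square variable is the square of a standard normal variable. Its upper-tail probability therefore has the closed form $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which is available in the standard library as `math.erfc`. The variable names are illustrative.

```python
import math

observed = [[95, 425],   # black: yes, no
            [19, 128]]   # nonblack: yes, no
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # f_E = f_R f_C / n
        chi2 += (fo - fe) ** 2 / fe

# df = 1: P(chi^2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
alpha = 0.05
print(f"chi2(1) = {chi2:.2f}, p = {p_value:.2f}")
print("reject H0" if p_value < alpha else "retain H0")
```

Without rounding the expected frequencies, the statistic is about 2.31 and the p-value is about 0.13, which matches the hand calculation.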