The goodness-of-fit test

Having discussed how to compare two proportions, we now consider comparisons among multiple proportions. A common problem of this type asks whether the members of a population are distributed among more than two subcategories according to a hypothesized distribution of percentages. To gather evidence, we select a sample from the population, count how many members of the sample fall into each subcategory, and test whether the data provide a good fit to the hypothesized distribution; this is why we call it a goodness-of-fit test. Each member of the sample falls into exactly one of the possible categories of a certain categorical variable, and we record the observed cell counts: how many individuals fall into each category.
As with our earlier hypothesis tests, there are independence assumptions and sample size conditions that must be met for the test to be carried out reliably:

Randomization Condition: the sample should be a SRS.
10% Condition: the sample is no more than 10% of the size of the population.
Expected Cell Frequency Condition: the sample is large enough that the expected count in each cell is at least 5 (an analogue of the Success/Failure Condition for the situation with only two categories).

Suppose there are k different categories and the hypothesized distribution of percentages is p_1 for the first category, p_2 for the second category, and so on. We compare the observed cell counts from the sample against the expected cell counts determined from the null hypothesis

H_0: the categories fit the distribution p_1, p_2, ..., p_k

Of course, the alternative hypothesis is

H_A: the categories don't fit the distribution p_1, p_2, ..., p_k
If the null hypothesis is true and the sample size is n, then the expected cell count for the first category should be Exp_1 = p_1 · n, the expected cell count for the second category should be Exp_2 = p_2 · n, and so on. The test statistic we use is called a chi-square statistic (denoted χ²). It measures the discrepancy between the observed cell counts Obs_1, Obs_2, ..., Obs_k from the data and the expected counts Exp_1, Exp_2, ..., Exp_k determined by the null hypothesis:

χ² = Σ_(all cells) (Obs − Exp)² / Exp

The chi-square statistic is a random variable that follows a special chi-square probability distribution, which, like the t distribution, depends on the number of degrees of freedom; here df = k − 1, one less than the number of categories. [TI-83: STAT DISTR χ²-cdf(low, high, df)]
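As a small sketch of the formulas above, the expected counts and the χ² statistic can be computed directly; the hypothesized proportions and observed counts below are invented for illustration.

```python
# Sketch: expected counts and the chi-square statistic for a
# goodness-of-fit test. The proportions and counts are made up.

p = [0.40, 0.30, 0.20, 0.10]   # hypothesized distribution (k = 4 categories)
obs = [50, 25, 15, 10]         # observed cell counts
n = sum(obs)                   # sample size, here 100

exp = [pi * n for pi in p]     # expected cell counts under H_0
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(p) - 1                # k - 1 degrees of freedom

print(exp)             # [40.0, 30.0, 20.0, 10.0]
print(round(chi2, 4))  # 4.5833
print(df)              # 3
```

Note that every expected count here is at least 5, so the Expected Cell Frequency Condition is satisfied.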
Goodness-of-fit test

State hypotheses:
Null hypothesis H_0: the categories fit the distribution p_1, p_2, ..., p_k
Alternative hypothesis H_A: the categories don't fit the distribution p_1, p_2, ..., p_k
(Note: there is only one option for H_A!)

Choose model: A SRS of size n satisfying the 10% Condition is selected and observed counts for each category are tallied (with each expected cell count at least 5), so a chi-square model applies with test statistic

χ² = Σ_(all cells) (Obs − Exp)² / Exp and df = k − 1

Mechanics: Compute the expected counts, χ², and the upper-tail probability associated with H_A: P = P(X² ≥ χ² | H_0).

Conclusion: Reject (or fail to reject) H_0 depending on how small P is.

[TI-84: STAT TESTS χ²-GOFTest]
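The full procedure can be sketched in code. In practice the P-value comes from the chi-square distribution (via a table or the calculator's χ²-GOFTest); since that is not in the standard library, this sketch approximates P = P(X² ≥ χ² | H_0) by simulating samples under the null hypothesis. All data are invented.

```python
# Sketch: goodness-of-fit test with a Monte Carlo estimate of the
# P-value (a stand-in for a chi-square table or calculator lookup).
import random

random.seed(1)

p = [0.40, 0.30, 0.20, 0.10]   # hypothesized distribution
obs = [50, 25, 15, 10]         # observed counts
n = sum(obs)

def chi2_stat(counts, expected):
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

exp = [pi * n for pi in p]
observed_chi2 = chi2_stat(obs, exp)

# Draw samples of size n under H_0 and see how often the simulated
# chi-square value is at least as large as the observed one.
trials = 5000
hits = 0
for _ in range(trials):
    counts = [0] * len(p)
    for _ in range(n):
        u, cum = random.random(), 0.0
        for i, pi in enumerate(p):
            cum += pi
            if u < cum or i == len(p) - 1:  # fallback guards rounding
                counts[i] += 1
                break
    if chi2_stat(counts, exp) >= observed_chi2:
        hits += 1

p_value = hits / trials   # estimate of P(X^2 >= chi^2 | H_0)
```

Here the estimated P-value is large (roughly 0.2), so we would fail to reject H_0 for these invented counts.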
Test for homogeneity

A common variant of the goodness-of-fit test compares the distribution of percentages of certain categories across multiple populations, to see whether there is evidence of a difference in the distributions of the categories among the populations. Here the null hypothesis is an assumption of homogeneity among the populations: that is, we assume that the percentage distribution of the categories is the same for each population. The assumptions and necessary conditions for a test for homogeneity are identical with those for a goodness-of-fit test (Randomization Condition, 10% Condition, Expected Cell Frequency Condition), but now, since we need to count categories across multiple groups, we arrange the data in a contingency table with the categories arranged by row and the different groups by column. Each cell records an Obs count. If our null hypothesis that all the groups have the same distribution across categories is true, then we expect each Obs count in the table to be close to the expected count

Exp = (row total) × (column total) / (table total)
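The expected-count formula above can be sketched for a small contingency table; the 3 × 2 table of counts below (three categories observed in two groups) is invented for illustration.

```python
# Sketch: expected counts for a test of homogeneity.
# Rows are categories, columns are groups; the data are made up.

obs = [
    [30, 45],   # category 1: counts for group A, group B
    [50, 35],   # category 2
    [20, 20],   # category 3
]

row_totals = [sum(row) for row in obs]            # [75, 85, 40]
col_totals = [sum(col) for col in zip(*obs)]      # [100, 100]
table_total = sum(row_totals)                     # 200

# Exp = (row total) * (column total) / (table total) for each cell
exp = [[r * c / table_total for c in col_totals] for r in row_totals]
print(exp)   # [[37.5, 37.5], [42.5, 42.5], [20.0, 20.0]]
```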
We then use a chi-square statistic to assess the discrepancy between the Obs data and their Exp values:

χ² = Σ_(all cells) (Obs − Exp)² / Exp

Here, however, we use a number of degrees of freedom equal to df = (R − 1)(C − 1), where the contingency table has R rows and C columns. The chi-square probability model then allows us to determine a P-value associated with the test.
Test for homogeneity

State hypotheses:
Null hypothesis H_0: the C groups have equal distributions over the R categories
Alternative hypothesis H_A: the groups have unequal distributions over the categories

Choose model: A SRS of size n satisfying the 10% Condition is selected and observed counts for each category are tallied (with each expected cell count at least 5), so a chi-square model applies with test statistic

χ² = Σ_(all cells) (Obs − Exp)² / Exp and df = (R − 1)(C − 1)

Mechanics: Compute the expected counts, χ², and the upper-tail probability associated with H_A: P = P(X² ≥ χ² | H_0).

Conclusion: Reject (or fail to reject) H_0 depending on how small P is.

[TI-83: MATRX EDIT [A], STAT TESTS χ²-Test]
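The mechanics above can be sketched end to end for an invented 3 × 2 table. Normally the P-value comes from a chi-square table or the calculator's χ²-Test; this sketch exploits the fact that for df = 2 the chi-square upper tail has the closed form P(X² ≥ x) = e^(−x/2), so no lookup is needed in this particular case.

```python
# Sketch: test for homogeneity on an invented 3x2 table
# (3 categories x 2 groups). df = (3-1)(2-1) = 2, so the
# P-value is exactly exp(-chi2 / 2) in this special case.
import math

obs = [
    [30, 45],
    [50, 35],
    [20, 20],
]
R, C = len(obs), len(obs[0])

row_totals = [sum(row) for row in obs]
col_totals = [sum(col) for col in zip(*obs)]
total = sum(row_totals)

def expected(r, c):
    return row_totals[r] * col_totals[c] / total

chi2 = sum(
    (obs[r][c] - expected(r, c)) ** 2 / expected(r, c)
    for r in range(R)
    for c in range(C)
)
df = (R - 1) * (C - 1)

p_value = math.exp(-chi2 / 2)   # valid only because df == 2
print(round(chi2, 4))           # 5.6471
print(round(p_value, 4))        # 0.0594
```

With P ≈ 0.059 we would fail to reject H_0 at the 5% level, though only barely; the evidence against homogeneity is suggestive but not conclusive for this invented table.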
Chi-square residuals

Whenever we decide to reject the null hypothesis, it is a good idea to examine the chi-square standardized residuals for each cell; these can often provide information about the underlying patterns of difference between categories in the case of a goodness-of-fit test, or of differences between groups in the case of a test for homogeneity. The residuals are the signed square roots of the components that sum to χ²; in particular, if a cell has observed count Obs and expected count Exp, then its residual is

c = (Obs − Exp) / √Exp

Since these are standardized values, cells whose observed counts are much larger than expected will have large positive residuals, and cells whose observed counts are much smaller than expected will have large negative residuals.
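As a sketch, the residual formula can be applied to the invented goodness-of-fit counts used earlier (observed counts 50, 25, 15, 10 against expected counts 40, 30, 20, 10):

```python
# Sketch: standardized residuals c = (Obs - Exp) / sqrt(Exp)
# for invented goodness-of-fit data.
import math

obs = [50, 25, 15, 10]
exp = [40.0, 30.0, 20.0, 10.0]

residuals = [(o - e) / math.sqrt(e) for o, e in zip(obs, exp)]
print([round(c, 2) for c in residuals])   # [1.58, -0.91, -1.12, 0.0]
```

The first cell's large positive residual shows it was observed far more often than expected, while the middle cells came up somewhat short; the last cell matched its expected count exactly.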
Test for independence

We have used contingency tables to investigate the association between two categorical variables, and we can use a chi-square model to develop a test for independence of two variables. We use the contingency table data to evaluate the hypothesis of independence between the row categorical variable and the column categorical variable. If this hypothesis is true, then we should observe no association between the variables, so that the Obs value in each cell should be close to the Exp value (as determined for a test for homogeneity):

Exp = (row total) × (column total) / (table total)

The same assumptions and conditions apply as for a test for homogeneity.
Test for independence of two categorical variables

State hypotheses:
Null hypothesis H_0: the two categorical variables are independent
Alternative hypothesis H_A: the two categorical variables are not independent

Choose model: A SRS of size n satisfying the 10% Condition is selected and observed counts for each category are tallied (with each expected cell count at least 5), so a chi-square model applies with test statistic

χ² = Σ_(all cells) (Obs − Exp)² / Exp and df = (R − 1)(C − 1)

Mechanics: Compute the expected counts, χ², and the upper-tail probability associated with H_A: P = P(X² ≥ χ² | H_0).

Conclusion: Reject (or fail to reject) H_0 depending on how small P is.

[TI-83: MATRX EDIT [A], STAT TESTS χ²-Test]
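The mechanics are identical to the test for homogeneity; only the interpretation differs (one sample classified by two variables, rather than several groups classified by one). A sketch on an invented 2 × 3 table, again using the df = 2 closed form e^(−x/2) for the P-value in place of a table lookup:

```python
# Sketch: chi-square test for independence on an invented 2x3 table.
# Rows: levels of variable A; columns: levels of variable B.
import math

obs = [
    [20, 30, 50],
    [30, 30, 40],
]
R, C = len(obs), len(obs[0])

row_totals = [sum(row) for row in obs]        # [100, 100]
col_totals = [sum(col) for col in zip(*obs)]  # [50, 60, 90]
total = sum(row_totals)                       # 200

exp = [[r * c / total for c in col_totals] for r in row_totals]

# Check the Expected Cell Frequency Condition before proceeding.
assert min(min(row) for row in exp) >= 5

chi2 = sum(
    (obs[r][c] - exp[r][c]) ** 2 / exp[r][c]
    for r in range(R)
    for c in range(C)
)
df = (R - 1) * (C - 1)
p_value = math.exp(-chi2 / 2)   # exact only because df == 2
```

Here χ² ≈ 3.11 and P ≈ 0.21, so these invented counts give no real evidence against independence.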
Deducing causation

Dependence of one variable on another means that there is an association between them. It also suggests that there might be a causal link between them, but it does not prove that such a link must exist. So when you reject a null hypothesis that claims the independence of two categorical variables, that doesn't mean that there is a causal link between them. Examining the residuals can help to tease out any underlying patterns that exist.