Goodness of Fit Tests


Allyson Williamson
6 years ago

1 Goodness of Fit Tests
Marc H. Mehlman
University of New Haven
(University of New Haven) Goodness of Fit Tests 1 / 38

2 Table of Contents
1 Goodness of Fit Chi Squared Test
2 Tests of Independence
3 Test of Homogeneity
  McNemar Test (Matched Pairs)
4 Chapter #9 R Assignment

3 Goodness of Fit Chi Squared Test

4 Goodness of Fit Chi Squared Test
Idea of the chi-square test

The chi-square ($\chi^2$) test is used when the data are categorical. It measures how different the observed data are from what we would expect if $H_0$ was true.

[Figure: two bar charts by day of the week (Mon. through Sun.), vertical axis 0% to 20%. One shows the sample composition (observed sample proportions, 1 SRS of 700 births); the other shows the expected composition under $H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1/7$.]


5 Goodness of Fit Chi Squared Test
The chi-square distributions

The $\chi^2$ distributions are a family of distributions that take only positive values, are skewed to the right, and are described by a specific number of degrees of freedom. Published tables and software give the upper-tail area for critical values of many $\chi^2$ distributions.

6 Goodness of Fit Chi Squared Test
Table D

Example: for df = 6, if $\chi^2 = 15.9$, the P-value is between 0.01 and 0.02.
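The table lookup above can also be done in software. The following is a minimal sketch in R using the base functions `pchisq` and `qchisq`; the variable names are only illustrative.

```r
# Upper-tail area of the chi-square distribution with 6 degrees of freedom
p_value <- pchisq(15.9, df = 6, lower.tail = FALSE)

# Critical values whose upper-tail areas are 0.02 and 0.01;
# a table lookup brackets the statistic between two such values
crit <- qchisq(c(0.02, 0.01), df = 6, lower.tail = FALSE)

print(p_value)
print(crit)
```

Setting `lower.tail = FALSE` asks for the area to the right of the statistic, which is the quantity a chi-square table reports.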

7 Goodness of Fit Chi Squared Test

Data for $n$ observations of a categorical variable with $k$ possible outcomes are summarized as observed counts $n_1, n_2, \dots, n_k$ in $k$ cells. Let $H_0$ specify the cell probabilities $p_1, p_2, \dots, p_k$ for the $k$ possible outcomes.

Definition
$o_j \overset{\text{def}}{=}$ observed count in cell $j$
$e_j \overset{\text{def}}{=} n p_j$ = expected count in cell $j$

Example
Three species of large fish (A, B, C) that are native to a certain river have been observed to exist in equal proportions. A recent survey of 300 large fish found 89 of species A, 120 of species B and 91 of species C. What are the observed and expected counts?

Solution: $o_1 = 89$, $o_2 = 120$ and $o_3 = 91$, and
$$e_1 = e_2 = e_3 = n p_j = 300 \left(\frac{1}{3}\right) = 100.$$

8 Goodness of Fit Chi Squared Test
Chi Squared Goodness of Fit Test

Theorem (Chi Squared Goodness of Fit Test)
The chi square statistic, which measures how much the observed cell counts differ from the expected cell counts, is
$$x \overset{\text{def}}{=} \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}.$$
Let
$H_0$: the cell probabilities are $p_1, \dots, p_k$.
If $H_0$ is true and
all expected counts are $\ge 1$,
no more than 20% of the expected counts are $< 5$,
then the chi squared statistic is approximately $\chi^2(k-1)$. In that case, the p value of the test $H_0$ versus $H_A$: not $H_0$ is approximately $P(x \le C)$ where $C \sim \chi^2(k-1)$.

9 Goodness of Fit Chi Squared Test
Example: River ecology

Three species of large fish (A, B, C) that are native to a certain river have been observed to co-exist in equal proportions. A recent random sample of 300 large fish found 89 of species A, 120 of species B, and 91 of species C. Do the data provide evidence that the river's ecosystem has been upset?

$H_0: p_A = p_B = p_C = 1/3$
$H_a$: $H_0$ is not true

Number of proportions compared: $k = 3$
All the expected counts are $n/k = 300/3 = 100$
Degrees of freedom: $k - 1 = 3 - 1 = 2$

$X^2$ calculation:
$$\chi^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$$

10 Goodness of Fit Chi Squared Test
Example (cont.)

If $H_0$ was true, how likely would it be to find by chance a discrepancy between observed and expected frequencies yielding an $X^2$ value of 6.02 or greater?

From Table E, we find $5.99 < X^2 < 7.38$, so $0.05 > P > 0.025$. Software gives P-value = 0.0493.

Using a typical significance level of 5%, we conclude that the results are significant. We have found evidence that the 3 fish populations are not currently equally represented in this ecosystem ($P < 0.05$).

11 Goodness of Fit Chi Squared Test
Example (cont.): Interpreting the $\chi^2$ output

The individual values summed in the $\chi^2$ statistic are the $\chi^2$ components. When the test is statistically significant, the largest components indicate which condition(s) are most different from what is expected under $H_0$. You can also compare the actual proportions qualitatively in a graph.

[Figure: bar chart of percent of total (0% to 40%) for species A, B and C (gumpies, sticklebarbs, spotheads).]

$$\chi^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$$

The largest $X^2$ component, 4.0, is for species B. The increase in species B contributes the most to significance.
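The river ecology calculation can be reproduced step by step in R. This is a minimal sketch: the counts come from the example, while the variable names (`obs`, `p0`, `components`, and so on) are illustrative choices, not part of the original slides.

```r
# Observed counts from the river ecology example
obs <- c(A = 89, B = 120, C = 91)

# Cell probabilities under H0 (equal proportions)
p0 <- rep(1/3, 3)

# Expected counts e_j = n * p_j and the chi-square components
expected <- sum(obs) * p0
components <- (obs - expected)^2 / expected

# Chi-square statistic and its upper-tail p-value with k - 1 degrees of freedom
x2 <- sum(components)
p_value <- pchisq(x2, df = length(obs) - 1, lower.tail = FALSE)

# The same test using the built-in function
fit <- chisq.test(obs, p = p0)

print(components)
print(c(x2 = x2, p_value = p_value))
```

Printing `components` shows which species contributes most to the statistic, which is the interpretation step described above.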

12 Goodness of Fit Chi Squared Test
Example: Goodness of fit for a genetic model

Under a genetic model of dominant epistasis, a cross of white and yellow summer squash will yield white, yellow, and green squash with probabilities 12/16, 3/16 and 1/16 respectively (expected ratios 12:3:1). Suppose we observe the following data:

            White   Yellow   Green
Observed    155     40       10

Are they consistent with the genetic model?

$H_0: p_{white} = 12/16;\ p_{yellow} = 3/16;\ p_{green} = 1/16$
$H_a$: $H_0$ is not true

We use $H_0$ to compute the expected counts for each squash type.

13 Goodness of Fit Chi Squared Test
Example (cont.)

We then compute the chi-square statistic:
$$\chi^2 = \frac{(155-153.75)^2}{153.75} + \frac{(40-38.4375)^2}{38.4375} + \frac{(10-12.8125)^2}{12.8125} \approx 0.0102 + 0.0635 + 0.6174 = 0.691$$

Degrees of freedom = $k - 1 = 2$, and $X^2 = 0.691$. Using Table D we find $P > 0.25$. Software gives $P = 0.708$.

This is not significant and we fail to reject $H_0$. The observed data are consistent with a dominant epistatic genetic model (12:3:1). The small observed deviations from the model could simply have arisen from the random sampling process alone.

14 Goodness of Fit Chi Squared Test
Example (cont.)

    # observed counts of white, yellow and green squash
    obs <- c(155, 40, 10)
    # cell probabilities under H0 (ratios 12:3:1)
    tprob <- c(12/16, 3/16, 1/16)
    # chi-squared test for given probabilities
    chisq.test(obs, p = tprob)    # X-squared = 0.69106, df = 2, p-value = 0.7078
    # expected counts under H0
    exp <- chisq.test(obs, p = tprob)$expected
    exp                           # 153.7500 38.4375 12.8125
    # chi-square components
    (obs - exp)^2 / exp           # approximately 0.0102 0.0635 0.6174

15 Tests of Independence

16 Tests of Independence
r × c Contingency Tables

Given two different finite partitions of the population, namely $\{A_i\}_{i=1}^{r}$ and $\{B_j\}_{j=1}^{c}$, one wants to test if the two partitions are independent:
$$H_0: P(A_i \cap B_j) = P(A_i)\,P(B_j) \text{ for every } 1 \le i \le r \text{ and } 1 \le j \le c$$
versus $H_A$: not $H_0$.

One takes a random sample, $x_1, \dots, x_n$, from the population. Let
$$o_{ij} \overset{\text{def}}{=} \text{the number of sample points that fall in } A_i \cap B_j$$
and
$$C_j \overset{\text{def}}{=} \sum_{i=1}^{r} o_{ij} \quad\text{and}\quad R_i \overset{\text{def}}{=} \sum_{j=1}^{c} o_{ij}.$$

The data for the test of independence are given in an r × c contingency table:

                 B_1     B_2     ...    B_c     Row Totals
A_1              o_11    o_12    ...    o_1c    R_1
A_2              o_21    o_22    ...    o_2c    R_2
...
A_r              o_r1    o_r2    ...    o_rc    R_r
Column Totals    C_1     C_2     ...    C_c     Grand Total = n

The name contingency table was given by Karl Pearson...

17 Tests of Independence
Example: Two-way tables

An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor.

Two-way tables organize data about two categorical variables with any number of levels/treatments obtained from a two-way, or block, design.

High school students were asked whether they smoke, and whether their parents smoke:
First factor: Parent smoking status
Second factor: Student smoking status

18 Tests of Independence
Example (cont.)

                        student smokes   student doesn't smoke   Total
both parents smoke      400              1,380                   1,780
one parent smokes       416              1,823                   2,239
neither parent smokes   188              1,168                   1,356
Total                   1,004            4,371                   5,375

Assuming the observed data correspond to the population, i.e. using empirical probabilities in place of actual probabilities:

$$P(\text{student smokes \& one parent smokes}) = P(\text{being in row \#2 \& column \#1}) = \frac{(2,1)\text{ entry}}{\text{grand total}} = \frac{416}{5{,}375} \approx 0.0774$$
$$P(\text{student smokes}) = P(\text{being in column \#1}) = \frac{\text{column \#1 total}}{\text{grand total}} = \frac{1{,}004}{5{,}375} \approx 0.1868$$
$$P(\text{one parent smokes}) = P(\text{being in row \#2}) = \frac{\text{row \#2 total}}{\text{grand total}} = \frac{2{,}239}{5{,}375} \approx 0.4166$$

19 Tests of Independence
Expected Counts for r × c Contingency Tables

Observe: Assuming $H_0$: row variable and column variable are independent,
$$e_{ij} = (\text{grand total}) \cdot P(\text{being in } ij\text{th cell})$$
$$= (\text{grand total}) \cdot P(\text{being in row } \#i) \cdot P(\text{being in column } \#j)$$
$$= (\text{grand total}) \left(\frac{\text{row } \#i \text{ total}}{\text{grand total}}\right) \left(\frac{\text{column } \#j \text{ total}}{\text{grand total}}\right)$$
$$= \frac{(\text{row } \#i \text{ total})\,(\text{column } \#j \text{ total})}{\text{grand total}}.$$

20 Tests of Independence
Example (cont.)

                        student smokes   student doesn't smoke   Total
both parents smoke      400              1,380                   1,780
one parent smokes       416              1,823                   2,239
neither parent smokes   188              1,168                   1,356
Total                   1,004            4,371                   5,375

The expected counts of the six cells are:
$$e_{11} = \frac{1{,}780 \cdot 1{,}004}{5{,}375} \approx 332.49 \qquad e_{12} = \frac{1{,}780 \cdot 4{,}371}{5{,}375} \approx 1{,}447.51$$
$$e_{21} = \frac{2{,}239 \cdot 1{,}004}{5{,}375} \approx 418.22 \qquad e_{22} = \frac{2{,}239 \cdot 4{,}371}{5{,}375} \approx 1{,}820.78$$
$$e_{31} = \frac{1{,}356 \cdot 1{,}004}{5{,}375} \approx 253.29 \qquad e_{32} = \frac{1{,}356 \cdot 4{,}371}{5{,}375} \approx 1{,}102.71$$

21 Tests of Independence
Chi Squared Test for Two Way Tables

Theorem (Chi Squared Test for Two Way Tables)
The chi square statistic from a two way r × c table,
$$x \overset{\text{def}}{=} \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$
measures how much the observed cell counts differ from the expected cell counts when
$H_0$: row variable and column variable are independent
holds. If $H_0$ is true and
all expected counts are $\ge 1$,
no more than 20% of the expected counts are $< 5$,
then the chi squared statistic is approximately $\chi^2((r-1)(c-1))$. In that case, the p value of the test $H_0$ versus $H_A$: not $H_0$ is approximately $P(x \le C)$ where $C \sim \chi^2((r-1)(c-1))$.
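The expected-count formula and the two way statistic can be computed directly from a table of observed counts. The sketch below uses the parental smoking counts from the example; the row and column labels and the variable names are illustrative assumptions.

```r
# Observed counts from the parental smoking example
obs <- rbind(both    = c(400, 1380),
             one     = c(416, 1823),
             neither = c(188, 1168))
colnames(obs) <- c("smokes", "does_not_smoke")

# Expected counts: (row total) * (column total) / (grand total)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-square statistic, degrees of freedom and upper-tail p-value
x2 <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Conditions of the theorem: all expected counts >= 1,
# and no more than 20% of them below 5
conditions_ok <- all(expected >= 1) && mean(expected < 5) <= 0.20

print(round(expected, 2))
print(c(x2 = x2, df = df, p_value = p_value))
```

`outer` forms every product of a row total with a column total, so dividing by the grand total gives the whole matrix of expected counts in one step.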


22 Tests of Independence
Example (cont.): Influence of parental smoking

Here is a computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students' smoking habits). What does it tell you?

Sample size?
Hypotheses?
Are the data ok for a $\chi^2$ test?
Interpretation?

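23 Tests of Independence
Example (cont.)

    row1 <- c(400, 1380)   # both parents smoke
    row2 <- c(416, 1823)   # one parent smokes
    row3 <- c(188, 1168)   # neither parent smokes
    obs <- rbind(row1, row2, row3)
    # Pearson's chi-squared test of independence
    chisq.test(obs)        # X-squared = 37.566, df = 2, p-value = 6.959e-09
    # expected counts under independence
    exp <- chisq.test(obs)$expected
    exp                    # rounded: row1 332.49 1447.51
                           #          row2 418.22 1820.78
                           #          row3 253.29 1102.71
    # chi-square components
    (obs - exp)^2 / exp    # rounded: row1 13.709 3.149
                           #          row2  0.012 0.003
                           #          row3 16.829 3.866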

24 Tests of Independence
Equivalence of Tests

Consider a 2 × 2 two way table:

          bad driver   good driver
male
female

One can test whether being a bad/good driver has nothing to do with gender by
1 a (two-sided) z test for comparing two proportions,
2 a Chi Squared Test for Independence.
Both ways are equivalent and will yield the same result.
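The equivalence can be checked numerically. The sketch below uses hypothetical counts (the slide's own counts are not available), and assumes the pooled two-sided z test and the chi-square test without continuity correction; under those assumptions $z^2$ equals the chi-square statistic and the p values agree.

```r
# Hypothetical counts, NOT from the slides: rows = gender, columns = driver rating
tab <- rbind(male   = c(bad = 30, good = 70),
             female = c(bad = 20, good = 80))

n <- rowSums(tab)                       # group sizes
p_hat <- tab[, "bad"] / n               # sample proportions of bad drivers
p_pool <- sum(tab[, "bad"]) / sum(n)    # pooled proportion under H0

# Two-sided z test for comparing two proportions (pooled standard error)
z <- (p_hat[["male"]] - p_hat[["female"]]) /
  sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_z <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Chi-square test for independence, without continuity correction
chi <- chisq.test(tab, correct = FALSE)

# Built-in two-proportion test, also without continuity correction
prop <- prop.test(x = tab[, "bad"], n = n, correct = FALSE)

print(c(z_squared = z^2, chi_square = unname(chi$statistic)))
print(c(p_z = p_z, p_chi = chi$p.value))
```

With R's default continuity correction left on, `chisq.test` would report a slightly different statistic for a 2 × 2 table, which is why `correct = FALSE` is used here.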

25 Test of Homogeneity

26 Test of Homogeneity
Test of Homogeneity (No Matched Pairs)

Definition
A test of homogeneity tests if two different populations have the same proportion of some trait, i.e., the corresponding 2 × 2 contingency table has independent row and column variables.

Example
Computer chips are manufactured at two different fab plants. Let
$n \overset{\text{def}}{=}$ # computer chips
$j \overset{\text{def}}{=}$ # defective
$m \overset{\text{def}}{=}$ # from fab plant A
$X \overset{\text{def}}{=}$ # defects from fab plant A

Question: Does one of the fab plants have a greater chance of creating defects than the other? Consider

               Fab Plant A   Fab Plant B       Totals
Defective      X             j - X             j
Nondefective   m - X         n - m - j + X     n - j
Totals         m             n - m             n

Notice that with $n$, $m$ and $j$ fixed, the inner four entries are determined solely by $X$.

27 Test of Homogeneity
Fisher's Exact Test (No Matched Pairs)

Theorem (Fisher's Exact Test)
Assume $j$ of $n$ objects are of Type A, the rest are of Type B. Given $m$ of the $n$ objects, one has the hypotheses
$H_0$: the $m$ objects were chosen independent of type from the $n$ objects,
versus $H_1$: not $H_0$.
Test Statistic: $X$ = # of Type A objects in the set of $m$ objects, where $X \sim HYP(n, j, m)$ under $H_0$.
Reject $H_0$ when $X$ takes on extreme values in either tail.

The model for $X \sim HYP(n, j, m)$, the hypergeometric distribution, is $X$ = # of defective items in a sample of $m$ items chosen from $n$ items of which $j$ are defective.

Note: this avoids using the chi squared test for the 2 by 2 case with small samples. One uses computer programs to calculate p values.

28 Test of Homogeneity
Example of Fisher's Exact Test

Example
A C. difficile experiment involved 29 patients with inflamed colons. Sixteen were given fecal implants (to introduce beneficial bacteria to the colon) and 13 were treated with the antibiotic vancomycin. There were 3 sick and 13 cured fecal transplant patients, and 9 sick and 4 cured vancomycin patients.

          fecal   vancomycin
sick      3       9
cured     13      4

Find the p value of $H_0$: fecal/vancomycin is independent of sick/cured.

Solution: Using R:

    # rows: sick, cured; columns: fecal, vancomycin
    fisher.test(rbind(c(3, 9), c(13, 4)))
    # Fisher's Exact Test for Count Data: two-sided p-value = 0.00953
    # (alternative hypothesis: true odds ratio is not equal to 1);
    # the output also reports a 95 percent confidence interval and a
    # sample estimate for the odds ratio

29 Test of Homogeneity
Example of Fisher's Exact Test (cont.)

Example (cont.)
One can also use the hypergeometric distribution to find the probability of a result as extreme as 3 or more extreme:

    phyper(3, 16, 13, 12)    # P(X <= 3), approximately 0.0084

The reason this does not match the p value R gave when using fisher.test is that fisher.test performed a two sided test, while above only one extreme side was calculated. Since $X \sim HYP(29, 12, 16)$ is a discrete, non-symmetric distribution, it is not trivial to measure the probability of going just as extreme, but big instead of small. A typical way of doing this is to add together the probabilities of all outcomes whose probabilities are no larger than that of the observed data.
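The two sided calculation described above can be sketched with `dhyper`. The parameterization follows the `phyper(3, 16, 13, 12)` call in the example; the small relative tolerance guards against floating point ties and is an assumption modeled on how R's own `fisher.test` compares probabilities.

```r
# Hypergeometric model from the example: 16 fecal and 13 vancomycin patients,
# 12 sick patients in total, 3 of them in the fecal group
x_obs <- 3
support <- 0:12
probs <- dhyper(support, m = 16, n = 13, k = 12)

# One-sided p-value: as extreme as 3 or more extreme in the lower tail
one_sided <- sum(probs[support <= x_obs])

# Two-sided p-value: add the probabilities of all outcomes that are
# no more likely than the observed one (with a small tolerance for ties)
p_obs <- probs[support == x_obs]
two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Comparison with the built-in test
fit <- fisher.test(rbind(c(3, 9), c(13, 4)))

print(c(one_sided = one_sided, two_sided = two_sided, fisher = fit$p.value))
```

The two sided value picks up the unlikely outcomes in the upper tail as well, which is why it exceeds the one sided value.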

30 Test of Homogeneity: McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints

Suppose $n$ voters are asked if they would vote for a candidate before a debate and then, again, after the debate. The 2 × 2 contingency table of the $2n$ unpaired votes is

          Yes      No             Total
Before    a        n - a          n
After     b        n - b          n
Total     a + b    2n - a - b     2n

To test for independence of vote totals:
$H_0$: vote totals were not affected by the debate
versus
$H_1$: vote totals were affected by the debate
using a $\chi^2$ test with one degree of freedom.

If the ratio of before yes votes to votes cast ($a/n$) is similar to the ratio of after yes votes to votes cast ($b/n$), the $\chi^2$ test will conclude the data are consistent with independence of before and after vote tallies.

31 Test of Homogeneity: McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints

A second way of thinking of the data is to consider the $n$ paired votes of each of the $n$ voters, (before yes/no, after yes/no). The before and after total vote tallies will remain as before ($a$ and $b$ will be considered fixed).

               After Yes   After No          Total
Before Yes     x           a - x             a
Before No      b - x       n + x - a - b     n - a
Total          b           n - b             n

Notice that given $x$, the above table is completely determined! Furthermore, the difference along the anti diagonal will be $b - a$ no matter what $x$ is.

Instead of testing $H_0$, one tests $H_0'$: $a = b$. In other words, the number of yes-no voters equals the number of no-yes voters $\Leftrightarrow$ the vote tallies for before and after are the same.

32 Test of Homogeneity: McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints

Hypothesis $H_0'$ is that the contingency table above be symmetric, not that before/after and yes/no voting tallies be independent. Equivalently,

               After Yes          After No           Total
Before Yes     p_11               p_12               p_11 + p_12
Before No      p_12               p_22               p_12 + p_22
Total          p_11 + p_12        p_12 + p_22        1

and $H_0'$: $p_{12} = p_{21}$.

Independence of the yes/no voting tally variable and the before/after variable is different from independence of the before and after votes of each voter. For instance, if every voter voted the same before and after the debate, then $a/n = b/n$ and both $H_0$ and $H_0'$ would hold. The $\chi^2$ test for independence says the data are consistent with independence of before/after voting tallies, yet the before and after votes of a voter would be as dependent as they possibly can be (one could predict the after debate vote of a voter knowing the voter's before debate vote).

33 Test of Homogeneity: McNemar Test (Matched Pairs)
McNemar Test (Matched Pairs)

Theorem (McNemar's Test (Quinn McNemar, psychologist (1947)))
Let $(x_1, y_1), \dots, (x_n, y_n)$ be a paired random sample where $X \sim BIN(1, p_X)$ and $Y \sim BIN(1, p_Y)$. Define the counts of discordant pairs
$$b \overset{\text{def}}{=} \#\{j : x_j = 1,\ y_j = 0\} \quad\text{and}\quad c \overset{\text{def}}{=} \#\{j : x_j = 0,\ y_j = 1\}.$$
For an approximate test of
$H_0$: frequencies of $b$ and $c$ occur in same proportion,
assume $b + c \ge 10$ and use the test statistic
$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c},$$
which is approximately $\chi^2(1)$ under $H_0$. One uses a right tail test.

It is entirely possible for Fisher's Exact Test for independence to return an insignificant result while McNemar's Test returns a significant result. McNemar's Test tests for symmetry about the diagonal in the contingency table, not independence.

34 Test of Homogeneity: McNemar Test (Matched Pairs)

Example
Suppose the hands of randomly selected men were each classified as soft or callused and tallied in a table.

[Table: left hand (soft, callused) by right hand (soft, callused). Of the 408 men, 14 had two soft hands, 273 had two callused hands, and the two discordant cells held 58 and 63 men.]

If a person is to have one soft and one callused hand, is it equally likely that the callused hand be the right or left hand? Use McNemar's Test to get a p value.

Solution: Here $n = 14 + 58 + 63 + 273 = 408$. Using McNemar's Test,
$$\chi^2 = \frac{(|58 - 63| - 1)^2}{58 + 63} = \frac{16}{121} \approx 0.132.$$
Since this is sampled from $\chi^2(1)$, one has a p value of about 0.716 and the test is insignificant. One cannot reject the hypothesis that, if one has one callused hand and one soft hand, the callused hand is equally likely to be the left hand or the right hand.

Notice, one can reorganize the data, losing the information of which left hand goes with which right hand, and obtain a table of soft and callused counts for right hands and for left hands.

[Table: right hands and left hands by soft and callused; one hand has 72 soft and 336 callused, the other 77 soft and 331 callused.]

Fisher's Exact Test on this table is also insignificant. One cannot reject the hypothesis that handedness and callusing are independent.
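The McNemar statistic for this example can be computed by hand and compared with R's built-in function. This is a minimal sketch: the matrix is the one passed to `mcnemar.test` in these slides, and the variable names are illustrative.

```r
# Paired table as entered in the slides' mcnemar.test call;
# the diagonal holds the concordant pairs (14 and 273),
# the off-diagonal holds the discordant pairs (58 and 63)
tab <- matrix(c(14, 63, 58, 273), nrow = 2)

# Discordant counts
b <- tab[1, 2]
c_count <- tab[2, 1]

# McNemar statistic with continuity correction, and right tail p-value
stat <- (abs(b - c_count) - 1)^2 / (b + c_count)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

# The same test using the built-in function (correction is the default)
fit <- mcnemar.test(tab)

print(c(stat = stat, p_value = p_value))
```

Only the off-diagonal cells enter the statistic, so swapping which discordant cell is called `b` does not change the result.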

35 Test of Homogeneity: McNemar Test (Matched Pairs)

Example
Notice that a chi square independence test instead of Fisher's Exact Test yields a p value of about 0.65. The two p values differ because Fisher's Exact Test is exact, while the chi-squared independence test is approximate.

    # McNemar's chi-squared test with continuity correction (the default)
    mcnemar.test(matrix(c(14, 63, 58, 273), nrow = 2))
    # McNemar's chi-squared = 0.13223, df = 1, p-value = 0.7161

    # Pearson's chi-squared test on the unpaired table, no continuity correction
    chisq.test(matrix(c(72, 336, 77, 331), nrow = 2), correct = FALSE)
    # X-squared = 0.20527, df = 1, p-value = 0.6505

    # Fisher's exact test on the unpaired table; the output reports a p-value,
    # a 95 percent confidence interval and a sample estimate of the odds ratio
    fisher.test(matrix(c(72, 336, 77, 331), nrow = 2))

36 Chapter #9 R Assignment

37 Chapter #9 R Assignment

1 A car expert claims that 30% of all cars in Johnstown are American made, 35% are Japanese made, 20% are Korean made and 15% are European. Of 156 cars randomly observed in Johnstown, 67 were American, 42 were Japanese, 24 were Korean and 23 were European. Find the p value of a goodness of fit test between what was expected and what was observed.

2 Senie et al. (1981) investigated the relationship between age and frequency of breast self-examination in a sample of women (Senie, R. T., Rosen, P. P., Lesser, M. L., and Kinne, D. W. Breast self examinations and medical examination relating to breast cancer stage. American Journal of Public Health, 71, ) A summary of the results is presented in the following table:

[Table: frequency of breast self examination (Monthly, Occasionally, Never) by age group, from "under" to "and over".]

From Hand et al., page 307, table 368. Do an independence test to see if age and frequency of breast self examination are independent.

38 Chapter #9 R Assignment

3 Particular gene sites in the common housefly were deemed synonymous if they did not affect amino acids and replacement if they did. These sites were also deemed polymorphisms if they varied among subspecies and fixed if they did not. The following data were collected:

                 Synonymous   Replacement
polymorphisms    43           2
fixed            17           7

Find the p value of $H_0$: synonymous/replacement is independent of polymorphisms/fixed.
