MATH 2 Goodness of Fit Part 1

Let $x_1, x_2, \ldots, x_n$ be a random sample of measurements that have a specified range and distribution. Divide the range of measurements into $m$ bins and let $f_1, \ldots, f_m$ denote the frequencies of the sample points occurring in each bin. Let $e_1, \ldots, e_m$ denote the theoretical number of measurements that should occur in each bin if the sample were a perfect fit of the specified distribution. Then for large samples

$$x = \sum_{k=1}^{m} \frac{(f_k - e_k)^2}{e_k}$$

follows an approximate $\chi^2(m-1)$ distribution. This value $x$ is the Pearson Chi-Square Test Statistic. The P-value is always the right-tail value $P(\chi^2(m-1) \ge x)$.

If distribution parameters, such as the mean or standard deviation, are not specified but must be estimated from the sample data, then the test statistic follows a $\chi^2(m-1-d)$ distribution, where $d$ is the number of parameter MLE estimates made.

Example 1. A Physics department claims that the scores on its standardized tests are uniformly distributed, with the same proportions scoring in the ranges

    A [88, 100]    B [75, 88)    C [65, 75)    D [50, 65)    F [0, 50)

But over the last two exams, with a total of 240 papers, the distribution of scores was 33 A's, 40 B's, 42 C's, 48 D's, and 77 F's. Is there significant evidence that the grades are not really uniformly distributed?

Solution. If the data were uniformly distributed over these 5 bins, then there should be an equal number of scores in each range. So there should be an expected value of $e_k = 240/5 = 48$ in each bin.

                  A     B     C     D     F
    freq: f_k    33    40    42    48    77
    exp:  e_k    48    48    48    48    48

The Pearson test statistic is

$$x = \sum_{k=1}^{5} \frac{(f_k - e_k)^2}{e_k} = \frac{(33-48)^2}{48} + \frac{(40-48)^2}{48} + \frac{(42-48)^2}{48} + \frac{(48-48)^2}{48} + \frac{(77-48)^2}{48} \approx 24.29,$$

which should be compared to a $\chi^2(5-1-0) = \chi^2(4)$ distribution.
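The statistic and right-tail probability for Example 1 can be checked numerically. The following is a minimal Python sketch, not part of the original handout. The helper name `chi2_right_tail` is hypothetical; it builds the tail probability from the closed forms for 1 and 2 degrees of freedom. The count of 48 D's is an assumption, chosen because it makes the five grade counts total 240.

```python
import math

def chi2_right_tail(x, df):
    """P(chi-square(df) >= x) for a positive integer df and x >= 0."""
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    # Each pass raises the degrees of freedom by 2 until 2a == df.
    while 2 * a < df:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return tail

observed = [33, 40, 42, 48, 77]          # A, B, C, D, F (48 D's is assumed)
expected = [sum(observed) / 5] * 5       # uniform over 5 bins: 48 each
x = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
p_value = chi2_right_tail(x, df=len(observed) - 1 - 0)   # m - 1 - d with d = 0
print(f"x = {x:.2f}, P-value = {p_value:.5f}")
```

The call `chi2_right_tail(x, df)` plays the role of the calculator command χ²cdf(x, 1E99, df): both return the area to the right of x.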

If this data really were uniformly distributed as specified over the 5 ranges, then most test statistics would be near the middle of the $\chi^2(4)$ distribution, just as most normal measurements are within a standard deviation of the average. If the data really were from a uniform distribution, then there would be only a small chance of obtaining a large test statistic. But when there are large differences between what should occur ($e_k$) and what did occur ($f_k$), the test statistic will be large.

So for our test statistic $x \approx 24.29$, we compute $P(\chi^2(4) \ge 24.29)$, the right-tail probability that becomes the P-value. If the P-value is small (generally less than 0.10), then we have evidence to state that the data did not come from the stated distribution. Using the command χ²cdf(24.29, 1E99, 4), we obtain a P-value of about 0.00007. Thus we can say:

If the grades really were uniformly distributed, then we would only have about a 0.00007 probability of obtaining frequencies $f_k$ from 240 grades that differ so much from the expected values $e_k$ in these 5 bins.

This very low P-value gives us strong evidence to reject the claim that the grades are uniformly distributed.

Example 2. Among Republicans, reported preferences for the 2016 Presidential election are:

    Donald Trump    Rand Paul    Ted Cruz    Ben Carson
        35%            15%          30%          20%

However, an independent poll of 900 Republicans gave the following preferences:

    Donald Trump    Rand Paul    Ted Cruz    Ben Carson

Does the survey poll give evidence to reject the reported preferences? Do a chi-square test of fit, give the P-value, and give a conclusion.

Solution. If the reported preferences were correct, then the expected numbers of preferences for each candidate in a poll of 900 people would be

                         Trump    Paul    Cruz    Carson
    e_k = 900 × pct       315     135     270      180

We now have the actual frequencies $f_k$ and the expected results $e_k$ assuming the reported preferences were true:

                 Trump    Paul    Cruz    Carson
    freq: f_k
    exp:  e_k     315     135     270      180

The Pearson test statistic is

$$x = \sum_{k=1}^{4} \frac{(f_k - e_k)^2}{e_k} \approx 3.0754,$$

which should be compared to a $\chi^2(4-1-0) = \chi^2(3)$ distribution. Using the command χ²cdf(3.0754, 1E99, 3), we obtain a P-value of about 0.38. This relatively high P-value means that the data is not a terrible fit of the specified distribution. Thus we can say:

If the reported preferences were true, then we would have a 38% chance of obtaining frequencies from 900 people that differ as much as ours do from the expected numbers on these four candidates.

We do not have enough evidence to reject the report.

Example 3. A completely random survey of 200 adults in Kentucky gave the following results:

              Smoker    Non-Smoker
    Male        40          56
    Female      44          60

Use goodness of fit to test the hypothesis that the proportion of smokers is the same among males as among females.

Solution. Let $p_1$ be the true proportion of smokers among males, and let $p_2$ be the true proportion among females. Then $p_1 = P(S \mid M) \approx 40/96 \approx 0.4167$ and $p_2 = P(S \mid F) \approx 44/104 \approx 0.4231$. These proportions seem very close. Assuming the true proportions $p_1$ and $p_2$ are equal, the pooled estimate for the proportion of smokers is $\hat{p} = 84/200 = 0.42$.

This value of $\hat{p} = 0.42$ gives us one MLE estimate from the data. (The proportion of non-smokers is then automatically about 0.58; it does not count as an additional population estimate.)

Because we had a completely random survey (and not one pre-stratified according to a known male/female breakdown), we can also estimate the proportions of males and females in the population. In this case, $P(F) \approx 104/200 = 0.52$ (and hence $P(M) \approx 0.48$). Thus, we have another MLE estimate.

Now if the true proportion of smokers were the same among males as among females, and is estimated to be about $\hat{p} = 0.42$, then what results should we have expected in our survey?

    Expected e_k          S                    N
    M              200(0.48)(0.42)      200(0.48)(0.58)
    F              200(0.52)(0.42)      200(0.52)(0.58)

    or

    Expected e_k      S         N
    M               40.32     55.68
    F               43.68     60.32

    Obtained f_k      S         N
    M                 40        56
    F                 44        60

In each of the 4 bins, the difference between expected and actual is 0.32 in absolute value. So the Pearson test statistic is

$$x = \sum_{k=1}^{4} \frac{(f_k - e_k)^2}{e_k} = \frac{0.32^2}{40.32} + \frac{0.32^2}{55.68} + \frac{0.32^2}{43.68} + \frac{0.32^2}{60.32} \approx 0.00842,$$

which should be compared to a $\chi^2(4-1-2) = \chi^2(1)$ distribution (2 MLE estimates are used). Then using the command χ²cdf(.00842, 1E99, 1), we obtain a P-value of about 0.927. Because of the high P-value, the data is almost a perfect fit of the expected distribution given that the real proportion of smokers is 0.42.

Using the Two-Sided 2-Proportion Z-Test

If we test $H_0: p_1 = p_2$ with a two-sided alternative $H_a: p_1 \ne p_2$, then we obtain the exact same P-value of about 0.927. In this case, the z test statistic is $z \approx -0.09176$. But note that $(-0.09176)^2 \approx 0.00842$, which is the value of the Pearson chi-square test statistic. Moreover, by definition, $Z^2 \sim \chi^2(1)$ when $Z \sim N(0, 1)$. So the goodness of fit test for two proportions is equivalent to the two-sided 2-Proportion Z-Test.
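The equivalence between the chi-square fit test and the two-sided z-test can be checked directly. The sketch below is illustrative only and assumes observed counts of 40 and 56 for males and 44 and 60 for females; these are the integer counts consistent with 84 smokers, 104 females, 200 adults, and the stated statistic 0.00842. It uses the pooled estimate for both the expected counts and the standard error of the z statistic.

```python
import math

# Assumed observed counts: (smokers, non-smokers) for each sex.
observed = {"M": (40, 56), "F": (44, 60)}
n = sum(s + ns for s, ns in observed.values())             # 200 adults
p_hat = sum(s for s, _ in observed.values()) / n           # pooled MLE: 84/200 = 0.42

# Pearson statistic with expected counts n * P(sex) * P(smoking status).
x = 0.0
for s, ns in observed.values():
    row_total = s + ns                                     # equals n * P(sex)
    for f, p in ((s, p_hat), (ns, 1 - p_hat)):
        e = row_total * p
        x += (f - e) ** 2 / e
p_chi2 = math.erfc(math.sqrt(x / 2))                       # P(chi2(1) >= x)

# Two-sided 2-proportion z-test with the pooled standard error.
(s1, ns1), (s2, ns2) = observed["M"], observed["F"]
n1, n2 = s1 + ns1, s2 + ns2
se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = (s1 / n1 - s2 / n2) / se
p_z = math.erfc(abs(z) / math.sqrt(2))                     # 2 * P(Z >= |z|)

print(f"x = {x:.5f}, z^2 = {z * z:.5f}, P(chi2) = {p_chi2:.4f}, P(z) = {p_z:.4f}")
```

Both P-values are computed with `math.erfc` because the right tail of $\chi^2(1)$ at $x$ equals the two-sided normal tail at $\sqrt{x}$.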

Poisson Fit Test

Many phenomena are modeled by a Poisson distribution, often because of empirical evidence, but sometimes just for mathematical simplification. The occurrences can also be distributed spatially, or otherwise, and not just measured during time intervals. Following are some examples that show how to test whether data actually follows a Poisson distribution.

Example 4. In the bacterium E. coli, a mutant variety is resistant to the drug streptomycin. Do the occurrences of mutant resistant colonies follow a Poisson distribution?

Experiment: 150 Petri dishes were plated with one million bacteria each. Below are the results on how many dishes formed each number of resistant colonies.

    # of resistant colonies
    # of dishes

Does the data come from a Poisson distribution? If so, then what is the best estimate for $\lambda$? For this $\lambda$, what would be the expected number of dishes $e_k$ forming each number of resistant colonies in the above table for $k = 0, 1, 2, \ldots$?

Solution. Here we use the MLE estimate of the Poisson average $\lambda$, which is the sample average number of resistant colonies that formed. Thus, we have $\hat{\lambda} = 69/150 = 0.46$.

Now if $\lambda = 0.46$, then for $k = 0, 1, 2, 3$, we have

$$e_k = 150 \times \frac{0.46^k e^{-0.46}}{k!}.$$

But for the last bin, we use $e_4 = 150 \times P(X \ge 4) = 150 - e_0 - e_1 - e_2 - e_3$. We then have

    # of resistant colonies      0        1        2       3     4 or more
    # of dishes f_k
    e_k                        94.69    43.56    10.02    1.54     0.19

Does there appear to be a significant difference between what did occur and what should occur if the distribution really were Poi(0.46)? We now test with the Pearson test statistic.
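The expected counts for a Poisson fit, including the remainder assigned to the last bin, can be sketched as follows. The function name `poisson_expected` and its parameters are hypothetical, introduced here only for illustration.

```python
import math

def poisson_expected(n, lam, last_k):
    """Expected counts for bins 0, 1, ..., last_k - 1 and 'last_k or more'."""
    counts = [n * lam ** k * math.exp(-lam) / math.factorial(k)
              for k in range(last_k)]
    counts.append(n - sum(counts))   # last bin takes the rest of the distribution
    return counts

# 150 dishes, lambda-hat = 0.46, bins 0, 1, 2, 3 and "4 or more".
expected = poisson_expected(150, 0.46, 4)
for k, e in enumerate(expected):
    label = f"{k}" if k < 4 else "4 or more"
    print(f"{label:>9}: {e:.2f}")
```

Assigning the remainder to the last bin guarantees that the expected counts sum to the number of observations, which the Pearson statistic requires.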

We now have a test statistic of

$$x = \sum_{k} \frac{(f_k - e_k)^2}{e_k}.$$

Since we have 5 bins and 1 MLE in use, we use a $\chi^2(5-1-1) = \chi^2(3)$ curve to obtain a P-value of $P(\chi^2(3) \ge x) \approx 0.135$.

If the data were from a Poi(0.46) distribution, then we would have a 13.5% chance of obtaining frequencies from 150 observations that differ as much as ours do from expected in the 5 bin ranges.

We do not have enough evidence to reject a Poi(0.46) distribution.

Below we do the computations on a TI in order to have less round-off error:

    Enter the range and frequencies into L1 and L2.
    1-Var Stats L1, L2 computes $\bar{x}$, so $\hat{\lambda} = \bar{x} = 0.46$.
    Store the expected counts into L3 (Stat, Edit).
    Adjust the last bin: edit L3(5) to 150 minus the first four entries of L3.
    Compute the error terms of the test statistic and store them in L4.
    Compute stats on L4; the sum Σx is the test statistic.
    χ²cdf(test statistic, 1E99, 3) gives the P-value.

Note: The last bin contributes the most error to the test statistic even though it has only one measurement. To avoid this problem, we could combine the last two bins into one bin and then use a $\chi^2(4-1-1) = \chi^2(2)$ curve to compute the P-value.

Example 5 (Flying-Bomb Hits on London). Consider the statistics of flying-bomb hits in a south London area during World War II. (R. D. Clarke, An application of the Poisson distribution, Journal of the Institute of Actuaries, 1946)

The region was divided into 576 areas of 0.25 square kilometers each. The numbers of areas receiving $k$ hits, for $k = 0, 1, 2, \ldots$, were as follows:

    # of hits received     0      1      2     3     4    5 or more
    # of areas            229    211    93    35     7        1

Does the data appear to follow a (spatial) Poisson distribution?

Solution. The MLE estimate for $\lambda$ is the sample average number of hits per 0.25 square kilometers. Thus $\hat{\lambda} = 535/576 \approx 0.9288$. Now letting

$$e_k = 576 \times \frac{\hat{\lambda}^k e^{-\hat{\lambda}}}{k!} \quad \text{for } k = 0, 1, \ldots, 4, \qquad e_5 = 576 - \sum_{k=0}^{4} e_k$$

(i.e., the remainder of the distribution), we have

    # of hits received      0         1         2        3        4     5 or more
    f_k                    229       211        93       35        7        1
    e_k                   227.53    211.34    98.15    30.39     7.06     1.54

Then summing over these six bins we have

$$x = \sum_{k} \frac{(e_k - f_k)^2}{e_k} \approx 1.1724.$$

Using 6 bins and 1 MLE, the test statistic follows a $\chi^2(6-1-1) = \chi^2(4)$ distribution; thus, the P-value for the test statistic is $P(\chi^2(4) \ge 1.1724) \approx 0.8826$.

If the distribution of bomb hits were Poisson, then there would be an 88.26% chance of obtaining a difference between the expected and observed measurements as large as the difference that occurs in our data.

The high P-value means that the data is a good fit of the desired distribution.
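The whole fit test for Example 5 can be run end to end in a few lines. This is a sketch under two stated assumptions, not the handout's own computation. First, the observed counts 229, 211, 93, 35, 7, 1 are taken from Clarke's published table. Second, the "5 or more" bin is treated as exactly 5 hits when averaging, which reproduces the stated P-value of 88.26%. For 4 degrees of freedom the right tail has the closed form $e^{-x/2}(1 + x/2)$, so no statistics library is needed.

```python
import math

# Assumed counts: areas with 0, 1, 2, 3, 4 and "5 or more" hits.
observed = [229, 211, 93, 35, 7, 1]
n = sum(observed)                                    # 576 areas

# Assumption: the last bin is counted as exactly 5 hits when averaging.
lam = sum(k * f for k, f in enumerate(observed)) / n

# Poisson expected counts; the last bin takes the remainder.
expected = [n * lam ** k * math.exp(-lam) / math.factorial(k) for k in range(5)]
expected.append(n - sum(expected))

x = sum((e - f) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1                           # 6 bins, 1 MLE estimate
p_value = math.exp(-x / 2) * (1 + x / 2)             # right tail of chi-square(4)

print(f"lambda-hat = {lam:.4f}, x = {x:.4f}, df = {df}, P-value = {p_value:.4f}")
```

The closed form used for `p_value` is valid only for `df == 4`; a different number of bins would need the general right-tail computation.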

The TI computations follow the same steps as before: enter the range and frequencies into L1 and L2; 1-Var Stats L1, L2 computes $\bar{x}$, so $\hat{\lambda} = \bar{x}$; store the expected counts into L3 (Stat, Edit) and adjust the last bin in L3; compute the error terms of the test statistic in L4; compute stats on L4, where the sum Σx is the test statistic; then χ²cdf gives the P-value.

Exercises

1. In a random sample of 1000 grades, there were 312 A's, 208 B's, 202 C's, 99 D's, and 179 F's.

(a) Test whether or not grades have been assigned according to the following distribution: A 30%, B 25%, C 20%, D 10%, F 15%.

(b) Which grade(s) seem to fit the specified distribution (low contribution to the test statistic), and which grade(s) seem to be a bad fit (high contribution to the test statistic)?

2. In an experiment by Rutherford, Chadwick, and Ellis (1920), a radioactive substance was observed during 2608 time periods of 7.5 seconds each. The number of alpha particles reaching a Geiger counter was recorded for each time period. The results were as follows:

    # particle hits
    # time periods

Using 10 or more as the last bin, perform a goodness of fit test for whether the data comes from a Poisson distribution. Define and give an estimate of $\lambda$, and give a table of $\{e_k\}$, the test statistic, and the P-value. Explain your conclusion in detail.
