
MATH 2 Goodness of Fit Part 1

Let $x_1, x_2, \ldots, x_n$ be a random sample of measurements that have a specified range and distribution. Divide the range of measurements into $m$ bins and let $f_1, \ldots, f_m$ denote the frequencies of the sample points occurring in each bin. Let $e_1, \ldots, e_m$ denote the theoretical number of measurements that should occur in each bin if the sample were a perfect fit of the specified distribution. Then for large samples

$$x = \sum_{k=1}^{m} \frac{(f_k - e_k)^2}{e_k}$$

follows an approximate $\chi^2(m-1)$ distribution. This value $x$ is the Pearson Chi-Square Test Statistic. The P-value is always the right-tail value $P(\chi^2(m-1) \ge x)$.

If distribution parameters, such as the mean or standard deviation, are not specified but must be estimated from the sample data, then the test statistic follows a $\chi^2(m-1-d)$ distribution, where $d$ is the number of parameters estimated by maximum likelihood (MLE).

Example 1. A Physics department claims that the scores on its standardized tests are uniformly distributed, with the same proportion scoring in each of the ranges

| Grade | A | B | C | D | F |
|---|---|---|---|---|---|
| Range | [88, 100] | [75, 88) | [65, 75) | [50, 65) | [0, 50) |

But over the last two exams, with a total of 240 papers, the distribution of scores was 33 A's, 40 B's, 42 C's, 48 D's, and 77 F's. Is there significant evidence that the grades are not really uniformly distributed?

Solution. If the data were uniformly distributed over these 5 bins, then there should be an equal number of scores in each range. So there should be an expected value of $e_k = 240/5 = 48$ in each bin.

| | A | B | C | D | F |
|---|---|---|---|---|---|
| freq: $f_k$ | 33 | 40 | 42 | 48 | 77 |
| exp: $e_k$ | 48 | 48 | 48 | 48 | 48 |

The Pearson test statistic is

$$x = \sum_{k=1}^{5} \frac{(f_k - e_k)^2}{e_k} = \frac{(33-48)^2}{48} + \frac{(40-48)^2}{48} + \frac{(42-48)^2}{48} + \frac{(48-48)^2}{48} + \frac{(77-48)^2}{48} = \frac{15^2 + 8^2 + 6^2 + 0^2 + 29^2}{48} \approx 24.29,$$

which should be compared to a $\chi^2(5-1-0) = \chi^2(4)$ distribution.
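As a numerical check of Example 1, the following is a minimal sketch in plain Python that computes the Pearson statistic and the right-tail probability. The names `pearson_statistic` and `chi2_right_tail` are illustrative and do not come from the notes, and the D count of 48 is inferred from the total of 240 papers. The tail probability uses a standard closed-form recurrence that holds for integer degrees of freedom, which is an implementation choice made here to avoid a statistics library.

```python
import math

def pearson_statistic(observed, expected):
    """Sum of (f_k - e_k)^2 / e_k over all bins."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

def chi2_right_tail(x, df):
    """P(chi2(df) >= x) for a positive integer df, using only the math module."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j=0}^{df/2 - 1} h^j / j!
        term = math.exp(-h)
        total = term
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return total
    # Odd df: start from df = 1, where the tail equals erfc(sqrt(h)),
    # then add one correction term for each extra 2 degrees of freedom.
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for j in range((df - 1) // 2):
        total += term
        term *= h / (j + 1.5)
    return total

# Example 1: 240 papers in 5 grade bins (D count of 48 inferred from the total).
observed = [33, 40, 42, 48, 77]
expected = [sum(observed) / len(observed)] * len(observed)  # 48 in each bin
d = 0                                  # no parameters estimated from the data
df = len(observed) - 1 - d

x = pearson_statistic(observed, expected)
p_value = chi2_right_tail(x, df)
print(f"x = {x:.2f}, df = {df}, P-value = {p_value:.5f}")
```

The same two functions apply to any example in these notes once the observed counts, expected counts, and the number of estimated parameters `d` are supplied.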

If the data really were uniformly distributed over the 5 ranges as specified, then most test statistics would be near the middle of the $\chi^2(4)$ distribution, just as most normal measurements are within a standard deviation of the average. In that case there would be only a small chance of obtaining a large test statistic. But when there are large differences between what should occur ($e_k$) and what did occur ($f_k$), the test statistic will be large.

So for our test statistic $x \approx 24.29$, we compute $P(\chi^2(4) \ge 24.29)$, the right-tail probability that becomes the P-value. If the P-value is small (generally less than 0.10), then we have evidence to state that the data did not come from the stated distribution.

Using the command χ²cdf(24.29, 1E99, 4), we obtain a P-value of about 0.00007. Thus we can say: If the grades really were uniformly distributed, then there would be only about a 0.00007 probability of obtaining frequencies $f_k$ from 240 grades that differ this much from the expected values $e_k$ in these 5 bins. This very low P-value gives us strong evidence to reject the claim that the grades are uniformly distributed.

Example 2. Among Republicans, reported preferences for the 2016 Presidential election are:

| Donald Trump | Rand Paul | Ted Cruz | Ben Carson |
|---|---|---|---|
| 35% | 15% | 30% | 20% |

However, an independent poll of 900 Republicans gave the following preferences:

| Donald Trump | Rand Paul | Ted Cruz | Ben Carson |
|---|---|---|---|
| 300 | 125 | 290 | 185 |

Does the poll give evidence to reject the reported preferences? Do a chi-square test of fit, give the P-value, and give a conclusion.

Solution. If the reported preferences were correct, then the expected numbers of preferences for each candidate in a poll of 900 people would be

| | Trump | Paul | Cruz | Carson |
|---|---|---|---|---|
| $e_k = 900 \times \text{pct}$ | 315 | 135 | 270 | 180 |

We now have the actual frequencies $f_k$ and the expected results $e_k$ assuming the reported preferences were true:

| | Trump | Paul | Cruz | Carson |
|---|---|---|---|---|
| freq: $f_k$ | 300 | 125 | 290 | 185 |
| exp: $e_k$ | 315 | 135 | 270 | 180 |

The Pearson test statistic is

$$x = \sum_{k=1}^{4} \frac{(f_k - e_k)^2}{e_k} = \frac{15^2}{315} + \frac{10^2}{135} + \frac{20^2}{270} + \frac{5^2}{180} \approx 3.0754,$$

which should be compared to a $\chi^2(4-1-0) = \chi^2(3)$ distribution. Using the command χ²cdf(3.0754, 1E99, 3), we obtain a P-value of about 0.38. This relatively high P-value means that the data is not a terrible fit of the specified distribution. Thus we can say: If the reported preferences were true, then we would have a 38% chance of obtaining frequencies from 900 people that differ as much as ours do from the expected numbers on these four candidates. We do not have enough evidence to reject the report.

Example 3. A completely random survey of 200 adults in Kentucky gave the following results:

| | Smoker | Non-Smoker | Total |
|---|---|---|---|
| Male | 40 | 56 | 96 |
| Female | 44 | 60 | 104 |
| Total | 84 | 116 | 200 |

Use goodness of fit to test the hypothesis that the proportion of smokers is the same among males as among females.

Solution. Let $p_1$ be the true proportion of smokers among males, and let $p_2$ be the true proportion among females. The sample estimates are $\hat p_1 = P(S \mid M) = 40/96 \approx 0.417$ and $\hat p_2 = P(S \mid F) = 44/104 \approx 0.423$. These proportions seem very close. Assuming the true proportions $p_1$ and $p_2$ are equal, the pooled estimate for the proportion of smokers is $\hat p = 84/200 = 0.42$.

This value of $\hat p = 0.42$ gives us one MLE estimate from the data. (The proportion of non-smokers is then automatically about 0.58; it does not count as an additional population estimate.) Because we had a completely random survey (and not one pre-stratified according to a known male/female breakdown), we also can estimate the proportions of males and females in the population. In this case, $P(F) \approx 104/200 = 0.52$ (and hence $P(M) \approx 0.48$). Thus, we have another MLE estimate.

Now if the true proportion of smokers were the same among males as among females, and is estimated to be about $\hat p = 0.42$, then what results should we have expected in our survey?

| Expected $e_k$ | S | N |
|---|---|---|
| M | 0.42 × 96 = 40.32 | 0.58 × 96 = 55.68 |
| F | 0.42 × 104 = 43.68 | 0.58 × 104 = 60.32 |

| Obtained $f_k$ | S | N |
|---|---|---|
| M | 40 | 56 |
| F | 44 | 60 |

In each of the 4 bins, the absolute difference between expected and actual is 0.32. So the Pearson test statistic is

$$x = \sum_{k=1}^{4} \frac{(f_k - e_k)^2}{e_k} = \frac{0.32^2}{40.32} + \frac{0.32^2}{55.68} + \frac{0.32^2}{43.68} + \frac{0.32^2}{60.32} \approx 0.00842,$$

which should be compared to a $\chi^2(4-1-2) = \chi^2(1)$ distribution (2 MLE estimates are used). Then using the command χ²cdf(.00842, 1E99, 1) we obtain a P-value of about 0.926885. Because of the high P-value, the data is almost a perfect fit of the expected distribution given that the real proportion of smokers is 0.42.

Using the Two-Sided 2-Proportion Z-Test

If we test $H_0: p_1 = p_2$ with a two-sided alternative $H_a: p_1 \ne p_2$, then we obtain the same P-value of 0.926885. In this case, the z test statistic is $z = -0.0917643617$. Note that $(-0.0917643617)^2 \approx 0.00842$, which is the value of the Pearson chi-square test statistic. By definition, $Z^2 \sim \chi^2(1)$ when $Z \sim N(0, 1)$. So the goodness of fit test for two proportions is equivalent to the two-sided 2-Proportion Z-Test.
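The equivalence between the two tests can be checked numerically. The sketch below, with variable names chosen here for illustration, computes the pooled two-proportion z statistic and the Pearson statistic from the survey counts of Example 3 and compares their P-values. Both tail probabilities are written with `math.erfc`, which works because a $\chi^2(1)$ right tail equals a two-sided standard normal tail.

```python
import math

# Survey counts from Example 3: smokers and group sizes.
smokers_m, n_m = 40, 96
smokers_f, n_f = 44, 104

p1_hat = smokers_m / n_m
p2_hat = smokers_f / n_f
p_pool = (smokers_m + smokers_f) / (n_m + n_f)   # 84/200 = 0.42

# Two-proportion z statistic with the pooled standard error.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_m + 1 / n_f))
z = (p1_hat - p2_hat) / se

# Pearson statistic over the 4 cells, expected counts from the pooled proportion.
observed = [smokers_m, n_m - smokers_m, smokers_f, n_f - smokers_f]
expected = [p_pool * n_m, (1 - p_pool) * n_m, p_pool * n_f, (1 - p_pool) * n_f]
x = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Two-sided normal tail and chi-square(1) right tail, both via erfc.
p_z = math.erfc(abs(z) / math.sqrt(2))
p_chi2 = math.erfc(math.sqrt(x / 2))

print(f"z = {z:.6f}, z^2 = {z * z:.5f}, x = {x:.5f}")
print(f"P-value (z-test) = {p_z:.6f}, P-value (chi-square) = {p_chi2:.6f}")
```

The identity $z^2 = x$ holds for any 2×2 table when the expected counts come from the pooled proportion, so the agreement is not specific to these counts.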

Poisson Fit Test

Many phenomena are modeled by a Poisson distribution, often because of empirical evidence, but sometimes just for mathematical simplification. The occurrences also can be distributed spatially, or otherwise, and not just measured during time intervals. The following examples show how to test whether data actually follows a Poisson distribution.

Example 4. In the bacterium E. coli, a mutant variety is resistant to the drug streptomycin. Do the occurrences of mutant resistant colonies follow a Poisson distribution?

Experiment: 150 Petri dishes were plated with one million bacteria each. Below are the results on how many dishes formed each number of resistant colonies.

| # of resistant colonies | # of dishes |
|---|---|
| 0 | 98 |
| 1 | 40 |
| 2 | 8 |
| 3 | 3 |
| 4 | 1 |

Does the data come from a Poisson distribution? If so, then what is the best estimate for $\lambda$? For this $\lambda$, what would be the expected number of dishes $e_k$ forming each number of resistant colonies in the above table for $k = 0, 1, 2, \ldots$?

Solution. Here we use the MLE estimate of the Poisson average $\lambda$, which is the sample average number of resistant colonies that formed. Thus, we have

$$\hat\lambda = \frac{0 \cdot 98 + 1 \cdot 40 + 2 \cdot 8 + 3 \cdot 3 + 4 \cdot 1}{150} = 0.46.$$

Now if $\lambda = 0.46$, then for $k = 0, 1, 2, 3$, we have

$$e_k = 150 \cdot \frac{0.46^k \, e^{-0.46}}{k!}.$$

But for the last bin, we use $e_4 = 150 \cdot P(X \ge 4) = 150 - e_0 - e_1 - e_2 - e_3$. We then have

| # of resistant colonies | # of dishes | $e_k$ |
|---|---|---|
| 0 | 98 | 94.693 |
| 1 | 40 | 43.559 |
| 2 | 8 | 10.018 |
| 3 | 3 | 1.5362 |
| 4 or more | 1 | 0.1938 |

Does there appear to be a significant difference between what did occur and what should occur if the distribution really were Poi(0.46)? We now test with the Pearson test statistic.

We now have a test statistic of

$$x = \sum_{k} \frac{(f_k - e_k)^2}{e_k} = \frac{3.307^2}{94.693} + \frac{3.559^2}{43.559} + \frac{2.018^2}{10.018} + \frac{1.4638^2}{1.5362} + \frac{0.8062^2}{0.1938} \approx 5.56135.$$

Since we have 5 bins and 1 MLE in use, we use a $\chi^2(5-1-1) = \chi^2(3)$ curve to obtain a P-value of $P(\chi^2(3) \ge 5.56135) \approx 0.135$. If the data were from a Poi(0.46) distribution, then we would have a 13.5% chance of obtaining frequencies from 150 observations that differ as much as ours do from expected in the 5 bin ranges. We do not have enough evidence to reject a Poi(0.46) distribution.

Below we do the computations on a TI in order to have less round-off error:

1. Enter range and frequencies (L1, L2).
2. 1-Var Stats L1, L2 computes $\bar x$, so $\hat\lambda = \bar x = 0.46$.
3. Store expected into L3.
4. Stat Edit: must adjust last bin in L3. Edit L3(5) to 150 − 94.693 − 43.559 − etc. The expected values are now in L3.
5. Compute error terms in test stat. The error terms are in L4.
6. Compute stats on L4. The sum Σx is the test stat.
7. χ²cdf(5.5615885, 1E99, 3) gives the P-value.

Note: The last bin contributes the most error to the test stat even though it has only one measurement. To avoid this problem, we could combine the last two bins as one bin and then use a $\chi^2(2)$ curve to compute the P-value.
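The Poisson fit of Example 4, including the suggestion to combine the last two bins, can be sketched in plain Python as follows. The helper names `poisson_expected` and `pearson` are hypothetical, and the tail probabilities use closed forms that are valid only for the specific degrees of freedom noted in the comments. Because the expected counts are not rounded before the last bin is computed, the printed statistic may differ slightly in the second decimal place from the hand computation above.

```python
import math

# Example 4 data: index = number of resistant colonies, value = number of dishes.
dishes = [98, 40, 8, 3, 1]
n = sum(dishes)                                        # 150 dishes
lam = sum(k * f for k, f in enumerate(dishes)) / n     # MLE of lambda, 0.46

def poisson_expected(lam, n, bins):
    """Expected counts for k = 0..bins-2; the last bin holds the remainder."""
    e = [n * lam ** k * math.exp(-lam) / math.factorial(k) for k in range(bins - 1)]
    e.append(n - sum(e))        # last bin is "bins-1 or more"
    return e

def pearson(observed, expected):
    """Sum of (f_k - e_k)^2 / e_k over all bins."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Five bins, one estimated parameter: df = 5 - 1 - 1 = 3.
e5 = poisson_expected(lam, n, 5)
x5 = pearson(dishes, e5)
h = x5 / 2
# Closed-form right tail of chi2(3).
p5 = math.erfc(math.sqrt(h)) + 2 * math.sqrt(h / math.pi) * math.exp(-h)

# Combine the last two bins into "3 or more": df = 4 - 1 - 1 = 2.
merged = dishes[:3] + [dishes[3] + dishes[4]]
e4 = poisson_expected(lam, n, 4)
x4 = pearson(merged, e4)
p4 = math.exp(-x4 / 2)          # closed-form right tail of chi2(2)

print(f"lambda = {lam:.2f}")
print(f"5 bins: x = {x5:.3f}, P-value = {p5:.3f}")
print(f"4 bins: x = {x4:.3f}, P-value = {p4:.3f}")
```

Comparing the two printed lines shows how much of the statistic is driven by the sparse last bin.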

Example 5 (Flying-Bomb Hits on London). Consider the statistics of flying-bomb hits in a south London area during World War II. (R. D. Clarke, An application of the Poisson distribution, Journal of the Institute of Actuaries, 1946) The region was divided into 576 areas of 0.25 square kilometers each. The number of areas receiving $k$ hits, for $k = 0, 1, 2, \ldots$, were as follows:

| # of hits received | # of areas |
|---|---|
| 0 | 229 |
| 1 | 211 |
| 2 | 93 |
| 3 | 35 |
| 4 | 7 |
| 7 | 1 |

Does the data appear to follow a (spatial) Poisson distribution?

Solution. The MLE estimate for $\lambda$ is the sample average number of hits per 0.25 square kilometers. Thus

$$\hat\lambda = \frac{1 \cdot 211 + 2 \cdot 93 + 3 \cdot 35 + 4 \cdot 7 + 7 \cdot 1}{576} \approx 0.9323.$$

Now letting

$$e_k = 576 \cdot \frac{0.9323^k \, e^{-0.9323}}{k!} \quad \text{for } k = 0, 1, \ldots, 4, \qquad e_5 = 576 - \sum_{k=0}^{4} e_k$$

(i.e., the remainder of the distribution), we have

| # of hits received | $f_k$ | $e_k$ |
|---|---|---|
| 0 | 229 | 226.74 |
| 1 | 211 | 211.39 |
| 2 | 93 | 98.54 |
| 3 | 35 | 30.62 |
| 4 | 7 | 7.14 |
| 5 or more | 1 | 1.57 |

Then summing over these six bins we have

$$x = \sum_{k} \frac{(f_k - e_k)^2}{e_k} \approx 1.172458.$$

Using 6 bins and 1 MLE estimate, the test statistic follows a $\chi^2(6-1-1) = \chi^2(4)$ distribution; thus, the P-value for the test statistic $x = 1.172458$ is $P(\chi^2(4) \ge 1.172458) \approx 0.8826$. If the distribution of bomb hits were Poisson, then there would be an 88.26% chance of obtaining a difference between the expected and observed measurements at least as large as the difference that occurs in our data. The high P-value means that the data is a good fit of the desired distribution.
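A short Python sketch of the Example 5 computation follows. It shows one way to fold the single observation at 7 hits into the "5 or more" bin before computing the statistic. The variable names are illustrative, the tail probability uses a closed form that holds only for 4 degrees of freedom, and unrounded expected counts are used, so the printed values can differ slightly from the rounded hand computation.

```python
import math

# Example 5 data: number of hits -> number of areas (as tabulated above).
areas_by_hits = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 7: 1}
n = sum(areas_by_hits.values())                              # 576 areas
lam = sum(k * f for k, f in areas_by_hits.items()) / n       # MLE, about 0.9323

tail_start = 5                      # the last bin is "5 or more"
observed = [areas_by_hits.get(k, 0) for k in range(tail_start)]
observed.append(sum(f for k, f in areas_by_hits.items() if k >= tail_start))

expected = [n * lam ** k * math.exp(-lam) / math.factorial(k)
            for k in range(tail_start)]
expected.append(n - sum(expected))  # remainder of the distribution

x = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1          # 6 bins, 1 estimated parameter
h = x / 2
p_value = math.exp(-h) * (1 + h)    # closed-form right tail, valid for df = 4 only

for k, (f, e) in enumerate(zip(observed, expected)):
    label = f"{k}+" if k == tail_start else str(k)
    print(f"{label:>2}  observed {f:3d}  expected {e:7.2f}")
print(f"lambda = {lam:.4f}, x = {x:.3f}, df = {df}, P-value = {p_value:.4f}")
```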

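The same computations on a TI:

1. Enter range and frequencies (L1, L2).
2. 1-Var Stats L1, L2 computes $\bar x$, so $\hat\lambda = \bar x \approx 0.9323$.
3. Store expected into L3.
4. Stat Edit: must adjust last bin in L3. Edit the last entry of L3 to 576 − 226.74 − 211.39 − etc. The expected values are now in L3.
5. Compute error terms in test stat. The error terms are in L4.
6. Compute stats on L4. The sum Σx is the test stat.
7. Compute the P-value.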

Exercises

1. A random sample of 1000 grades were 312 A's, 208 B's, 202 C's, 99 D's, 179 F's.
   (a) Test whether or not grades have been assigned according to the following distribution: A 30%, B 25%, C 20%, D 10%, F 15%.
   (b) Which grade(s) seem to fit the specified distribution (low contribution to test statistic), and which grade(s) seem to be a bad fit (high contribution to test statistic)?

2. In an experiment by Rutherford, Chadwick, and Ellis (1920), a radioactive substance was observed during 2608 time periods of 7.5 seconds each. The number of alpha particles reaching a Geiger counter was recorded for each time period. The results were as follows:

| # particle hits | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # time periods | 57 | 203 | 383 | 525 | 532 | 408 | 273 | 139 | 45 | 27 | 12 | 4 |

   Using 10 or more as the last bin, perform a goodness of fit test for whether the data comes from a Poisson distribution. Define and give an estimate of $\lambda$, and give a table of $\{e_k\}$, the test statistic, and the P-value. Explain your conclusion in detail.