AP Statistics
Chapter 14: Chi-Square Distribution Procedures

I. Chi-Square Distribution ($\chi^2$)
   The chi-square test is used when comparing categorical data or multiple proportions.
   a. The chi-square distributions are a family of distributions that take only positive values.
   b. Each curve (for more than 2 degrees of freedom) begins at 0 on the horizontal axis, increases to a peak, and then approaches the horizontal axis asymptotically from above.
   c. A chi-square curve is skewed right. As the number of degrees of freedom increases, the curve becomes more and more symmetrical and looks more like a Normal curve.
   d. The total area under a chi-square curve is equal to 1.

II. Tests
   a. Test for Goodness of Fit
      This is applied when you have one categorical variable with multiple categories from a single population. It is used to determine whether sample data are consistent with a hypothesized distribution, that is, whether one sample is a good fit for the claimed population distribution.
   b. Test for Homogeneity
      This is applied when you have one categorical variable with multiple categories from 2 or more different populations. It is used to determine whether frequency counts are distributed identically across the different populations.
   c. Test for Independence/Association
      This is applied when you have two categorical variables with multiple categories from a single population. It is used to determine whether there is a significant association between the variables, i.e., are they dependent or independent?

III. Conditions
   a. S - Random (Representative) Sample - simple random sample(s) or representative sample(s).
   b. I - Independence - Individual observations are independent. When sampling without replacement, check that the population is at least 10 times as large as the sample (the 10% condition).
   c. C - Counts (Large Sample) - You may use this test with critical values from the chi-square distribution when all individual expected counts are at least 1 and no more than 20% of the expected counts are less than 5.

IV.
Test for Goodness of Fit (comparing one sample distribution to a population distribution)
   This test determines whether your sample (observed counts) is a good fit to the claimed population distribution (expected counts). Does your sample differ enough from the claimed population distribution to cast doubt on that claim?
   A. Hypotheses - Test for Goodness of Fit
      1. $H_0$: the actual population proportions are equal to the hypothesized proportions.
      2. $H_a$: at least one of the actual population proportions differs from its hypothesized proportion.
   B. Conditions
      1. S I C (see above)
   C. Calculations
      1. Observed Count (O) is the number per category that is observed or given.
      2. Expected Count (E) = n * hypothesized proportion
      3. Name of Test - Chi-Square Test for Goodness of Fit
      4. Chi-Square Test Statistic - $X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum \frac{(O - E)^2}{E}$ with degrees of freedom = # of categories - 1.
   D. P-value = $P(\chi^2 \geq X^2)$
      The P-value is the area under the density curve to the right of $X^2$. Large values of $X^2$ are evidence against $H_0$.
   E. Interpretation
      a. If we have a small P-value, we reject the null hypothesis. That is, our sample provides statistically significant evidence that the claimed population distribution is not correct.
      b. If the test finds a statistically significant result, do a follow-up analysis that compares the observed and expected counts and looks for the largest components of the chi-square statistic.

V. Calculator
   a. Enter observed values in List 1.
   b. Enter proportions from the null hypothesis in List 2.
   c. List 3 = (sum of List 1)(List 2), which gives the expected counts n * hypothesized proportion.
   d. STAT > TESTS > $\chi^2$ GOF-Test
      Observed: L1
      Expected: L3
      df: # categories - 1
      Calculate or Draw
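The calculation and calculator steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not a substitute for the calculator's GOF-Test: the function names `chi2_sf`, `large_counts_ok`, and `gof_test` are made up for illustration, the sample data are hypothetical, and expected counts are taken to be n times each hypothesized proportion.

```python
import math

def chi2_sf(x2, df):
    """Area to the right of x2 under the chi-square curve (integer df >= 1)."""
    if x2 <= 0:
        return 1.0
    h = x2 / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)             # exact tail area for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # exact tail area for df = 1
    while a < df / 2.0:                      # each pass adds 2 degrees of freedom
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def large_counts_ok(expected):
    """All expected counts >= 1 and no more than 20% of them below 5."""
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return all(e >= 1 for e in expected) and share_below_5 <= 0.20

def gof_test(observed, proportions):
    """Chi-square goodness-of-fit: expected counts, components, X^2, df, P-value."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    x2 = sum(components)
    df = len(observed) - 1
    return {
        "expected": expected,
        "components": components,
        "x2": x2,
        "df": df,
        "p_value": chi2_sf(x2, df),
        "counts_condition_met": large_counts_ok(expected),
    }

# Hypothetical data, invented only to exercise the functions.
result = gof_test(observed=[55, 25, 20], proportions=[0.5, 0.3, 0.2])
for name, value in result.items():
    print(name, value)
```

The `components` list holds the individual (O - E)^2 / E terms, which is what the follow-up analysis in part E looks at.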
VI. Examples
   A. Acme Toy Company
      Acme Toy Company prints baseball cards. The company claims that 30% of the cards are rookies, 60% veterans, and 10% All-Stars. The cards are sold in packages of 100. Suppose a randomly selected package of cards has 50 rookies, 45 veterans, and 5 All-Stars. Is this consistent with Acme's claim? Use a 0.05 level of significance.
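One way to check the arithmetic for the Acme example is sketched below. It is not a full written solution (hypotheses and conditions still need to be stated), and the variable names are assumptions chosen for illustration. With three categories there are 2 degrees of freedom, and for df = 2 the chi-square tail area has the closed form exp(-X^2 / 2), so no table or library is needed.

```python
import math

claimed = {"rookies": 0.30, "veterans": 0.60, "all_stars": 0.10}
observed = {"rookies": 50, "veterans": 45, "all_stars": 5}
alpha = 0.05

n = sum(observed.values())                         # 100 cards in the package
expected = {k: n * p for k, p in claimed.items()}  # n * hypothesized proportion

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
x2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in claimed)
df = len(claimed) - 1

# Exact right-tail area for a chi-square curve with 2 degrees of freedom.
p_value = math.exp(-x2 / 2)
reject_h0 = p_value < alpha

print("expected counts:", expected)
print("X^2 =", round(x2, 4), "df =", df)
print("P-value =", p_value, "reject H0:", reject_h0)
```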
   B. Cell Phones
      Are you likely to have a motor vehicle collision when using a cell phone? A study of 699 drivers who were using a cell phone when they were involved in a collision examined this question. These drivers made 26,798 cell phone calls during a 14-month study period. Each of the 699 collisions was classified in various ways. We want to determine whether the accidents are equally likely to occur on any day of the week. Here are the counts for each day of the week:

      Day      Sun   Mon   Tues   Wed   Thur   Fri   Sat   Total
      Number    20   133    126   159    136   113    12     699

      Explain why this is a Chi-Square Distribution problem.
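The data consist of counts for one categorical variable (day of the week) with seven categories, compared against a hypothesized distribution in which every day has proportion 1/7, which is the goodness-of-fit setting. The sketch below, with variable names chosen for illustration, computes the expected count per day under that hypothesis and the resulting chi-square statistic.

```python
days = ["Sun", "Mon", "Tues", "Wed", "Thur", "Fri", "Sat"]
counts = [20, 133, 126, 159, 136, 113, 12]

n = sum(counts)
expected_per_day = n / len(days)  # equal proportions: n * (1/7) for each day

# Component of the chi-square statistic contributed by each day.
components = {
    day: (obs - expected_per_day) ** 2 / expected_per_day
    for day, obs in zip(days, counts)
}
x2 = sum(components.values())
df = len(days) - 1

print("n =", n, "expected per day =", round(expected_per_day, 2))
print("X^2 =", round(x2, 2), "df =", df)
print("largest component:", max(components, key=components.get))
```

The expected count is well above 5 for every day, so the counts condition is not a concern here.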
   C. Biology Grades
      A biology professor reports that historically grades in her introductory biology course have been distributed as follows: 15% A's, 30% B's, 40% C's, 10% D's, and 5% F's. Test an appropriate hypothesis to decide if the professor's most recent grade distribution matches the historical distribution. Give statistical evidence to support your conclusion. Grades in her most recent course were distributed as follows:

      Grade        A     B     C     D     F
      Frequency   89   121    78    25    12
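A sketch of the computation for the grades example follows; the variable names are illustrative assumptions, and the written hypotheses, condition checks, and conclusion in context are left to the reader. With five categories there are 4 degrees of freedom, and for df = 4 the chi-square tail area has the closed form exp(-h)(1 + h) with h = X^2 / 2.

```python
import math

historical = {"A": 0.15, "B": 0.30, "C": 0.40, "D": 0.10, "F": 0.05}
observed = {"A": 89, "B": 121, "C": 78, "D": 25, "F": 12}

n = sum(observed.values())
expected = {g: n * p for g, p in historical.items()}

# Per-grade components (O - E)^2 / E and their sum, the test statistic.
components = {g: (observed[g] - expected[g]) ** 2 / expected[g] for g in historical}
x2 = sum(components.values())
df = len(historical) - 1

# Exact right-tail area for a chi-square curve with 4 degrees of freedom.
h = x2 / 2
p_value = math.exp(-h) * (1 + h)

print("n =", n)
print("expected counts:", expected)
print("X^2 =", round(x2, 3), "df =", df, "P-value =", p_value)
```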
   D. The Moose Problem
      A study was conducted to determine where moose are found in a region containing a large burned area. A map of the study area was partitioned into the following four habitat types. The figure below shows these four habitat types.
      (1) Inside the burned area, not near the edge of the burned area,
      (2) Inside the burned area, near the edge,
      (3) Outside the burned area, near the edge, and
      (4) Outside the burned area, not near the edge.
      The proportion of total acreage in each of the habitat types was determined for the study area. Using an aerial survey, moose locations were observed and classified into one of the four habitat types. The results are given in the table below.

      Habitat Type   Proportion of Total Acreage   Number of Moose Observed
      1              0.340                          25
      2              0.101                          22
      3              0.104                          30
      4              0.455                          40
      Total          1.000                         117

      (a) The researchers who are conducting the study expect the number of moose observed in a habitat type to be proportional to the amount of acreage of that type of habitat. Are the data consistent with this expectation? Conduct an appropriate statistical test to support your conclusion. Assume the conditions for inference are met.
      (b) Relative to the proportion of total acreage, which habitat types did the moose seem to prefer? Explain.
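The sketch below works through the numbers for both parts of the moose problem; the variable names are illustrative assumptions, and it supports rather than replaces a written answer. Part (a) uses the goodness-of-fit statistic with 3 degrees of freedom, for which the tail area has the closed form erfc(sqrt(h)) + 2 sqrt(h / pi) exp(-h) with h = X^2 / 2. Part (b) is the follow-up analysis: compare observed with expected counts and look at the largest components.

```python
import math

acreage_share = {1: 0.340, 2: 0.101, 3: 0.104, 4: 0.455}
moose_seen = {1: 25, 2: 22, 3: 30, 4: 40}

n = sum(moose_seen.values())
expected = {t: n * share for t, share in acreage_share.items()}
components = {
    t: (moose_seen[t] - expected[t]) ** 2 / expected[t] for t in acreage_share
}
x2 = sum(components.values())
df = len(acreage_share) - 1

# Exact right-tail area for a chi-square curve with 3 degrees of freedom.
h = x2 / 2
p_value = math.erfc(math.sqrt(h)) + 2 * math.sqrt(h / math.pi) * math.exp(-h)

# Habitat types where more moose were seen than acreage alone would predict.
preferred = [t for t in acreage_share if moose_seen[t] > expected[t]]

print("X^2 =", round(x2, 2), "df =", df, "P-value =", p_value)
for t in acreage_share:
    print("type", t, "observed", moose_seen[t],
          "expected", round(expected[t], 2),
          "component", round(components[t], 2))
print("observed exceeds expected in types:", preferred)
```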