Chapter 26: Comparing Counts (Chi Square)

We've seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that's a compromise: it forces us into a very binary existence. Life is so much more varied than that! If only there was a way to keep all of those categories...

Maybe there is, but not with confidence intervals. Confidence intervals attempt to estimate a single parameter; a number. We don't have that with a categorical variable (at least, not in an obvious way). So: no confidence intervals here.

A hypothesis test tries to determine if an observed result could reasonably be expected; to determine if there is a significant difference between what was observed and what was expected. Perhaps that might work. We'd just need to find some way to measure the difference between an observed distribution and an expected distribution...

The Problem: Measuring the Difference in Distributions

Why don't we try? Let's take a simple categorical variable, like favorite (primary) color. Perhaps the observed data might look like this:

Table 1 - Observed Primary Colors
Color     Red  Yellow  Blue
Observed   15      10    20

The expected distribution might look like anything; probably the most popular version might be of no preference: all values are equally likely (all colors are equally preferred). In that case, the expected distribution can be found by dividing the sample size by the number of categories.

Table 2 - Expected Primary Colors
Color     Red  Yellow  Blue
Expected   15      15    15

Now, how can we distill the difference in these distributions into a single number? Perhaps we could start by finding the differences for each category.

Table 3 - Differences (Obs - Exp)
Color       Red  Yellow  Blue
Difference    0      -5     5

What now? We've got a bunch of deviations (differences from the expectation) that we want to condense into a single number. Previously, we subtracted two values to get a single value (the difference of proportions procedures); now we have more than two things. What's another way to combine lots of numbers into a single number? How about adding them?

Sum of Deviations = 0.

Well, that's perhaps a bit counter-intuitive (in fact, the sum of deviations will always be zero). It would make sense that a value of zero would mean no difference; how could we fix things so that the only way to get zero is if everything is identical? How could we fix things so that every deviation was positive?
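Want to check that zero-sum claim? Here is a minimal Python sketch of the bookkeeping so far (the language and the variable names are my stand-ins; the notes do all of this by hand):

```python
# A minimal sketch of the deviation bookkeeping above.
observed = {"Red": 15, "Yellow": 10, "Blue": 20}   # Table 1
expected = {"Red": 15, "Yellow": 15, "Blue": 15}   # Table 2

# Table 3: the deviation (observed minus expected) for each category
deviations = {color: observed[color] - expected[color] for color in observed}
print(deviations)                # {'Red': 0, 'Yellow': -5, 'Blue': 5}

# The deviations always cancel: both rows total the same sample size.
print(sum(deviations.values()))  # 0
```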

I heard someone say "absolute value." You're not wrong, but it turns out that we really want to avoid absolute value (the reason for this lies in the Calculus that makes all of this work). What's another way to force a number to become non-negative? Nothing? OK, try squaring the values. I know, that seems weird; again, the reason lies in Calculus.

Sum of Squared Deviations = 50.

There is another problem: not all differences are equally significant. Consider the following observed and expected distributions:

Table 4 - Another Example of Observed and Expected
Color     Red  Yellow  Blue
Observed  150      95   205
Expected  150     100   200

Our measure of difference is the same; but are these differences really as big as those in the original example? 5 out of 15 is a much bigger difference than 5 out of 200! So how can we take that into consideration? How about dividing by the expected value, to make some kind of proportion? Sounds good! We have now created a new statistic, called Chi Square.

Equation 1 - The Chi Square Statistic

χ² = Σ (observed − expected)² / expected, summed over all cells

Well, actually, that's a parameter: if we had all the data, then we could calculate the value of this Chi-Square parameter (and no, that doesn't mean that we're going to construct a confidence interval; the parameter isn't a real value that has intrinsic value or meaning. It's an abstract measure that we're going to use to measure a difference.). When we use data from a sample, then we've got a statistic, which we ought to denote X² if we're holding to the Greek/Roman style that we've been following so far. Often, though, people don't make a distinction (our textbook in particular lists this as an exception).
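Equation 1 is simple enough to compute directly. A short Python sketch (my illustration, assuming the observed and expected counts are listed in the same category order) shows both points at once: the statistic for Tables 1 and 2, and how dividing by the expected counts shrinks Table 4's differences:

```python
# Equation 1 as a function: sum of (observed - expected)^2 / expected.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Tables 1 and 2: the squared deviations alone sum to 50;
# scaling each by its expected count gives the chi-square value.
print(chi_square([15, 10, 20], [15, 15, 15]))       # ~3.33

# Table 4: the same raw deviations (0, -5, 5), but a far smaller
# chi-square, since each deviation is judged relative to its expectation.
print(chi_square([150, 95, 205], [150, 100, 200]))  # ~0.375
```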

The Chi-Square Distribution

So, we've got a new random variable. I wonder what its distribution looks like? Well, it depends. The size of the sample determines how unusual a particular difference (value of χ²) is: for small samples, it doesn't take much of a difference to be quite unusual; for large samples, it takes a really big difference. Thus, the chi-square distribution has degrees of freedom. This is a fancy (and technical) way of measuring how a distribution changes as the sample size changes (and requires hyperdimensional geometry to fully explain, so don't ask!).

So here are some chi-square distributions (these are called density curves, remember?) for various degrees of freedom.

Figure 1 - Chi Square Distributions

As you can see, the distribution is skewed right, although it becomes less so as the degrees of freedom get larger. The mean turns out to be the same as the degrees of freedom, and the variance is twice the degrees of freedom.

To calculate the area under the chi-square curve, you can use a table, or you can use the CDF function on your calculator. It is almost always the case that you will want to calculate the right-hand area for a chi-square (the probability of observing a difference as large, or larger).

I should mention that our statistic isn't exactly a chi-square; it's just awfully close. As long as certain conditions are met, it should be OK to use the chi-square distribution for this statistic that we've constructed.

Examples

[1.] P(χ² > 4.219) when df = 3? 0.2388 (from the table: between 0.20 and 0.25)

[2.] P(χ² > 17.214) when df = 9? 0.0455 (from the table: between 0.025 and 0.050)

[3.] P(χ² > 13.238) when df = 14? 0.5079 (from the table: greater than 0.25)
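If you have software instead of a table or calculator, the same right-hand areas come from the chi-square survival function. A sketch using Python's scipy (one convenient stand-in; any statistics package will do):

```python
from scipy import stats  # scipy.stats.chi2 is the chi-square distribution

# Right-hand tail areas for the three worked examples above.
print(stats.chi2.sf(4.219, df=3))    # ~0.2388
print(stats.chi2.sf(17.214, df=9))   # ~0.0455
print(stats.chi2.sf(13.238, df=14))  # ~0.5079

# The mean equals the degrees of freedom; the variance is twice that.
print(stats.chi2.mean(14), stats.chi2.var(14))  # 14.0, 28.0
```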

The Chi-Square Tests

Yes, there is more than one!

Goodness of Fit

Indications

This is what we used to develop the idea. A Goodness of Fit test is used when you are measuring a single categorical variable in a single population, and you're wondering if there is evidence that the observed data follow a particular distribution.

Requirements

First of all, we need a random sample from the population (technically, you can also use this for populations... but that shouldn't be foremost in your mind).

Second, we need for all of the expected values to be at least one. Think about it: in the formula for chi-square, we divide by the expected value. If you divide by a number smaller than one, what happens? The result is large, larger than the dividend. So very small expected values skew the chi-square that we calculate, and make probability statements (p-values) worthless.

Finally, we need most (at least 80%) of the expected values to be at least 5. This test expands our work with proportions: what was the requirement for z tests for proportions? np and n(1 − p) each had to be at least 10 (or 5). Well, np is the expected number of successes, and n(1 − p) is the expected number of failures. It only makes sense, then, that we want our expected values to be at least 5; the only new part is that most must be at least 5. This condition is not absolute; there are other authors that use a slightly different requirement. The one presented here has stood the test of time (since 1954) and works in most cases, so let's just grab on and run with it, shall we?

Expected Values

For the Goodness of Fit test, an expected distribution must be given. The most common is that there is no preference / no difference in the categories. In this case, divide the sample size by the number of categories to obtain the expected values.

Other expected distributions are possible; for example, from the work with primary colors above, perhaps we expect 40% to prefer red, 20% to prefer yellow, and 40% to prefer blue. Simply apply these percentages to the sample size to obtain the expected values, as in the sketch below. DO NOT ROUND! These are expected values; it is OK for them to have decimal values! Only the observed values must be integers (since they are counts).
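A quick sketch of that arithmetic (using the hypothetical 40/20/40 split and the n = 45 sample from Table 1):

```python
# Expected counts come from the claimed proportions times the sample size.
n = 45
claimed = {"Red": 0.40, "Yellow": 0.20, "Blue": 0.40}
expected = {color: n * p for color, p in claimed.items()}

# Here they happen to be whole numbers; in Example 4 below they won't be,
# and that's fine: expected values are not rounded.
print(expected)  # {'Red': 18.0, 'Yellow': 9.0, 'Blue': 18.0}
```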

Degrees of Freedom

For the Goodness of Fit test, the degrees of freedom are df = categories − 1.

Examples

[4.] A professor, concerned about the number of blue M&Ms in his hand, decided to collect some data from a larger bag. Here are the observed counts:

Table 5 - Observed Data for Example 4
Color   Brown  Yellow  Red  Orange  Green  Blue
Number    152     114  106      51     43    43

Here is the expected distribution (at that time):

Table 6 - Expected Percentages for Example 4
Color    Brown  Yellow  Red  Orange  Green  Blue
Percent     30      20   20      10     10    10

Is the distribution as the company claims, or is it off?

H0: the distribution of colors is as the company claims
Ha: the distribution of colors is not as the company claims

This calls for a Chi-Square test for Goodness of Fit. In order to conduct this test, the sample must be a random sample from the population, all expected values must be at least 1, and most (at least 80%) of the expected values must be at least 5. I'll assume that this sample was obtained randomly. Here is a table of the expected values:

Table 7 - Expected Counts for Example 4
Color   Expected Number
Brown   509(0.3) = 152.7
Yellow  509(0.2) = 101.8
Red     509(0.2) = 101.8
Orange  509(0.1) = 50.9
Green   509(0.1) = 50.9
Blue    509(0.1) = 50.9

All are at least 5; the requirement is met. I'll use α = 0.05.

χ² = Σ (obs − exp)²/exp
   = (152 − 152.7)²/152.7 + (114 − 101.8)²/101.8 + (106 − 101.8)²/101.8
     + (51 − 50.9)²/50.9 + (43 − 50.9)²/50.9 + (43 − 50.9)²/50.9
   ≈ 4.091

With df = 5, P(χ² > 4.091) ≈ 0.5364.

If the distribution is as the company claims, then I can expect to find a sample with a chi-square value of at least 4.091 in about 53.64% of samples. Since p > α, this happens often enough at the 5% level to attribute to chance. It is not significant, and I fail to reject the null hypothesis. The data do not suggest that the colors aren't as the company claims (there is no evidence to refute the company's claim).
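As a software check, here is the same test as a sketch, assuming scipy is available (scipy.stats.chisquare performs exactly this goodness-of-fit computation):

```python
from scipy import stats

# Example 4: observed counts and the company's claimed distribution.
observed = [152, 114, 106, 51, 43, 43]           # brown, yellow, red, orange, green, blue
claimed  = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]
n = sum(observed)                                # 509
expected = [n * p for p in claimed]              # 152.7, 101.8, ... not rounded!

chi2, p = stats.chisquare(observed, f_exp=expected)
print(chi2, p)  # ~4.091, p ~0.536: fail to reject at the 5% level
```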

[5.] While playing a game, Zeke finds that he rolls quite a few 1s; enough to make him suspicious and later test the die to see if it is really fair. Here are the results of Zeke's trials:

Table 8 - Observed Counts for Example 5
Pips        1  2  3  4  5  6
# of Rolls 16  9  7  6  8  4

Do the data suggest that this die is not fair?

H0: the distribution of pips is uniform
Ha: the distribution of pips is not uniform

This calls for a Chi-Square test for Goodness of Fit. In order to conduct this test, the sample must be a random sample from the population, all expected values must be at least 1, and most (at least 80%) of the expected values must be at least 5. I'll assume that Zeke's rolls constitute a random sample. Since there were 50 rolls, we should expect each number of pips to appear in about 50/6 ≈ 8.33 rolls.

Table 9 - Expected Counts for Example 5
Pips                    1     2     3     4     5     6
Expected # of Rolls  8.33  8.33  8.33  8.33  8.33  8.33

All expected values are at least five, so the requirement is met. I'll use α = 0.05.

χ² = Σ (obs − exp)²/exp
   = (16 − 8.33)²/8.33 + (9 − 8.33)²/8.33 + (7 − 8.33)²/8.33
     + (6 − 8.33)²/8.33 + (8 − 8.33)²/8.33 + (4 − 8.33)²/8.33
   ≈ 10.24

With five degrees of freedom, P(χ² > 10.24) ≈ 0.0687.

If the die is fair, then I can expect a sample with a chi-square value of at least 10.24 in about 6.87% of samples. Since p > α, this occurs often enough to attribute to chance at the 5% level. It is not significant; I fail to reject the null hypothesis. The data do not suggest that the die is not fair. Sorry, Zeke: you lost fair and square.
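The same check in software, as a sketch; note that when no expected counts are supplied, scipy.stats.chisquare assumes equal counts, which is exactly the fair-die null hypothesis:

```python
from scipy import stats

# Example 5: counts of 1s through 6s in Zeke's 50 rolls.
rolls = [16, 9, 7, 6, 8, 4]

# With no f_exp argument, chisquare tests against a uniform distribution.
chi2, p = stats.chisquare(rolls)
print(chi2, p)  # ~10.24, p ~0.069: not significant at the 5% level
```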

Homogeneity of Proportions

Indications

The Homogeneity of Proportions test is used if a single categorical variable is measured in two or more populations. Typically, the variable will be listed horizontally, and each population will occupy one row (giving a two-dimensional table for the data). One good indicator that HOP should be used is equal row totals (or column totals).

Requirements

The requirements for this test are the same as for the Goodness of Fit test. Isn't that sweet?

Expected Values

No expected distribution will be given for this test. The expected values are calculated by applying the percentage of the sample in each row to each column total; e.g., if 20% of the sample is contained in row one, then 20% of the number in column one are in row one, column one; 20% of the number in column two are in row one, column two; etc. This can be summarized in the following formula:

Equation 2 - Expected Value for the HOP Test

expected value for cell (i, j) = (total in row i)(total in column j) / (total sample size)

Degrees of Freedom

The degrees of freedom are df = (rows − 1)(columns − 1).

Example

[6.] As part of a discrimination case, investigators checked the admissions records for female applicants to a University. The investigators were interested in knowing if the admissions and rejections for females to one department (let's call it Department A) were different from the admissions and rejections for another department (call it C). The results are shown below.

Table 10 - Observed Counts for Example 6
Department / Status  Admitted  Rejected
Dept A                     89        19
Dept C                    202       391

Do these data suggest any difference in the admission / rejection practices of these two departments with respect to female applicants?

H0: the distribution of admission status is the same for both departments
Ha: the distribution of admission status is not the same for both departments

This calls for a Chi-Square test for Homogeneity of Proportions. This requires a random sample, all expected counts at least 1, and at least 80% of expected counts at least five. I'll assume that these women represent a random sample of all possible applicants. The expected counts are shown in the table below.

Table 11 - Expected Counts for Example 6
Department / Status  Admitted                    Rejected
Dept A               (108)(291)/701 ≈ 44.8331    (108)(410)/701 ≈ 63.1669
Dept C               (593)(291)/701 ≈ 246.1669   (593)(410)/701 ≈ 346.8331

All expected counts are at least five, so the requirements are met. I'll use α = 0.05.

χ² = Σ (obs − exp)²/exp
   = (89 − 44.83)²/44.83 + (19 − 63.17)²/63.17 + (202 − 246.17)²/246.17 + (391 − 346.83)²/346.83
   ≈ 85.9614 (see the software note below)

With one degree of freedom, P(χ² > 85.9614) ≈ 0.

If the distribution of admission status is the same for both departments, then I can expect to find a sample with a chi-square value of at least 85.9614 in almost no samples. Since p < α, this occurs too rarely to attribute to chance at the 5% level. This is significant; I reject the null hypothesis. The data suggest that the distribution of admission status is not the same for both departments.

If you read this problem and thought about another way to answer it, then keep reading to the end of this document!
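A software check for this example, as a sketch (scipy is my choice; any package with a chi-square routine for two-way tables will do). One caution worth knowing: the plain Equation 1 sum for these counts works out to about 87.94; the 85.9614 quoted above matches the Yates continuity-corrected statistic, which many packages apply to 2×2 tables by default. The conclusion is the same either way.

```python
import numpy as np
from scipy import stats

# Example 6 as a two-way table. chi2_contingency builds the Equation 2
# expected counts and the test statistic in one call.
table = np.array([[89, 19],      # Dept A: admitted, rejected
                  [202, 391]])   # Dept C: admitted, rejected

chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(expected)      # [[44.83, 63.17], [246.17, 346.83]]: matches Table 11
print(chi2, p, dof)  # ~87.94, p ~0, df = 1 (the plain Equation 1 sum)

# With the default Yates continuity correction for 2x2 tables:
chi2_c, p_c, _, _ = stats.chi2_contingency(table)
print(chi2_c, p_c)   # ~85.96, p ~0: the value quoted in the text
```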

Independence of Variables

Indications

A Chi-Square test for Independence of Variables is indicated when two categorical variables are measured in a single population. One variable will be labeled horizontally, and the other vertically, producing a table of values.

The difference between IOV and HOP is subtle; pay close attention! For HOP, I ask one question in many groups; in IOV, I ask two questions in one group. In the previous example, HOP was used because there was only one variable; there was only one piece of information that we gathered from each person ("Were you admitted?"). We already knew the departments! Thus, we only asked one question; we only measured one variable in two populations. If we had gathered a bunch of female applicants together and asked each woman two questions ("To which department did you apply?" and "Were you admitted?") then we would have used the IOV test. The moral of the story: pay close attention to these problems! The differences are subtle.

Requirements

The requirements are (thankfully) the same.

Expected Values

The expected values are the same as in the HOP case.

Degrees of Freedom

The degrees of freedom are the same as in the HOP case.

Now, some of you are wondering, "If everything's the same except for the name, why have two tests?" The answer is: I don't know, and it doesn't matter (right now). The AP Committee feels that the difference is significant (some textbook authors agree; others do not), so you need to know the difference. If you continue your studies in statistics, perhaps you'll find out why two are needed (or at least used).

Example

[7.] A professor categorized each of his students by gender and hair color. The results are shown below.

Table 12 - Observed Counts for Example 7
Gender / Color  Black  Brown  Red  Blonde
Male               56    143   34      46
Female             52    143   37      81

Do these data suggest a relationship between gender and hair color?

H0: Gender and hair color are independent
Ha: Gender and hair color are not independent

This calls for a Chi-Square test for Independence of Variables. This test requires a random sample, all expected counts at least 1, and at least 80% of expected counts at least five. I'll assume that these students are a random sample of all students. The expected values are shown in the table below.

Table 13 - Expected Counts for Example 7
Gender / Color  Black                    Brown                    Red                     Blonde
Male            (279)(108)/592 ≈ 50.9    (279)(286)/592 ≈ 134.8   (279)(71)/592 ≈ 33.5    (279)(127)/592 ≈ 59.9
Female          (313)(108)/592 ≈ 57.1    (313)(286)/592 ≈ 151.2   (313)(71)/592 ≈ 37.5    (313)(127)/592 ≈ 67.1

All of the expected values are at least five, so the requirements are met. I'll use α = 0.05.

χ² = Σ (obs − exp)²/exp
   = (56 − 50.9)²/50.9 + (143 − 134.8)²/134.8 + (34 − 33.5)²/33.5 + (46 − 59.9)²/59.9
     + (52 − 57.1)²/57.1 + (143 − 151.2)²/151.2 + (37 − 37.5)²/37.5 + (81 − 67.1)²/67.1
   ≈ 7.994

With three degrees of freedom, P(χ² > 7.994) ≈ 0.0461.

If gender and hair color are independent, then I can expect to find a sample with a chi-square value of at least 7.994 in about 4.61% of samples. Since p < α, this occurs too rarely to attribute to chance at the 5% level. It is significant; I reject the null hypothesis. The data suggest that gender and hair color are not independent.

The Connection to Proportions

Back in example 6, some of you may have thought "Hey, I could test this with a two-sample z-test for proportions!" and you'd be correct. It turns out that chi-square calculations are intimately related to normal (z) calculations (in fact, that's where the true Chi-Square actually comes from). For us, this relationship is most apparent (and useful) when there is one degree of freedom.

A Goodness of Fit test with one degree of freedom can also be handled by a one-sample z-test for proportions. A Homogeneity (or Independence) test with one degree of freedom can be handled by a two-sample z-test for proportions.

Don't believe me? Take the data from example 6 and re-express it as the proportion of women admitted to Department A and the proportion of women admitted to Department C. Calculate the z-statistic for the two-sample test. Square that value. Compare that to the value of chi-square that I calculated in example 6. Notice anything?
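If you'd rather let software take that dare, here is a final sketch (again assuming scipy): it re-runs Example 7, then squares the two-sample z statistic built from Example 6's data.

```python
import numpy as np
from scipy import stats

# Example 7: two variables measured in one group; same mechanics as HOP.
hair = np.array([[56, 143, 34, 46],    # male: black, brown, red, blonde
                 [52, 143, 37, 81]])   # female
chi2, p, dof, expected = stats.chi2_contingency(hair, correction=False)
print(chi2, p, dof)  # ~8.0, p ~0.046, df = 3: reject at the 5% level

# The Example 6 challenge: the two-sample z statistic for proportions,
# squared, reproduces the uncorrected chi-square for the 2x2 table.
admitted = np.array([89, 202])   # Dept A, Dept C
applied  = np.array([108, 593])
p_pool = admitted.sum() / applied.sum()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / applied[0] + 1 / applied[1]))
z = (admitted[0] / applied[0] - admitted[1] / applied[1]) / se
print(z ** 2)        # ~87.94: the squared z equals the chi-square statistic
```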