Goodness of Fit Tests


Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38

Table of Contents
1 Goodness of Fit Chi Squared Test
2 Tests of Independence
3 Test of Homogeneity; McNemar Test (Matched Pairs)
4 Chapter #9 R Assignment
(University of New Haven) Goodness of Fit Tests 2 / 38

Goodness of Fit Chi Squared Test
(University of New Haven) Goodness of Fit Tests 3 / 38

Goodness of Fit Chi Squared Test
Idea of the chi-square test
The chi-square (χ²) test is used when the data are categorical. It measures how different the observed data are from what we would expect if H_0 was true.
[Figure: bar charts comparing the observed sample composition of births by day of the week (one SRS of 700 births) with the expected composition under H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1/7.]
(University of New Haven) Goodness of Fit Tests 4 / 38

Goodness of Fit Chi Squared Test
The chi-square distributions
The χ² distributions are a family of distributions that take only positive values, are skewed to the right, and are each described by a specific number of degrees of freedom. Published tables & software give the upper-tail area for critical values of many χ² distributions.
(University of New Haven) Goodness of Fit Tests 5 / 38

Goodness of Fit Chi Squared Test
Table D (upper critical values of the χ² distributions; the column heading p is the upper-tail probability).
Ex: df = 6. If χ² = 15.9, the P-value is between 0.01 and 0.02.

df    p = 0.25  0.2    0.15   0.1    0.05   0.025  0.02   0.01   0.005  0.0025 0.001  0.0005
1     1.32   1.64   2.07   2.71   3.84   5.02   5.41   6.63   7.88   9.14   10.83  12.12
2     2.77   3.22   3.79   4.61   5.99   7.38   7.82   9.21   10.60  11.98  13.82  15.20
3     4.11   4.64   5.32   6.25   7.81   9.35   9.84   11.34  12.84  14.32  16.27  17.73
4     5.39   5.99   6.74   7.78   9.49   11.14  11.67  13.28  14.86  16.42  18.47  20.00
5     6.63   7.29   8.12   9.24   11.07  12.83  13.39  15.09  16.75  18.39  20.51  22.11
6     7.84   8.56   9.45   10.64  12.59  14.45  15.03  16.81  18.55  20.25  22.46  24.10
7     9.04   9.80   10.75  12.02  14.07  16.01  16.62  18.48  20.28  22.04  24.32  26.02
8     10.22  11.03  12.03  13.36  15.51  17.53  18.17  20.09  21.95  23.77  26.12  27.87
9     11.39  12.24  13.29  14.68  16.92  19.02  19.68  21.67  23.59  25.46  27.88  29.67
10    12.55  13.44  14.53  15.99  18.31  20.48  21.16  23.21  25.19  27.11  29.59  31.42
11    13.70  14.63  15.77  17.28  19.68  21.92  22.62  24.72  26.76  28.73  31.26  33.14
12    14.85  15.81  16.99  18.55  21.03  23.34  24.05  26.22  28.30  30.32  32.91  34.82
13    15.98  16.98  18.20  19.81  22.36  24.74  25.47  27.69  29.82  31.88  34.53  36.48
14    17.12  18.15  19.41  21.06  23.68  26.12  26.87  29.14  31.32  33.43  36.12  38.11
15    18.25  19.31  20.60  22.31  25.00  27.49  28.26  30.58  32.80  34.95  37.70  39.72
16    19.37  20.47  21.79  23.54  26.30  28.85  29.63  32.00  34.27  36.46  39.25  41.31
17    20.49  21.61  22.98  24.77  27.59  30.19  31.00  33.41  35.72  37.95  40.79  42.88
18    21.60  22.76  24.16  25.99  28.87  31.53  32.35  34.81  37.16  39.42  42.31  44.43
19    22.72  23.90  25.33  27.20  30.14  32.85  33.69  36.19  38.58  40.88  43.82  45.97
20    23.83  25.04  26.50  28.41  31.41  34.17  35.02  37.57  40.00  42.34  45.31  47.50
21    24.93  26.17  27.66  29.62  32.67  35.48  36.34  38.93  41.40  43.78  46.80  49.01
22    26.04  27.30  28.82  30.81  33.92  36.78  37.66  40.29  42.80  45.20  48.27  50.51
23    27.14  28.43  29.98  32.01  35.17  38.08  38.97  41.64  44.18  46.62  49.73  52.00
24    28.24  29.55  31.13  33.20  36.42  39.36  40.27  42.98  45.56  48.03  51.18  53.48
25    29.34  30.68  32.28  34.38  37.65  40.65  41.57  44.31  46.93  49.44  52.62  54.95
26    30.43  31.79  33.43  35.56  38.89  41.92  42.86  45.64  48.29  50.83  54.05  56.41
27    31.53  32.91  34.57  36.74  40.11  43.19  44.14  46.96  49.64  52.22  55.48  57.86
28    32.62  34.03  35.71  37.92  41.34  44.46  45.42  48.28  50.99  53.59  56.89  59.30
29    33.71  35.14  36.85  39.09  42.56  45.72  46.69  49.59  52.34  54.97  58.30  60.73
30    34.80  36.25  37.99  40.26  43.77  46.98  47.96  50.89  53.67  56.33  59.70  62.16
40    45.62  47.27  49.24  51.81  55.76  59.34  60.44  63.69  66.77  69.70  73.40  76.09
50    56.33  58.16  60.35  63.17  67.50  71.42  72.61  76.15  79.49  82.66  86.66  89.56
60    66.98  68.97  71.34  74.40  79.08  83.30  84.58  88.38  91.95  95.34  99.61  102.70
80    88.13  90.41  93.11  96.58  101.90 106.60 108.10 112.30 116.30 120.10 124.80 128.30
100   109.10 111.70 114.70 118.50 124.30 129.60 131.10 135.80 140.20 144.30 149.40 153.20
(University of New Haven) Goodness of Fit Tests 6 / 38
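The same lookups can be done in R rather than Table D; a small sketch (my own, not from the slides) using the built-in chi-square functions:
> qchisq(0.01, df = 6, lower.tail = FALSE)  # critical value with upper-tail area 0.01 (16.81 in the table)
> pchisq(15.9, df = 6, lower.tail = FALSE)  # upper-tail area beyond 15.9, which lands between 0.01 and 0.02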

Goodness of Fit Chi Squared Test
Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts n_1, n_2, …, n_k in k cells. Let H_0 specify the cell probabilities p_1, p_2, …, p_k for the k possible outcomes.
Definition
o_j := observed count in cell j,  e_j := n p_j = expected count in cell j.
Example
Three species of large fish (A, B, C) that are native to a certain river have been observed to exist in equal proportions. A recent survey of 300 large fish found 89 of species A, 120 of species B and 91 of species C. What are the observed and expected counts?
Solution: o_1 = 89, o_2 = 120 and o_3 = 91, and
e_1 = e_2 = e_3 = n p_j = 300 (1/3) = 100.
(University of New Haven) Goodness of Fit Tests 7 / 38
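The observed counts and the expected counts under H_0 are easy to set up in R; a minimal sketch with my own object names:
> obs <- c(89, 120, 91)   # observed counts for species A, B, C
> p0 <- rep(1/3, 3)       # cell probabilities specified by H_0
> n <- sum(obs)           # n = 300
> n * p0                  # expected counts e_j = n p_j
[1] 100 100 100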

Goodness of Fit Chi Squared Test
Chi Squared Goodness of Fit Test
Theorem (Chi Squared Goodness of Fit Test)
Let
H_0: the cell probabilities are p_1, …, p_k.
The chi-square statistic, which measures how much the observed cell counts differ from the expected cell counts, is
x := Σ_{j=1}^{k} (o_j − e_j)² / e_j.
If H_0 is true and
- all expected counts are ≥ 1,
- no more than 20% of the expected counts are < 5,
then the chi-squared statistic is approximately χ²(k − 1). In that case, the p-value of the test of H_0 versus H_A: not H_0 is approximately P(C ≥ x), where C ∼ χ²(k − 1).
(University of New Haven) Goodness of Fit Tests 8 / 38

Goodness of Fit Chi Squared Test
Example (River ecology)
Three species of large fish (A, B, C) that are native to a certain river have been observed to co-exist in equal proportions. A recent random sample of 300 large fish found 89 of species A, 120 of species B, and 91 of species C. Do the data provide evidence that the river's ecosystem has been upset?
H_0: p_A = p_B = p_C = 1/3
H_a: H_0 is not true
Number of proportions compared: k = 3
All the expected counts are: n/k = 300/3 = 100
Degrees of freedom: k − 1 = 3 − 1 = 2
χ² calculation:
χ² = (89 − 100)²/100 + (120 − 100)²/100 + (91 − 100)²/100 = 1.21 + 4.00 + 0.81 = 6.02
(University of New Haven) Goodness of Fit Tests 9 / 38
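The whole test can also be run in one call, in the style of the chisq.test examples later in the deck (object name is my own); it should report the same statistic of 6.02 on 2 degrees of freedom and the P-value of 0.049 quoted on the next slide:
> fish <- c(89, 120, 91)
> chisq.test(fish, p = rep(1/3, 3))   # chi-square goodness of fit test against H_0: p_A = p_B = p_C = 1/3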

Goodness of Fit Chi Squared Test
Example (cont.)
If H_0 were true, how likely would it be to find by chance a discrepancy between observed and expected frequencies yielding a χ² value of 6.02 or greater?
From Table D (df = 2), we find 5.99 < χ² < 7.38, so 0.025 < P < 0.05. Software gives P-value = 0.049.
Using a typical significance level of 5%, we conclude that the results are significant. We have found evidence that the 3 fish populations are not currently equally represented in this ecosystem (P < 0.05).
(University of New Haven) Goodness of Fit Tests 10 / 38

Goodness of Fit Chi Squared Test
Example (cont.) Interpreting the χ² output
The individual values summed in the χ² statistic are the χ² components. When the test is statistically significant, the largest components indicate which condition(s) differ most from what is expected under H_0. You can also compare the actual proportions qualitatively in a graph.
[Figure: bar chart of percent of total for species A, B, C (gumpies, sticklebarbs, spotheads).]
χ² = (89 − 100)²/100 + (120 − 100)²/100 + (91 − 100)²/100 = 1.21 + 4.00 + 0.81 = 6.02
The largest χ² component, 4.00, is for species B. The increase in species B contributes the most to significance.
(University of New Haven) Goodness of Fit Tests 11 / 38
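The χ² components can be extracted the same way the squash example does a few slides later; a brief sketch with my own object names:
> obs <- c(89, 120, 91)
> exp <- rep(100, 3)
> (obs - exp)^2/exp       # components 1.21 4.00 0.81; the species B component dominates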

Goodness of Fit Chi Squared Test
Example (Goodness of fit for a genetic model)
Under a genetic model of dominant epistasis, a cross of white and yellow summer squash will yield white, yellow, and green squash with probabilities 12/16, 3/16 and 1/16 respectively (expected ratios 12:3:1). Suppose we observe the following data: 155 white, 40 yellow, and 10 green squash (n = 205). Are they consistent with the genetic model?
H_0: p_white = 12/16, p_yellow = 3/16, p_green = 1/16
H_a: H_0 is not true
We use H_0 to compute the expected counts for each squash type.
(University of New Haven) Goodness of Fit Tests 12 / 38

Goodness of Fit Chi Squared Test
Example (cont.)
We then compute the chi-square statistic:
χ² = (155 − 153.75)²/153.75 + (40 − 38.4375)²/38.4375 + (10 − 12.8125)²/12.8125
   ≈ 0.01016 + 0.06352 + 0.61738 = 0.69106
Degrees of freedom = k − 1 = 2, and χ² = 0.691. Using Table D we find P > 0.25. Software gives P = 0.708.
This is not significant and we fail to reject H_0. The observed data are consistent with a dominant epistatic genetic model (12:3:1). The small observed deviations from the model could simply have arisen from the random sampling process alone.
(University of New Haven) Goodness of Fit Tests 13 / 38

Goodness of Fit Chi Squared Test
Example (cont.)
> obs=c(155,40,10)
> tprob=c(12/16, 3/16, 1/16)
> chisq.test(obs,p=tprob)

Chi-squared test for given probabilities

data: obs
X-squared = 0.6911, df = 2, p-value = 0.7078

> exp=chisq.test(obs,p=tprob)$expected
> exp
[1] 153.7500 38.4375 12.8125
> (obs-exp)^2/exp
[1] 0.01016260 0.06351626 0.61737805
(University of New Haven) Goodness of Fit Tests 14 / 38

Tests of Independence
(University of New Haven) Goodness of Fit Tests 15 / 38

Tests of Independence
r × c Contingency Tables
Given two different finite partitions of the population, {A_i}_{i=1}^r and {B_j}_{j=1}^c, one wants to test whether the two partitions are independent:
H_0: P(A_i ∩ B_j) = P(A_i) P(B_j) for every 1 ≤ i ≤ r and 1 ≤ j ≤ c, versus H_A: not H_0.
One takes a random sample, x_1, …, x_n, from the population. Let
o_ij := the number of the x_k's that fall in A_i ∩ B_j,  C_j := Σ_{i=1}^r o_ij  and  R_i := Σ_{j=1}^c o_ij.
The data for the test of independence are given in an r × c contingency table:

              B_1   B_2  …  B_c   Row Totals
A_1           o_11  o_12 …  o_1c  R_1
A_2           o_21  o_22 …  o_2c  R_2
…
A_r           o_r1  o_r2 …  o_rc  R_r
Column Totals C_1   C_2  …  C_c   Grand Total = n

The name contingency table was given by Karl Pearson...
(University of New Haven) Goodness of Fit Tests 16 / 38
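In R such a table of observed counts, together with its row totals R_i, column totals C_j and grand total n, can be assembled along these lines (a sketch using the smoking data from the next slides; object names are my own):
> obs <- rbind(c(400, 1380), c(416, 1823), c(188, 1168))  # r x c table of observed counts o_ij
> dimnames(obs) <- list(c("A1", "A2", "A3"), c("B1", "B2"))
> addmargins(obs)   # appends the row totals, column totals and grand total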

Tests of Independence
Example (Two-way tables)
An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor. Two-way tables organize data about two categorical variables with any number of levels/treatments obtained from a two-way, or block, design.
High school students were asked whether they smoke, and whether their parents smoke. First factor: parent smoking status (rows); second factor: student smoking status (columns).

                        student smokes   student doesn't smoke
both parents smoke            400              1,380
one parent smokes             416              1,823
neither parent smokes         188              1,168

(University of New Haven) Goodness of Fit Tests 17 / 38

Tests of Independence
Example (cont.)

                        student smokes   student doesn't smoke   Total
both parents smoke            400              1,380             1,780
one parent smokes             416              1,823             2,239
neither parent smokes         188              1,168             1,356
Total                       1,004              4,371             5,375

Assuming the observed table corresponds to the population, i.e., using empirical probabilities in place of actual probabilities:
P(student smokes and one parent smokes) = P(being in row #2 and column #1) = (row #2, column #1 entry)/(grand total) = 416/5,375 = 0.077
P(student smokes) = P(being in column #1) = (column #1 total)/(grand total) = 1,004/5,375 = 0.187
P(one parent smokes) = P(being in row #2) = (row #2 total)/(grand total) = 2,239/5,375 = 0.417.
(University of New Haven) Goodness of Fit Tests 18 / 38
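These empirical probabilities are just cell and marginal proportions of the grand total; a quick sketch reusing the obs matrix built in the earlier sketch:
> prop.table(obs)          # each cell over the grand total; the (2,1) entry is 416/5375 = 0.077
> rowSums(obs)/sum(obs)    # row (parent smoking) proportions; row 2 gives 2239/5375 = 0.417
> colSums(obs)/sum(obs)    # column (student smoking) proportions; column 1 gives 1004/5375 = 0.187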

Tests of Independence
Expected Counts for r × c Contingency Tables
Observe: assuming H_0: row variable and column variable are independent,
e_ij = (grand total) × P(being in the ij-th cell)
     = (grand total) × P(being in row #i) × P(being in column #j)
     = (grand total) × (row #i total / grand total) × (column #j total / grand total)
     = (row #i total) × (column #j total) / (grand total).
(University of New Haven) Goodness of Fit Tests 19 / 38
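This formula is an outer product of the table margins divided by the grand total; a short sketch (again with the obs matrix from above):
> e <- outer(rowSums(obs), colSums(obs))/sum(obs)  # e_ij = (row i total)(column j total)/(grand total)
> e                                                # should agree with chisq.test(obs)$expected on a later slide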

Tests of Independence
Example (cont.)

                        student smokes   student doesn't smoke   Total
both parents smoke            400              1,380             1,780
one parent smokes             416              1,823             2,239
neither parent smokes         188              1,168             1,356
Total                       1,004              4,371             5,375

The expected counts of the six cells are:
e_11 = (1,780)(1,004)/5,375 = 332.49    e_12 = (1,780)(4,371)/5,375 = 1,447.51
e_21 = (2,239)(1,004)/5,375 = 418.22    e_22 = (2,239)(4,371)/5,375 = 1,820.78
e_31 = (1,356)(1,004)/5,375 = 253.29    e_32 = (1,356)(4,371)/5,375 = 1,102.71
(University of New Haven) Goodness of Fit Tests 20 / 38

Tests of Independence
Chi Squared Test for Two Way Tables
Theorem (Chi Squared Test for Two Way Tables)
The chi-square statistic from a two-way r × c table,
x := Σ_{i=1}^{r} Σ_{j=1}^{c} (o_ij − e_ij)² / e_ij,
measures how much the observed cell counts differ from the expected cell counts when
H_0: row variable and column variable are independent
holds. If H_0 is true and
- all expected counts are ≥ 1,
- no more than 20% of the expected counts are < 5,
then the chi-squared statistic is approximately χ²((r − 1)(c − 1)). In that case, the p-value of the test of H_0 versus H_A: not H_0 is approximately P(C ≥ x), where C ∼ χ²((r − 1)(c − 1)).
(University of New Haven) Goodness of Fit Tests 21 / 38
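A hand computation of this statistic and its p-value for the smoking table, as a sketch to set beside the chisq.test output two slides later (obs and e are the objects from the sketches above):
> x <- sum((obs - e)^2/e)                               # chi-square statistic for the two-way table
> pchisq(x, df = (3 - 1)*(2 - 1), lower.tail = FALSE)   # df = (r-1)(c-1) = 2; about 7e-09, matching chisq.test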

Tests of Independence
Example (cont.) Influence of parental smoking
Here is computer output (next slide) for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students' smoking habits). What does it tell you?
Sample size? Hypotheses? Are the data OK for a χ² test? Interpretation?
(University of New Haven) Goodness of Fit Tests 22 / 38

Tests of Independence
Example (cont.)
> row1=c(400,1380)
> row2=c(416,1823)
> row3=c(188,1168)
> obs = rbind(row1,row2,row3)
> chisq.test(obs)

Pearson's Chi-squared test

data: obs
X-squared = 37.5663, df = 2, p-value = 6.959e-09

> exp=chisq.test(obs)$expected
> exp
         [,1]     [,2]
row1 332.4874 1447.513
row2 418.2244 1820.776
row3 253.2882 1102.712
> (obs-exp)^2/exp
            [,1]       [,2]
row1 13.70862455 3.14881241
row2  0.01183057 0.00271743
row3 16.82884348 3.86551335
(University of New Haven) Goodness of Fit Tests 23 / 38

Tests of Independence
Equivalence of Tests
Consider a 2 × 2 two-way table:

          bad driver   good driver
male          789          563
female        823          575

One can test whether being a bad/good driver has nothing to do with gender by
1 a z test for comparing two proportions, or
2 the Goodness of Fit Chi Squared Test for Independence.
Both ways are equivalent and will yield the same result.
(University of New Haven) Goodness of Fit Tests 24 / 38
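A sketch of that equivalence in R (my own object name; the continuity correction is switched off in both calls so the statistics agree exactly, since z² equals the χ² statistic):
> drivers <- matrix(c(789, 563, 823, 575), nrow = 2, byrow = TRUE)
> chisq.test(drivers, correct = FALSE)                        # chi-square test of independence
> prop.test(drivers[, 1], rowSums(drivers), correct = FALSE)  # two-proportion test; same X-squared and p-value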

Test of Homogeneity
(University of New Haven) Goodness of Fit Tests 25 / 38

Test of Homogeneity
Test of Homogeneity (No Matched Pairs)
Definition
A test of homogeneity tests if two different populations have the same proportion of some trait, i.e., the corresponding 2 × 2 contingency table has independent row and column variables.
Example
Computer chips are manufactured at two different fab plants. Let
n := # computer chips,  j := # defective,  m := # from fab plant A,  X := # defects from fab plant A.
Question: Does one of the fab plants have a greater chance of creating defects than the other? Consider

              Fab Plant A     Fab Plant B      Totals
Defective          X             j − X            j
Nondefective     m − X       n − m − j + X      n − j
Totals             m             n − m            n

Notice that with n, m and j fixed, the inner four entries are determined solely by X.
(University of New Haven) Goodness of Fit Tests 26 / 38

Test of Homogeneity
Fisher's Exact Test (No Matched Pairs)
Theorem (Fisher's Exact Test)
Assume j of n objects are of Type A and the rest are of Type B. Given m of the n objects, one has the hypotheses
H_0: the m objects were chosen independent of type from the n objects, versus H_1: not H_0.
Test Statistic: X = # of Type A objects in the set of m objects, where X ∼ HYP(n, j, m) under H_0. Reject H_0 when X takes on extreme values in either tail.
The model for X ∼ HYP(n, j, m), the hypergeometric distribution, is X = # of defective items in a sample of m items chosen from n items of which j are defective.
Note: Fisher's Exact Test avoids using the chi-squared test in the 2 by 2 case with small samples. One uses computer programs to calculate p-values.
(University of New Haven) Goodness of Fit Tests 27 / 38

Test of Homogeneity
Example of Fisher's Exact Test
Example
A C. difficile experiment involved 29 patients with inflamed colons. Sixteen were given fecal implants (to introduce beneficial bacteria to the colon) and 13 were treated with the antibiotic vancomycin. There were 3 sick and 13 cured fecal transplant patients, and 9 sick and 4 cured vancomycin patients.

        fecal   vancomycin
sick      3         9
cured    13         4

Find the p-value of H_0: fecal/vancomycin is independent of sick/cured.
Solution: Using R:
> fisher.test(rbind(c(3,9),c(13,4)))

Fisher's Exact Test for Count Data

data: rbind(c(3, 9), c(13, 4))
p-value = 0.00953
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0126885 0.7278730
sample estimates:
odds ratio
 0.1130106
(University of New Haven) Goodness of Fit Tests 28 / 38

Test of Homogeneity
Example of Fisher's Exact Test (cont.)
Example (cont.)
One can also use the hypergeometric distribution. As extreme as 3, or more extreme:
> phyper(3,16,13,12)
[1] 0.008401063
The reason this does not match the p-value R gave when using fisher.test is that fisher.test was a two-sided test and above only one extreme side was calculated. Since X ∼ HYP(29, 12, 16) is a discrete, non-symmetric distribution, it is not trivial to measure the probability of going just as extreme, but big instead of small. A typical way of doing this is to add together the probabilities of all combinations that have lower probabilities than that of the observed data.
(University of New Haven) Goodness of Fit Tests 29 / 38
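That convention (add up every outcome no more likely than the one observed) can be sketched directly with dhyper; this is my own illustration and should essentially reproduce the two-sided fisher.test p-value of 0.00953:
> x <- 0:12                                    # possible counts of fecal-transplant patients among the 12 sick
> probs <- dhyper(x, 16, 13, 12)               # hypergeometric probabilities under H_0
> sum(probs[probs <= dhyper(3, 16, 13, 12)])   # sum over all outcomes as unlikely as the observed count of 3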

Test of Homogeneity McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints
Suppose n voters are asked if they would vote for a candidate before a debate and then, again, after the debate. The 2 × 2 contingency table of the 2n unpaired votes is

          Yes      No            Total
Before     a      n − a            n
After      b      n − b            n
Total    a + b    2n − a − b      2n

To test for independence of vote totals:
H_0: vote totals were not affected by the debate, versus H_1: vote totals were affected by the debate,
using a χ² test with one degree of freedom. If the ratio of before yes votes to votes cast (a/n) is similar to the ratio of after yes votes to votes cast (b/n), the χ² test will conclude the data are consistent with independence of before and after vote tallies.
(University of New Haven) Goodness of Fit Tests 30 / 38

Test of Homogeneity McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints
A second way of thinking of the data is to consider the n paired votes of each of the n voters, (before yes/no, after yes/no). The before and after total vote tallies will remain as before (a and b will be considered fixed).

              After Yes      After No        Total
Before Yes        x           a − x            a
Before No       b − x      n + x − b − a     n − a
Total             b           n − b            n

Notice that given x, the above table is completely determined! Furthermore, the difference along the anti-diagonal will be b − a no matter what x is. Instead of testing H_0, one tests
H_0′: a = b.
In other words, the number of yes→no voters equals the number of no→yes voters if and only if the vote tallies for before and after are the same.
(University of New Haven) Goodness of Fit Tests 31 / 38

Test of Homogeneity McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints
Hypothesis H_0′ is that the contingency table above be symmetric, not that the before/after and yes/no voting tallies be independent. Equivalently, writing p_ij for the population proportions of the paired votes,

              After Yes       After No        Total
Before Yes      p_11            p_12        p_11 + p_12
Before No       p_21            p_22        p_21 + p_22
Total       p_11 + p_21     p_12 + p_22          1

and H_0′: p_12 = p_21.
Independence of the yes/no voting tally variable and the before/after variable is different from independence of the before and after votes of each voter. For instance, if every voter voted the same before and after the debate, then both H_0 and H_0′ would hold, yet a/n = b/n, so the χ² test for independence says the data are consistent with independence of the before/after voting tallies, but the before and after votes of a voter would be as dependent as they possibly can be (one could predict the after-debate vote of a voter knowing the voter's before-debate vote).
(University of New Haven) Goodness of Fit Tests 32 / 38

Test of Homogeneity McNemar Test (Matched Pairs)
McNemar Test (Matched Pairs)
Theorem (McNemar's Test (Quinn McNemar, psychologist (1947)))
Let (x_1, y_1), …, (x_n, y_n) be a paired random sample where X ∼ BIN(1, p_X) and Y ∼ BIN(1, p_Y). Define
b := Σ_{j=1}^n x_j = # of x_j's that equal 1  and  c := Σ_{j=1}^n y_j = # of y_j's that equal 1.
For an approximate test of
H_0: the frequencies b and c occur in the same proportion,
assume b + c ≥ 10 and use the test statistic
c² = (|b − c| − 1)² / (b + c),
which is ∼ χ²(1) under H_0. One uses a right-tail test.
It is entirely possible for Fisher's Exact Test for independence to return an insignificant result while McNemar's Test returns a significant result. McNemar's Test tests for symmetry about the diagonal in the contingency table, not independence.
(University of New Haven) Goodness of Fit Tests 33 / 38
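A hand version of this statistic, using the discordant counts 63 and 58 from the hands example on the next slide, as a sketch to compare with mcnemar.test (object names are my own):
> b <- 63; cc <- 58                          # the two discordant counts
> stat <- (abs(b - cc) - 1)^2/(b + cc)       # (|b - c| - 1)^2/(b + c) = 16/121 = 0.1322
> pchisq(stat, df = 1, lower.tail = FALSE)   # right-tail p-value, about 0.7161, matching mcnemar.test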

Test of Homogeneity McNemar Test (Matched Pairs)
Example
Suppose the softness or callousness of hands was tallied in the following table from randomly selected men.

                        Right Hand
                     Soft    Callused
Left Hand Soft        14        63
Left Hand Callused    58       273

If a person has one soft and one callused hand, is it equally likely that the callused hand is the right or the left hand? Use McNemar's Test to get a p-value.
Solution: Here n = 14 + 63 + 58 + 273 = 408. Using McNemar's Test,
c² = (|63 − 58| − 1)² / (63 + 58) = 16/121 = 0.1322.
Since this is sampled from χ²(1), one has a p-value of 0.7161 and the test is insignificant. One cannot reject the hypothesis that, if one has one callused hand and one soft hand, the callused hand is equally likely to be the left hand as the right hand.
Notice, one can reorganize the data, losing the information of which left hand goes with which right hand, and obtain

              Soft   Callused
Right Hands    72      336
Left Hands     77      331

Fisher's Exact Test produces a p-value of 0.7171. One cannot reject the hypothesis that handedness and callousness are independent.
(University of New Haven) Goodness of Fit Tests 34 / 38

Test of Homogeneity McNemar Test (Matched Pairs)
Example
Notice that a chi-square independence test instead of Fisher's Exact Test yields a p-value of 0.6505. The difference is because Fisher's Exact Test is exact, while the chi-squared independence test is approximate.
> mcnemar.test(matrix(c(14,63, 58,273),nrow=2))

McNemar's Chi-squared test with continuity correction

data: matrix(c(14, 63, 58, 273), nrow = 2)
McNemar's chi-squared = 0.1322, df = 1, p-value = 0.7161

> chisq.test(matrix(c(72,336,77,331),nrow=2),correct=FALSE) # no continuity correction

Pearson's Chi-squared test

data: matrix(c(72, 336, 77, 331), nrow = 2)
X-squared = 0.2053, df = 1, p-value = 0.6505

> fisher.test(matrix(c(72,336,77,331),nrow=2))

Fisher's Exact Test for Count Data

data: matrix(c(72, 336, 77, 331), nrow = 2)
p-value = 0.7171
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6350921 1.3351891
sample estimates:
odds ratio
 0.9212498
(University of New Haven) Goodness of Fit Tests 35 / 38

Chapter #9 R Assignment
(University of New Haven) Goodness of Fit Tests 36 / 38

Chapter #9 R Assignment
1 A car expert claims that 30% of all cars in Johnstown are American made, 35% are Japanese made, 20% are Korean made and 15% are European made. Of 156 cars randomly observed in Johnstown, 67 were American, 42 were Japanese, 24 were Korean and 23 were European. Find the p-value of a goodness of fit test between what was expected and what was observed.
2 Senie et al. (1981) investigated the relationship between age and frequency of breast self-examination in a sample of women (Senie, R. T., Rosen, P. P., Lesser, M. L., and Kinne, D. W. Breast self examinations and medical examination relating to breast cancer stage. American Journal of Public Health, 71, 583-590.) A summary of the results is presented in the following table:

                 Frequency of breast self-examination
Age            Monthly   Occasionally   Never
under 45          91          90          51
45-59            150         200         155
60 and over      109         198         172

From Hand et al., page 307, table 368. Do an independence test to see if age and frequency of breast self-examination are independent.
(University of New Haven) Goodness of Fit Tests 37 / 38

Chapter #9 R Assignment
3 Particular gene sites in the common housefly are deemed synonymous if they do not affect amino acids or replacement if they do. These sites are also deemed polymorphisms if they vary among subspecies or fixed if they do not. The following data were collected:

                 Synonymous   Replacement
polymorphisms        43            2
fixed                17            7

Find the p-value of H_0: synonymous/replacement is independent of polymorphisms/fixed.
(University of New Haven) Goodness of Fit Tests 38 / 38