Goodness of Fit Tests

1 Goodness of Fit Tests Marc H. Mehlman University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38

2 Table of Contents
1 Goodness of Fit Chi Squared Test
2 Tests of Independence
3 Test of Homogeneity
   McNemar Test (Matched Pairs)
4 Chapter #9 R Assignment

3 Goodness of Fit Chi Squared Test

4 Goodness of Fit Chi Squared Test
Idea of the chi-square test
The chi-square ($\chi^2$) test is used when the data are categorical. It measures how different the observed data are from what we would expect if $H_0$ were true.
Observed sample proportions (one SRS of 700 births) are compared with the expected proportions under $H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1/7$.
[Figure: two bar charts, "Sample composition" and "Expected composition", showing the percent of births on each day of the week, Mon. through Sun.]

5 Goodness of Fit Chi Squared Test
The chi-square distributions
The $\chi^2$ distributions are a family of distributions that take only nonnegative values, are skewed to the right, and are each specified by a number of degrees of freedom.
Published tables and software give the upper-tail area for critical values of many $\chi^2$ distributions.

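6 Goodness of Fit Chi Squared Test
Table D
Ex: df = 6. If $\chi^2 = 15.9$, the P-value lies between the upper-tail probabilities p of the two critical values in the df = 6 row that bracket 15.9.
[Table D excerpt: critical values indexed by df and upper-tail probability p.]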
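The table lookup can be cross-checked in software. The following is a minimal sketch using base R's `pchisq` and `qchisq`; the tail probabilities 0.025 and 0.01 are illustrative choices and are not necessarily the columns printed in Table D.

```r
# Upper-tail area to the right of 15.9 under the chi-square distribution with 6 df
p_value <- pchisq(15.9, df = 6, lower.tail = FALSE)
p_value

# Critical values whose upper-tail areas are 0.025 and 0.01 (illustrative choices)
crit <- qchisq(c(0.025, 0.01), df = 6, lower.tail = FALSE)
crit
```

Since 15.9 falls between the two critical values, the P-value falls between the corresponding tail probabilities.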

7 Goodness of Fit Chi Squared Test
Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts $n_1, n_2, \dots, n_k$ in k cells. Let $H_0$ specify the cell probabilities $p_1, p_2, \dots, p_k$ for the k possible outcomes.
Definition
$o_j \overset{\text{def}}{=}$ observed count in cell j
$e_j \overset{\text{def}}{=} n p_j$ = expected count in cell j
Example
Three species of large fish (A, B, C) that are native to a certain river have been observed to exist in equal proportions. A recent survey of 300 large fish found 89 of species A, 120 of species B and 91 of species C. What are the observed and expected counts?
Solution: $o_1 = 89$, $o_2 = 120$ and $o_3 = 91$.
$e_1 = e_2 = e_3 = n p_j = 300 \left(\tfrac{1}{3}\right) = 100$

8 Goodness of Fit Chi Squared Test
Chi Squared Goodness of Fit Test
Theorem (Chi Squared Goodness of Fit Test)
The chi square statistic, which measures how much the observed cell counts differ from the expected cell counts, is
$x \overset{\text{def}}{=} \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}.$
Let
$H_0$: the cell probabilities are $p_1, \dots, p_k$.
If $H_0$ is true and
all expected counts are $\geq 1$,
no more than 20% of the expected counts are $< 5$,
then the chi squared statistic is approximately $\chi^2(k-1)$. In that case, the p value of the test $H_0$ versus $H_A$: not $H_0$ is approximately $P(x \leq C)$ where $C \sim \chi^2(k-1)$.
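The theorem can be traced step by step in R. This is a sketch, not the author's code: it uses the fish counts from the river example (89, 120, 91 with equal cell probabilities), and the variable names (`p0`, `expd`, `components`) are chosen here for illustration.

```r
# Observed counts for species A, B, C (river example)
obs <- c(A = 89, B = 120, C = 91)
p0 <- rep(1/3, 3)              # cell probabilities under H0
expd <- sum(obs) * p0          # expected counts e_j = n * p_j

# Conditions from the theorem: all e_j >= 1, at most 20% of the e_j below 5
stopifnot(all(expd >= 1), mean(expd < 5) <= 0.20)

components <- (obs - expd)^2 / expd      # one term per cell
x <- sum(components)                     # chi-square statistic
p_value <- pchisq(x, df = length(obs) - 1, lower.tail = FALSE)

# Cross-check against the built-in test
built_in <- chisq.test(obs, p = p0)
c(statistic = x, p_value = p_value)
```

Computing the components separately makes it easy to see which cell contributes most to the statistic.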

9 Goodness of Fit Chi Squared Test
Example: River ecology
Three species of large fish (A, B, C) that are native to a certain river have been observed to co-exist in equal proportions. A recent random sample of 300 large fish found 89 of species A, 120 of species B, and 91 of species C. Do the data provide evidence that the river's ecosystem has been upset?
$H_0: p_A = p_B = p_C = 1/3$
$H_a$: $H_0$ is not true
Number of proportions compared: k = 3
All the expected counts are n / k = 300 / 3 = 100
Degrees of freedom: (k − 1) = 3 − 1 = 2
$X^2$ calculations:
$\chi^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$

10 Goodness of Fit Chi Squared Test
Example (cont.)
If $H_0$ were true, how likely would it be to find by chance a discrepancy between observed and expected frequencies yielding an $X^2$ value of 6.02 or greater?
From Table E, we find $5.99 < X^2 < 7.38$, so $0.05 > P > 0.025$. Software gives P-value ≈ 0.0493.
Using a typical significance level of 5%, we conclude that the results are significant. We have found evidence that the 3 fish populations are not currently equally represented in this ecosystem (P < 0.05).

11 Goodness of Fit Chi Squared Test
Example (cont.): Interpreting the $\chi^2$ output
The individual values summed in the $\chi^2$ statistic are the $\chi^2$ components. When the test is statistically significant, the largest components indicate which condition(s) are most different from what is expected under $H_0$. You can also compare the actual proportions qualitatively in a graph.
[Figure: bar chart of percent of total for species A, B, C (gumpies, sticklebarbs, spotheads).]
$\chi^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$
The largest $X^2$ component, 4.0, is for species B. The increase in species B contributes the most to significance.

12 Goodness of Fit Chi Squared Test
Example: Goodness of fit for a genetic model
Under a genetic model of dominant epistasis, a cross of white and yellow summer squash will yield white, yellow, and green squash with probabilities 12/16, 3/16 and 1/16 respectively (expected ratios 12:3:1).
Suppose we observe the following data: 155 white, 40 yellow and 10 green squash (n = 205).
Are they consistent with the genetic model?
$H_0: p_{white} = 12/16;\ p_{yellow} = 3/16;\ p_{green} = 1/16$
$H_a$: $H_0$ is not true
We use $H_0$ to compute the expected counts for each squash type.

13 Goodness of Fit Chi Squared Test
Example (cont.)
We then compute the chi-square statistic:
$\chi^2 = \frac{(155-153.75)^2}{153.75} + \frac{(40-38.4375)^2}{38.4375} + \frac{(10-12.8125)^2}{12.8125} = 0.0102 + 0.0635 + 0.6174 = 0.691$
Degrees of freedom = k − 1 = 2, and $X^2 = 0.691$.
Using Table D we find P > 0.25. Software gives P = 0.708.
This is not significant and we fail to reject $H_0$. The observed data are consistent with a dominant epistatic genetic model (12:3:1). The small observed deviations from the model could have arisen from the random sampling process alone.

14 Goodness of Fit Chi Squared Test
Example (cont.)
> obs=c(155,40,10)
> tprob=c(12/16, 3/16, 1/16)
> chisq.test(obs,p=tprob)

        Chi-squared test for given probabilities

data:  obs
X-squared = 0.69106, df = 2, p-value = 0.7078

> exp=chisq.test(obs,p=tprob)$expected
> exp
[1] 153.7500  38.4375  12.8125
> (obs-exp)^2/exp
[1] 0.01016260 0.06351626 0.61737805

15 Tests of Independence

16 Tests of Independence
r × c Contingency Tables
Given two different finite partitions of the population, namely $\{A_i\}_{i=1}^{r}$ and $\{B_j\}_{j=1}^{c}$, one wants to test if the two partitions are independent:
$H_0: P(A_i \cap B_j) = P(A_i)P(B_j)$ for every $1 \leq i \leq r$ and $1 \leq j \leq c$ versus $H_A$: not $H_0$.
One takes a random sample, $x_1, \dots, x_n$, from the population. Let
$o_{ij} \overset{\text{def}}{=}$ the number of $x_k$'s that fall in $A_i \cap B_j$
and
$C_j \overset{\text{def}}{=} \sum_{i=1}^{r} o_{ij}$ and $R_i \overset{\text{def}}{=} \sum_{j=1}^{c} o_{ij}$.
The data for the test of independence is given in an r × c contingency table:

                B_1     B_2     ...   B_c     Row Totals
A_1             o_11    o_12    ...   o_1c    R_1
A_2             o_21    o_22    ...   o_2c    R_2
...
A_r             o_r1    o_r2    ...   o_rc    R_r
Column Totals   C_1     C_2     ...   C_c     Grand Total = n

The name "contingency table" was given by Karl Pearson.

17 Tests of Independence
Example: Two-way tables
An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor.
Two-way tables organize data about two categorical variables with any number of levels/treatments obtained from a two-way, or block, design.
High school students were asked whether they smoke, and whether their parents smoke:
First factor: Parent smoking status
Second factor: Student smoking status

18 Tests of Independence
Example (cont.)

                        student smokes   student doesn't smoke   Total
both parents smoke      400              1,380                   1,780
one parent smokes       416              1,823                   2,239
neither parent smokes   188              1,168                   1,356
Total                   1,004            4,371                   5,375

Assuming the observed data correspond to the population, i.e. using empirical probabilities in place of actual probabilities:
P(student & one parent smokes) = P(being in row #2 & column #1) = (2,1 entry) / (grand total) = 416 / 5,375 ≈ 0.0774
P(student smokes) = P(being in column #1) = (column #1 total) / (grand total) = 1,004 / 5,375 ≈ 0.1868
P(one parent smokes) = P(being in row #2) = (row #2 total) / (grand total) = 2,239 / 5,375 ≈ 0.4166

19 Tests of Independence
Expected Counts for r × c Contingency Tables
Observe: Assuming $H_0$: row variable and column variable are independent,
$e_{ij} = (\text{grand total}) \cdot P(\text{being in } ij\text{th cell})$
$= (\text{grand total}) \cdot P(\text{being in row } \#i) \cdot P(\text{being in column } \#j)$
$= (\text{grand total}) \left(\frac{\text{row } \#i \text{ total}}{\text{grand total}}\right) \left(\frac{\text{column } \#j \text{ total}}{\text{grand total}}\right)$
$= \frac{(\text{row } \#i \text{ total})(\text{column } \#j \text{ total})}{\text{grand total}}.$
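The last line of the derivation maps directly onto an outer product of the margins. A minimal sketch in R using the parental smoking counts from the example; the row and column labels and the name `expd` are chosen here for illustration.

```r
# Observed two-way table (rows: parents, columns: student)
obs <- rbind(both    = c(400, 1380),
             one     = c(416, 1823),
             neither = c(188, 1168))
colnames(obs) <- c("smokes", "does_not_smoke")

n <- sum(obs)   # grand total

# e_ij = (row i total) * (column j total) / grand total
expd <- outer(rowSums(obs), colSums(obs)) / n
round(expd, 2)
```

The expected table has the same row and column totals as the observed table, which is a quick sanity check on the computation.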

20 Tests of Independence
Example (cont.)

                        student smokes   student doesn't smoke   Total
both parents smoke      400              1,380                   1,780
one parent smokes       416              1,823                   2,239
neither parent smokes   188              1,168                   1,356
Total                   1,004            4,371                   5,375

The expected counts of the six cells are:
$e_{11} = \frac{1{,}780 \times 1{,}004}{5{,}375} = 332.49 \qquad e_{12} = \frac{1{,}780 \times 4{,}371}{5{,}375} = 1{,}447.51$
$e_{21} = \frac{2{,}239 \times 1{,}004}{5{,}375} = 418.22 \qquad e_{22} = \frac{2{,}239 \times 4{,}371}{5{,}375} = 1{,}820.78$
$e_{31} = \frac{1{,}356 \times 1{,}004}{5{,}375} = 253.29 \qquad e_{32} = \frac{1{,}356 \times 4{,}371}{5{,}375} = 1{,}102.71$

21 Tests of Independence
Chi Squared Test for Two Way Tables
Theorem (Chi Squared Test for Two Way Tables)
The chi square statistic from a two way r × c table,
$x \overset{\text{def}}{=} \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$
measures how much the observed cell counts differ from the expected cell counts when
$H_0$: row variable and column variable are independent
holds. If $H_0$ is true and
all expected counts are $\geq 1$,
no more than 20% of the expected counts are $< 5$,
then the chi squared statistic is approximately $\chi^2((r-1)(c-1))$. In that case, the p value of the test $H_0$ versus $H_A$: not $H_0$ is approximately $P(x \leq C)$ where $C \sim \chi^2((r-1)(c-1))$.

22 Tests of Independence
Example (cont.): Influence of parental smoking
Here is a computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students' smoking habits). What does it tell you?
Sample size?
Hypotheses?
Are the data ok for a $\chi^2$ test?
Interpretation?

23 Tests of Independence
Example (cont.)
> row1=c(400,1380)
> row2=c(416,1823)
> row3=c(188,1168)
> obs = rbind(row1,row2,row3)
> chisq.test(obs)

        Pearson's Chi-squared test

data:  obs
X-squared = 37.566, df = 2, p-value = 6.959e-09

> exp=chisq.test(obs)$expected
> exp
         [,1]     [,2]
row1 332.4874 1447.513
row2 418.2244 1820.776
row3 253.2882 1102.712
> (obs-exp)^2/exp
     [,1] [,2]
row1    …    …
row2    …    …
row3    …    …

24 Tests of Independence
Equivalence of Tests
Consider a 2 × 2 two way table with rows male and female and columns bad driver and good driver.
One can test whether being a bad/good driver has nothing to do with gender by
1 a z test for comparing two proportions,
2 a chi squared test for independence.
Both ways are equivalent and will yield the same result: for the two sided z test with a pooled standard error and the chi squared test without continuity correction, $z^2$ equals the chi squared statistic and the p values agree.
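The equivalence can be checked numerically. The sketch below uses hypothetical counts, since the slide's table values are not available, and assumes the two sided z test with a pooled standard error and the chi squared test without continuity correction.

```r
# Hypothetical counts, for illustration only
tab <- rbind(male   = c(bad = 30, good = 70),
             female = c(bad = 20, good = 80))

# Two-proportion z statistic with pooled standard error
n <- rowSums(tab)
p_hat <- tab[, "bad"] / n
p_pool <- sum(tab[, "bad"]) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_z <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p value

# Pearson chi-square test of independence, no continuity correction
chi <- chisq.test(tab, correct = FALSE)

c(z_squared = unname(z)^2, chi_squared = unname(chi$statistic))
```

With the default `correct = TRUE`, `chisq.test` applies a continuity correction to 2 × 2 tables and the two statistics no longer coincide.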

25 Test of Homogeneity

26 Test of Homogeneity
Test of Homogeneity (No Matched Pairs)
Definition
A test of homogeneity tests if two different populations have the same proportion of some trait, i.e., the corresponding 2 × 2 contingency table has independent row and column variables.
Example
Computer chips are manufactured at two different fab plants. Let
$n \overset{\text{def}}{=}$ # computer chips
$j \overset{\text{def}}{=}$ # defective
$m \overset{\text{def}}{=}$ # from fab plant A
$X \overset{\text{def}}{=}$ # defects from fab plant A
Question: Does one of the fab plants have a greater chance of creating defects than the other? Consider

               Fab Plant A   Fab Plant B      Totals
Defective      X             j − X            j
Nondefective   m − X         n − m − j + X    n − j
Totals         m             n − m            n

Notice that with n, m and j fixed, the inner four entries are determined solely by X.
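The remark that X alone determines the table can be made concrete. In this sketch the helper `table_from_x` and the margin values are hypothetical, chosen only to illustrate the bookkeeping.

```r
# Build the 2 x 2 table from the fixed margins (n, m, j) and the free entry x
table_from_x <- function(n, m, j, x) {
  tab <- rbind(Defective    = c(x,     j - x),
               Nondefective = c(m - x, n - m - j + x))
  colnames(tab) <- c("Fab Plant A", "Fab Plant B")
  stopifnot(all(tab >= 0))   # x must be compatible with the margins
  tab
}

# Hypothetical margins: 100 chips, 40 from plant A, 10 defective, 6 defects from A
tab <- table_from_x(n = 100, m = 40, j = 10, x = 6)
tab
rowSums(tab)   # j and n - j
colSums(tab)   # m and n - m
```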

27 Test of Homogeneity
Fisher's Exact Test (No Matched Pairs)
Theorem (Fisher's Exact Test)
Assume j of n objects are of Type A and the rest are of Type B. Given m of the n objects, one has the hypotheses
$H_0$: the m objects were chosen independent of type from the n objects, versus
$H_1$: not $H_0$.
Test statistic: X = # of Type A objects in the set of m objects.
$X \sim \mathrm{HYP}(n, j, m)$ under $H_0$. Reject $H_0$ when X takes on extreme values in either tail.
The model for $X \sim \mathrm{HYP}(n, j, m)$, the hypergeometric distribution, is
X = # of defective items in a sample of m items chosen from n items of which j are defective.
Note: this avoids using the chi squared test for the 2 by 2 case with small samples. One uses computer programs to calculate p values.

28 Test of Homogeneity
Example of Fisher's Exact Test
Example
A C. difficile experiment involved 29 patients with inflamed colons. Sixteen were given fecal implants (to introduce beneficial bacteria to the colon) and 13 were treated with the antibiotic vancomycin. There were 3 sick and 13 cured fecal transplant patients, and 9 sick and 4 cured vancomycin patients.

         fecal   vancomycin
sick     3       9
cured    13      4

Find the p value of $H_0$: fecal/vancomycin is independent of sick/cured.
Solution: Using R:
> fisher.test(rbind(c(3,9),c(13,4)))

        Fisher's Exact Test for Count Data

data:  rbind(c(3, 9), c(13, 4))
p-value = …
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 … …
sample estimates:
odds ratio
         …

29 Test of Homogeneity
Example of Fisher's Exact Test (cont.)
Example (cont.)
One can also use the hypergeometric distribution to find the probability of an outcome as extreme as 3 or more extreme.
> phyper(3,16,13,12)
[1] …
The reason this does not match the p value R gave when using fisher.test is that fisher.test performed a two sided test, while above only one extreme side was calculated. Since $X \sim \mathrm{HYP}(29, 12, 16)$ is a discrete, non symmetric distribution, it is not obvious how to measure the probability of being just as extreme in the other direction (big instead of small). A typical way of doing this is to add together the probabilities of all outcomes whose probabilities are no larger than that of the observed data.
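The two sided rule just described can be sketched with `dhyper`. This follows the parameterization of the `phyper(3,16,13,12)` call above (16 fecal patients, 13 vancomycin patients, 12 sick patients); the small relative tolerance guards against floating point ties and is an implementation detail assumed here.

```r
tab <- rbind(sick  = c(fecal = 3,  vancomycin = 9),
             cured = c(fecal = 13, vancomycin = 4))

# X = number of fecal-transplant patients among the 12 sick patients
support <- 0:12
probs <- dhyper(support, m = 16, n = 13, k = 12)

# One-sided tail: outcomes as small as the observed 3
p_lower <- sum(probs[support <= 3])          # equals phyper(3, 16, 13, 12)

# Two-sided: add the probabilities of all outcomes no more probable than the observed one
p_obs <- dhyper(3, m = 16, n = 13, k = 12)
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

c(one_sided = p_lower, two_sided = p_two_sided)
```

The two sided value can be compared with `fisher.test(tab)$p.value`.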

30 Test of Homogeneity McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints
Suppose n voters are asked if they would vote for a candidate before a debate and then, again, after the debate. The 2 × 2 contingency table of the 2n unpaired votes is

          Yes      No            Totals
Before    a        n − a         n
After     b        n − b         n
Totals    a + b    2n − a − b    2n

To test for independence of vote totals:
$H_0$: vote totals were not affected by the debate
versus
$H_1$: vote totals were affected by the debate
using a $\chi^2$ test with one degree of freedom. If the ratio of before yes votes to votes cast ($a/n$) is similar to the ratio of after yes votes to votes cast ($b/n$), the $\chi^2$ test will conclude the data is consistent with independence of before and after vote tallies.

31 Test of Homogeneity McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints
A second way of thinking of the data is to consider the n paired votes of each of the n voters, (before yes/no, after yes/no). The before and after total vote tallies will remain as before (a and b will be considered fixed).

                        After
                 Yes      No               Totals
Before   Yes     x        a − x            a
         No      b − x    n + x − a − b    n − a
Totals           b        n − b            n

Notice that given x, the above table is completely determined! Furthermore, the difference along the anti diagonal will be $(b - x) - (a - x) = b - a$ no matter what x is.
Instead of testing $H_0$, one tests $H_0'$: a = b. In other words, the number of yes/no voters equals the number of no/yes voters if and only if the vote tallies for before and after are the same.

32 Test of Homogeneity McNemar Test (Matched Pairs)
Contingency Tables: Two Viewpoints
Hypothesis $H_0'$ is that the contingency table above be symmetric, not that before/after and yes/no voting tallies be independent. Equivalently,
$H_0': p_{12} = p_{21}$
and

                        After
                 Yes                No                 Totals
Before   Yes     p_11               p_12               p_11 + p_12
         No      p_12               p_22               p_12 + p_22
Totals           p_11 + p_12        p_12 + p_22        1

Independence of the yes/no voting tally variable and the before/after variable is different than independence of the before and after votes of each voter. For instance, if every voter voted the same before and after the debate, then both $H_0$ and $H_0'$ would hold. Since $a/n = b/n$, the $\chi^2$ test for independence says the data is consistent with independence of before/after voting tallies, yet the before and after votes of a voter would be as dependent as they possibly can be (one could predict the after debate vote of a voter knowing the voter's before debate vote).

33 Test of Homogeneity McNemar Test (Matched Pairs)
McNemar Test (Matched Pairs)
Theorem (McNemar's Test (Quinn McNemar, psychologist (1947)))
Let $(x_1, y_1), \dots, (x_n, y_n)$ be a paired random sample where $X \sim \mathrm{BIN}(1, p_X)$ and $Y \sim \mathrm{BIN}(1, p_Y)$. Define the discordant pair counts
$b \overset{\text{def}}{=} \sum_{j=1}^{n} x_j (1 - y_j)$ = # of pairs with $x_j = 1$ and $y_j = 0$
and
$c \overset{\text{def}}{=} \sum_{j=1}^{n} (1 - x_j) y_j$ = # of pairs with $x_j = 0$ and $y_j = 1$.
For an approximate test of
$H_0$: frequencies of b and c occur in same proportion
assume $b + c \geq 10$ and use the test statistic
$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$
which is approximately $\chi^2(1)$ under $H_0$. One uses a right tail test.
It is entirely possible for Fisher's Exact Test for independence to return an insignificant result while McNemar's Test returns a significant result. McNemar's Test tests for symmetry about the diagonal in the contingency table, not independence.

34 Test of Homogeneity McNemar Test (Matched Pairs)
Example
Suppose the softness or callousness of hands was tallied in the following table from randomly selected men.

                          Right Hand
                      Soft    Callused
Left Hand   Soft      14      58
            Callused  63      273

If a person is to have one soft and one callused hand, is it equally likely that the callused hand be the right or left hand? Use McNemar's Test to get a p value.
Solution: Here n = 14 + 58 + 63 + 273 = 408. Using McNemar's Test,
$\chi^2 = \frac{(|58 - 63| - 1)^2}{58 + 63} = \frac{16}{121} \approx 0.132.$
Since this is sampled from $\chi^2(1)$, one has a p value of about 0.72 and the test is insignificant. One can not reject the hypothesis that, for a person with one callused hand and one soft hand, the callused hand is equally likely to be the left hand or the right hand.
Notice, one can reorganize the data, losing the information of which left hand goes with which right hand, and obtain

               Soft    Callused
Right Hands    77      331
Left Hands     72      336

Fisher's Exact Test produces a p value of …. One can not reject the hypothesis that hand (left or right) and callousness are independent.

35 Test of Homogeneity McNemar Test (Matched Pairs)
Example
Notice that a chi square independence test instead of Fisher's Exact Test yields a p value of about 0.65. The difference arises because Fisher's Exact Test is exact, while the chi squared independence test is approximate.
> mcnemar.test(matrix(c(14,63, 58,273),nrow=2))

        McNemar's Chi-squared test with continuity correction

data:  matrix(c(14, 63, 58, 273), nrow = 2)
McNemar's chi-squared = …, df = 1, p-value = …

> chisq.test(matrix(c(72,336,77,331),nrow=2),correct=FALSE) # no continuity correction

        Pearson's Chi-squared test

data:  matrix(c(72, 336, 77, 331), nrow = 2)
X-squared = …, df = 1, p-value = …

> fisher.test(matrix(c(72,336,77,331),nrow=2))

        Fisher's Exact Test for Count Data

data:  matrix(c(72, 336, 77, 331), nrow = 2)
p-value = …
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 … …
sample estimates:
odds ratio
         …
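The McNemar statistic in this example can be reproduced by hand from the off diagonal counts. A minimal sketch, assuming the same matrix layout as the `mcnemar.test` call above; the names `n12`, `n21` and `stat` are chosen here for illustration.

```r
# Paired table, same layout as in the mcnemar.test call
paired <- matrix(c(14, 63, 58, 273), nrow = 2)

n12 <- paired[1, 2]   # discordant pairs of one kind (58)
n21 <- paired[2, 1]   # discordant pairs of the other kind (63)

# Continuity-corrected McNemar statistic and its right-tail p value
stat <- (abs(n12 - n21) - 1)^2 / (n12 + n21)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

# Cross-check against the built-in test
built_in <- mcnemar.test(paired)
c(statistic = stat, p_value = p_value)
```

Only the discordant counts enter the statistic; the concordant counts 14 and 273 do not affect it.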

36 Chapter #9 R Assignment

37 Chapter #9 R Assignment
1 A car expert claims that 30% of all cars in Johnstown are American made, 35% are Japanese made, 20% are Korean made and 15% are European. Of 156 cars randomly observed in Johnstown, 67 were American, 42 were Japanese, 24 were Korean and 23 were European. Find the p value of a goodness of fit test between what was expected and what was observed.
2 Senie et al. (1981) investigated the relationship between age and frequency of breast self-examination in a sample of women (Senie, R. T., Rosen, P. P., Lesser, M. L., and Kinne, D. W. Breast self examinations and medical examination relating to breast cancer stage. American Journal of Public Health, 71, ). A summary of the results is presented in the following table:

                 Frequency of breast self examination
Age              Monthly   Occasionally   Never
under …          …         …              …
… – …            …         …              …
… and over       …         …              …

From Hand et al., page 307, table 368. Do an independence test to see if age and frequency of breast self examination are independent.

38 Chapter #9 R Assignment
3 Particular gene sites in the common housefly were deemed synonymous if they did not affect amino acids and replacement if they did. These sites were also deemed polymorphisms if they varied among subspecies and fixed if they did not. The following data was collected:

                Synonymous   Replacement
polymorphisms   43           2
fixed           17           7

Find the p value of $H_0$: synonymous/replacement is independent of polymorphisms/fixed.


Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr. Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing this chapter, you should be able

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

POLI 443 Applied Political Research

POLI 443 Applied Political Research POLI 443 Applied Political Research Session 6: Tests of Hypotheses Contingency Analysis Lecturer: Prof. A. Essuman-Johnson, Dept. of Political Science Contact Information: aessuman-johnson@ug.edu.gh College

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E Salt Lake Community College MATH 1040 Final Exam Fall Semester 011 Form E Name Instructor Time Limit: 10 minutes Any hand-held calculator may be used. Computers, cell phones, or other communication devices

More information

The Chi-Square Distributions

The Chi-Square Distributions MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness

More information

Chi-Square Analyses Stat 251

Chi-Square Analyses Stat 251 Chi-Square Analyses Stat 251 While we have analyses for comparing more than 2 means, we cannot use them when trying to compare more than one proportion. However, there is a distribution that is related

More information

Hypothesis Testing: Chi-Square Test 1

Hypothesis Testing: Chi-Square Test 1 Hypothesis Testing: Chi-Square Test 1 November 9, 2017 1 HMS, 2017, v1.0 Chapter References Diez: Chapter 6.3 Navidi, Chapter 6.10 Chapter References 2 Chi-square Distributions Let X 1, X 2,... X n be

More information

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous

More information

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels.

Contingency Tables. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. Contingency Tables Definition & Examples. Contingency tables are used when we want to looking at two (or more) factors. Each factor might have two more or levels. (Using more than two factors gets complicated,

More information

Inferences About Two Proportions

Inferences About Two Proportions Inferences About Two Proportions Quantitative Methods II Plan for Today Sampling two populations Confidence intervals for differences of two proportions Testing the difference of proportions Examples 1

More information

10.4 Hypothesis Testing: Two Independent Samples Proportion

10.4 Hypothesis Testing: Two Independent Samples Proportion 10.4 Hypothesis Testing: Two Independent Samples Proportion Example 3: Smoking cigarettes has been known to cause cancer and other ailments. One politician believes that a higher tax should be imposed

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Binomial and Poisson Probability Distributions

Binomial and Poisson Probability Distributions Binomial and Poisson Probability Distributions Esra Akdeniz March 3, 2016 Bernoulli Random Variable Any random variable whose only possible values are 0 or 1 is called a Bernoulli random variable. What

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance Chapter 8 Student Lecture Notes 8-1 Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing

More information

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015 AMS7: WEEK 7. CLASS 1 More on Hypothesis Testing Monday May 11th, 2015 Testing a Claim about a Standard Deviation or a Variance We want to test claims about or 2 Example: Newborn babies from mothers taking

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Formulas and Tables by Mario F. Triola

Formulas and Tables by Mario F. Triola Copyright 010 Pearson Education, Inc. Ch. 3: Descriptive Statistics x f # x x f Mean 1x - x s - 1 n 1 x - 1 x s 1n - 1 s B variance s Ch. 4: Probability Mean (frequency table) Standard deviation P1A or

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Categorical Variables and Contingency Tables: Description and Inference

Categorical Variables and Contingency Tables: Description and Inference Categorical Variables and Contingency Tables: Description and Inference STAT 526 Professor Olga Vitek March 3, 2011 Reading: Agresti Ch. 1, 2 and 3 Faraway Ch. 4 3 Univariate Binomial and Multinomial Measurements

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Analysis of Categorical Data Three-Way Contingency Table

Analysis of Categorical Data Three-Way Contingency Table Yu Lecture 4 p. 1/17 Analysis of Categorical Data Three-Way Contingency Table Yu Lecture 4 p. 2/17 Outline Three way contingency tables Simpson s paradox Marginal vs. conditional independence Homogeneous

More information

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers Nominal Data Greg C Elvers 1 Parametric Statistics The inferential statistics that we have discussed, such as t and ANOVA, are parametric statistics A parametric statistic is a statistic that makes certain

More information

Example. χ 2 = Continued on the next page. All cells

Example. χ 2 = Continued on the next page. All cells Section 11.1 Chi Square Statistic k Categories 1 st 2 nd 3 rd k th Total Observed Frequencies O 1 O 2 O 3 O k n Expected Frequencies E 1 E 2 E 3 E k n O 1 + O 2 + O 3 + + O k = n E 1 + E 2 + E 3 + + E

More information

Discrete Distributions

Discrete Distributions Discrete Distributions STA 281 Fall 2011 1 Introduction Previously we defined a random variable to be an experiment with numerical outcomes. Often different random variables are related in that they have

More information

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t t Confidence Interval for Population Mean Comparing z and t Confidence Intervals When neither z nor t Applies

More information

Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence

More information

Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2

Chapters 4-6: Inference with two samples Read sections 4.2.5, 5.2, 5.3, 6.2 Chapters 4-6: Inference with two samples Read sections 45, 5, 53, 6 COMPARING TWO POPULATION MEANS When presented with two samples that you wish to compare, there are two possibilities: I independent samples

More information

15: CHI SQUARED TESTS

15: CHI SQUARED TESTS 15: CHI SQUARED ESS MULIPLE CHOICE QUESIONS In the following multiple choice questions, please circle the correct answer. 1. Which statistical technique is appropriate when we describe a single population

More information

The t-statistic. Student s t Test

The t-statistic. Student s t Test The t-statistic 1 Student s t Test When the population standard deviation is not known, you cannot use a z score hypothesis test Use Student s t test instead Student s t, or t test is, conceptually, very

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information

Quantitative Analysis and Empirical Methods

Quantitative Analysis and Empirical Methods Hypothesis testing Sciences Po, Paris, CEE / LIEPP Introduction Hypotheses Procedure of hypothesis testing Two-tailed and one-tailed tests Statistical tests with categorical variables A hypothesis A testable

More information

Review. Number of variables. Standard Scores. Anecdotal / Clinical. Bivariate relationships. Ch. 3: Correlation & Linear Regression

Review. Number of variables. Standard Scores. Anecdotal / Clinical. Bivariate relationships. Ch. 3: Correlation & Linear Regression Ch. 3: Correlation & Relationships between variables Scatterplots Exercise Correlation Race / DNA Review Why numbers? Distribution & Graphs : Histogram Central Tendency Mean (SD) The Central Limit Theorem

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence

ST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence ST3241 Categorical Data Analysis I Two-way Contingency Tables Odds Ratio and Tests of Independence 1 Inference For Odds Ratio (p. 24) For small to moderate sample size, the distribution of sample odds

More information

Introduction to Analysis of Genomic Data Using R Lecture 6: Review Statistics (Part II)

Introduction to Analysis of Genomic Data Using R Lecture 6: Review Statistics (Part II) 1/45 Introduction to Analysis of Genomic Data Using R Lecture 6: Review Statistics (Part II) Dr. Yen-Yi Ho (hoyen@stat.sc.edu) Feb 9, 2018 2/45 Objectives of Lecture 6 Association between Variables Goodness

More information

Outline for Today. Review of In-class Exercise Bivariate Hypothesis Test 2: Difference of Means Bivariate Hypothesis Testing 3: Correla

Outline for Today. Review of In-class Exercise Bivariate Hypothesis Test 2: Difference of Means Bivariate Hypothesis Testing 3: Correla Outline for Today 1 Review of In-class Exercise 2 Bivariate hypothesis testing 2: difference of means 3 Bivariate hypothesis testing 3: correlation 2 / 51 Task for ext Week Any questions? 3 / 51 In-class

More information

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010 Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010 Instructor Name Time Limit: 120 minutes Any calculator is okay. Necessary tables and formulas are attached to the back of the exam.

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

10: Crosstabs & Independent Proportions

10: Crosstabs & Independent Proportions 10: Crosstabs & Independent Proportions p. 10.1 P Background < Two independent groups < Binary outcome < Compare binomial proportions P Illustrative example ( oswege.sav ) < Food poisoning following church

More information

STP 226 ELEMENTARY STATISTICS NOTES

STP 226 ELEMENTARY STATISTICS NOTES STP 226 ELEMENTARY STATISTICS NOTES PART 1V INFERENTIAL STATISTICS CHAPTER 12 CHI SQUARE PROCEDURES 12.1 The Chi Square Distribution A variable has a chi square distribution if the shape of its distribution

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories

More information

Inference for Proportions

Inference for Proportions Inference for Proportions Marc H. Mehlman marcmehlman@yahoo.com University of New Haven Based on Rare Event Rule: rare events happen but not to me. Marc Mehlman (University of New Haven) Inference for

More information

Chapter Six: Two Independent Samples Methods 1/51

Chapter Six: Two Independent Samples Methods 1/51 Chapter Six: Two Independent Samples Methods 1/51 6.3 Methods Related To Differences Between Proportions 2/51 Test For A Difference Between Proportions:Introduction Suppose a sampling distribution were

More information

Goodness of Fit Tests: Homogeneity

Goodness of Fit Tests: Homogeneity Goodness of Fit Tests: Homogeneity Mathematics 47: Lecture 35 Dan Sloughter Furman University May 11, 2006 Dan Sloughter (Furman University) Goodness of Fit Tests: Homogeneity May 11, 2006 1 / 13 Testing

More information

4. Suppose that we roll two die and let X be equal to the maximum of the two rolls. Find P (X {1, 3, 5}) and draw the PMF for X.

4. Suppose that we roll two die and let X be equal to the maximum of the two rolls. Find P (X {1, 3, 5}) and draw the PMF for X. Math 10B with Professor Stankova Worksheet, Midterm #2; Wednesday, 3/21/2018 GSI name: Roy Zhao 1 Problems 1.1 Bayes Theorem 1. Suppose a test is 99% accurate and 1% of people have a disease. What is the

More information

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module

More information

Multiple Sample Categorical Data

Multiple Sample Categorical Data Multiple Sample Categorical Data paired and unpaired data, goodness-of-fit testing, testing for independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

UNIT 5 ~ Probability: What Are the Chances? 1

UNIT 5 ~ Probability: What Are the Chances? 1 UNIT 5 ~ Probability: What Are the Chances? 1 6.1: Simulation Simulation: The of chance behavior, based on a that accurately reflects the phenomenon under consideration. (ex 1) Suppose we are interested

More information

Chapter Eight: Assessment of Relationships 1/42

Chapter Eight: Assessment of Relationships 1/42 Chapter Eight: Assessment of Relationships 1/42 8.1 Introduction 2/42 Background This chapter deals, primarily, with two topics. The Pearson product-moment correlation coefficient. The chi-square test

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Experiment -- the process by which an observation is made. Sample Space -- ( S) the collection of ALL possible outcomes of an experiment

Experiment -- the process by which an observation is made. Sample Space -- ( S) the collection of ALL possible outcomes of an experiment A. 1 Elementary Probability Set Theory Experiment -- the process by which an observation is made Ex. Outcome The result of a chance experiment. Ex. Sample Space -- ( S) the collection of ALL possible outcomes

More information

10.2: The Chi Square Test for Goodness of Fit

10.2: The Chi Square Test for Goodness of Fit 10.2: The Chi Square Test for Goodness of Fit We can perform a hypothesis test to determine whether the distribution of a single categorical variable is following a proposed distribution. We call this

More information

Statistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts

Statistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts Statistical methods for comparing multiple groups Lecture 7: ANOVA Sandy Eckel seckel@jhsph.edu 30 April 2008 Continuous data: comparing multiple means Analysis of variance Binary data: comparing multiple

More information

MATH Notebook 3 Spring 2018

MATH Notebook 3 Spring 2018 MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................

More information

Analysis of categorical data S4. Michael Hauptmann Netherlands Cancer Institute Amsterdam, The Netherlands

Analysis of categorical data S4. Michael Hauptmann Netherlands Cancer Institute Amsterdam, The Netherlands Analysis of categorical data S4 Michael Hauptmann Netherlands Cancer Institute Amsterdam, The Netherlands m.hauptmann@nki.nl 1 Categorical data One-way contingency table = frequency table Frequency (%)

More information

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti Lecture 2: Categorical Variable A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti 1 Categorical Variable Categorical variable is qualitative

More information

Chapter 11. Hypothesis Testing (II)

Chapter 11. Hypothesis Testing (II) Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information