Chapter 10. Multinomial Experiments and Contingency Tables


Contents
10-1 Overview
10-2 Multinomial Experiments: Goodness-of-Fit
10-3 Contingency Tables: Independence and Homogeneity

10-1 Overview

This chapter focuses on the analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells). The methods use a $\chi^2$ (chi-square) test statistic and critical values from the chi-square distribution (Table A-4).

A one-way frequency table has a single row or column. A two-way frequency table, or contingency table, has two or more rows and columns.

10-2 Multinomial Experiment

Definition: Multinomial Experiment
An experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.

Definition: Goodness-of-Fit Test
A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution.

Goodness-of-Fit Test Notation
O represents the observed frequency of an outcome
E represents the expected frequency of an outcome
k represents the number of different categories or outcomes
n represents the total number of trials

Expected Frequencies
If all expected frequencies are equal:
$E = \frac{n}{k}$
that is, the sum of all observed frequencies divided by the number of categories.

If the expected frequencies are not all equal:
$E = np$
that is, each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category.
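The two expected-frequency rules above can be sketched in a few lines of Python. This is only an illustration: the function names `expected_equal` and `expected_from_probs` are hypothetical, and the inputs are the totals and claimed probabilities used in this section's two examples.

```python
def expected_equal(n, k):
    """Expected frequency per category when all k categories are equally likely: E = n / k."""
    return [n / k] * k


def expected_from_probs(n, probs):
    """Expected frequency per category from claimed probabilities: E = n * p."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("claimed probabilities must sum to 1")
    return [n * p for p in probs]


# 147 trials spread evenly over 5 categories
equal = expected_equal(147, 5)

# 100 trials with unequal claimed probabilities
unequal = expected_from_probs(100, [0.30, 0.20, 0.20, 0.10, 0.10, 0.10])

print(equal)
print(unequal)
```

Note that an expected frequency such as 29.4 does not need to be a whole number, even though observed frequencies always are.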

Key Question
Are the differences between the observed values (O) and the theoretically expected values (E) statistically significant? To answer this, we need to measure the discrepancy between O and E; the test statistic will involve their difference, O - E.

Multinomial Experiments: Goodness-of-Fit
Assumptions when testing the hypothesis that the population proportion for each of the categories is as claimed:
1. The data have been randomly selected.
2. The sample data consist of frequency counts for each of the different categories.
3. The expected frequency for each category is at least 5. (There is no requirement that the observed frequency for each category must be at least 5.)

Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$

Critical Values
1. Found in Table A-4 using k - 1 degrees of freedom, where k = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.

Multinomial Experiment: Goodness-of-Fit Test
$H_0$: No difference between observed and expected probabilities
$H_1$: At least one of the probabilities is different from its expected value

A close agreement between observed and expected values will lead to a small value of $\chi^2$ and a large P-value. A large disagreement between observed and expected values will lead to a large value of $\chi^2$ and a small P-value. A significantly large value of $\chi^2$ will cause a rejection of the null hypothesis of no difference between the observed and the expected.
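As a rough sketch of how the test statistic and the right-tailed decision rule fit together, the following Python computes $\chi^2$ from observed and expected frequencies and compares it with a critical value supplied by the reader. The function names and the three-category counts are hypothetical, chosen only for illustration; the critical value 5.991 is the standard chi-square table entry for 2 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if min(expected) < 5:
        # Requirement from the assumptions: every expected frequency is at least 5
        raise ValueError("each expected frequency must be at least 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_null(statistic, critical_value):
    """Right-tailed test: reject H0 only when the statistic exceeds the critical value."""
    return statistic > critical_value


# Hypothetical counts for k = 3 categories (not taken from the text)
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

statistic = chi_square_statistic(observed, expected)
decision = reject_null(statistic, 5.991)  # df = k - 1 = 2, alpha = 0.05

print(statistic, decision)
```

Because the observed counts here sit close to the expected ones, the statistic is small and the null hypothesis is not rejected, which matches the "close agreement, small $\chi^2$" reasoning above.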

Figure 10-3: Relationships Among Components in Goodness-of-Fit Hypothesis Test

Categories with Equal Frequencies (Probabilities)
$H_0$: $p_1 = p_2 = p_3 = \dots = p_k$
$H_1$: At least one of the probabilities is different from the others

Example: A study was made of 147 industrial accidents that required medical attention. Test the claim that the accidents occur with equal proportions on the 5 workdays.

Equivalently, the claim is that the accidents occur with a uniform distribution over the 5 workdays.

Claim: Accidents occur with the same proportion (frequency); that is, $p_1 = p_2 = p_3 = p_4 = p_5$
$H_0$: $p_1 = p_2 = p_3 = p_4 = p_5$
$H_1$: At least 1 of the 5 proportions is different from the others

Expected frequency for each workday: $E = n/k = 147/5 = 29.4$

Observed and Expected Frequencies of Industrial Accidents

Day                  Mon     Tues    Wed     Thurs   Fri
Observed accidents   31      42      18      25      31
Expected accidents   29.4    29.4    29.4    29.4    29.4

Here $E = n/k = 147/5 = 29.4$ for every day.

Multinomial Experiment: Goodness-of-Fit Test
Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$

For the first category:
$\frac{(O - E)^2}{E} = \frac{(31 - 29.4)^2}{29.4} = 0.0871$

Observed and Expected Frequencies of Industrial Accidents

Day                  Mon     Tues    Wed     Thurs   Fri
Observed accidents   31      42      18      25      31
Expected accidents   29.4    29.4    29.4    29.4    29.4
(O - E)^2 / E        0.0871  5.4000  4.4204  0.6585  0.0871   (rounded)

Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.0871 + 5.4000 + 4.4204 + 0.6585 + 0.0871 = 10.6531$

Critical Value
$\chi^2 = 9.488$, from Table A-4 with k - 1 = 5 - 1 = 4 degrees of freedom and α = 0.05.
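The arithmetic for the industrial accidents example can be checked with a short Python sketch. The observed counts used here (31, 42, 18, 25, 31) are an assumption: they are the whole-number counts consistent with the per-day contributions and the total of 147 given above. The variable names are illustrative only.

```python
# Observed accidents per workday (assumed: back-derived from the listed
# contributions and the total of 147 accidents)
observed = [31, 42, 18, 25, 31]

n = sum(observed)          # total number of trials
k = len(observed)          # number of categories
expected = n / k           # equal expected frequency, E = n / k

# Contribution of each category to the test statistic
contributions = [(o - expected) ** 2 / expected for o in observed]
statistic = sum(contributions)

critical_value = 9.488     # Table A-4 value quoted in the text: df = 4, alpha = 0.05
reject = statistic > critical_value  # right-tailed test

for o, c in zip(observed, contributions):
    print(f"O = {o:2d}  E = {expected:.1f}  (O - E)^2 / E = {c:.4f}")
print(f"chi-square = {statistic:.4f}, reject H0: {reject}")
```

Listing the individual contributions makes it easy to see which categories drive the statistic; here the second and third days account for most of the total.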

Decision (α = 0.05): on the $\chi^2$ axis, which starts at 0, the critical value $\chi^2 = 9.488$ separates the region where we fail to reject $p_1 = p_2 = p_3 = p_4 = p_5$ (to its left) from the critical region where we reject it (to its right). The sample data give $\chi^2 = 10.653$.

Claim: Accidents occur with the same proportion (frequency); that is, $p_1 = p_2 = p_3 = p_4 = p_5$
$H_0$: $p_1 = p_2 = p_3 = p_4 = p_5$
$H_1$: At least 1 of the 5 proportions is different from the others

The test statistic falls within the critical region: REJECT the null hypothesis.

We reject the claim that the accidents occur with equal proportions (frequency) on the 5 workdays. (Although it appears Wednesday has a lower accident rate, arriving at such a conclusion would require other methods of analysis.)

Categories with Unequal Frequencies (Probabilities)
$H_0$: $p_1, p_2, p_3, \dots, p_k$ are as claimed
$H_1$: At least one of the above proportions is different from the claimed value

Example: Mars, Inc. claims its M&M candies are distributed with the color percentages of 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue. At the 0.05 significance level, test the claim that the color distribution is as claimed by Mars, Inc.

Claim: $p_1 = 0.30$, $p_2 = 0.20$, $p_3 = 0.20$, $p_4 = 0.10$, $p_5 = 0.10$, $p_6 = 0.10$
$H_0$: $p_1 = 0.30$, $p_2 = 0.20$, $p_3 = 0.20$, $p_4 = 0.10$, $p_5 = 0.10$, $p_6 = 0.10$
$H_1$: At least one of the proportions is different from the claimed value.

Frequencies of M&Ms (n = 100)

Color                Brown   Yellow  Red     Orange  Green   Blue
Observed frequency   33      26      21      8       7       5

Expected frequencies, E = np:
Brown    E = (100)(0.30) = 30
Yellow   E = (100)(0.20) = 20
Red      E = (100)(0.20) = 20
Orange   E = (100)(0.10) = 10
Green    E = (100)(0.10) = 10
Blue     E = (100)(0.10) = 10

Frequencies of M&Ms

Color                Brown   Yellow  Red     Orange  Green   Blue
Observed frequency   33      26      21      8       7       5
Expected frequency   30      20      20      10      10      10
(O - E)^2 / E        0.3     1.8     0.05    0.4     0.9     2.5

Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.95$

Critical Value
$\chi^2 = 11.071$ (with k - 1 = 6 - 1 = 5 degrees of freedom and α = 0.05)

Decision (α = 0.05): the critical value $\chi^2 = 11.071$ separates the fail-to-reject region (to its left, starting at 0) from the critical region (to its right). The sample data give $\chi^2 = 5.95$.

The test statistic does not fall within the critical region; fail to reject $H_0$: percentages are as claimed.

There is not sufficient evidence to warrant rejection of the claim that the colors are distributed with the given percentages.
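The M&M calculation, together with an approximate P-value, can be sketched using only the Python standard library. The helper `chi_square_sf` is a hypothetical name; it evaluates the right-tail chi-square probability through the standard recurrence for the regularized upper incomplete gamma function, $Q(a + 1, y) = Q(a, y) + y^a e^{-y} / \Gamma(a + 1)$, starting from $Q(1, y) = e^{-y}$ or $Q(1/2, y) = \mathrm{erfc}(\sqrt{y})$. The text itself uses Table A-4 rather than a computed P-value, so the P-value step is an addition for illustration.

```python
import math


def chi_square_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x must be non-negative")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Step a up to df / 2 using Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


colors = ["Brown", "Yellow", "Red", "Orange", "Green", "Blue"]
observed = [33, 26, 21, 8, 7, 5]
claimed = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]

n = sum(observed)
expected = [n * p for p in claimed]                      # E = n * p
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                                   # k - 1
p_value = chi_square_sf(statistic, df)

print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Comparing the P-value with α = 0.05 leads to the same decision as comparing the test statistic with the critical value 11.071.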

Figure: Comparison of Claimed and Observed Proportions (vertical axis: proportions from 0 to 0.30; claimed and observed proportions plotted for Brown, Yellow, Red, Orange, Green, and Blue)