Ch. 11 Inference for Distributions of Categorical Data

Ch. 11-2 Inferences for Relationships

The two-sample z procedures from Ch. 10 allowed us to compare the proportions of successes in two populations or for two treatments. What if we want to compare the distribution of a single categorical variable across several populations or treatments? For this new test, we use two-way tables to present the data.

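Example: Facebook habits at ECRCHS and Granada (2 populations, 1 categorical variable, 3 categories).

| Facebook use | ECRCHS | Granada | Total |
|---|---|---|---|
| Not much | 16 | 4 | 20 |
| 1+ per week | 14 | 16 | 30 |
| 1+ per day | 20 | 30 | 50 |
| Total | 50 | 50 | 100 |

Are we looking at row totals or column totals? Row totals.
Not much: 20/100 = 20%
1+ per week: 30/100 = 30%
1+ per day: 50/100 = 50%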

Are we looking at 1 row or 1 column? 1 column (the Granada column).
Not much: 4/50 = 8%
1+ per week: 16/50 = 32%
1+ per day: 30/50 = 60%

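30% of ECRCHS is expected to use Facebook 1+ per week: $\frac{30}{100} \times 50 = 15$.

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

$$df = (\#\text{ of rows} - 1)(\#\text{ of columns} - 1) = (r - 1)(c - 1)$$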
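The expected-count rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the observed counts are the Facebook counts as reconstructed from the slides (rows are Not much, 1+ per week, 1+ per day; columns are ECRCHS, Granada).

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """df = (number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Observed counts reconstructed from the slides (an assumption of this sketch).
# Rows: Not much, 1+ per week, 1+ per day. Columns: ECRCHS, Granada.
facebook_observed = [[16, 4], [14, 16], [20, 30]]

facebook_expected = expected_counts(facebook_observed)
facebook_df = degrees_of_freedom(facebook_observed)

print(facebook_expected)  # one list of expected counts per row
print(facebook_df)
```

Because both schools contribute 50 students, each row's expected counts are simply half of that row's total.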

In Ch. 11-1, we used a $\chi^2$ GOF test to test the claimed distribution of a categorical variable. Can we use it here? No. We are not comparing a sample distribution to a claimed distribution. We are comparing a sample distribution to another sample distribution.

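Expected count for row 1, column 1: $\frac{20 \times 50}{100} = 10$. Expected counts are shown in parentheses:

| Facebook use | ECRCHS | Granada | Total |
|---|---|---|---|
| Not much | (10) | (10) | 20 |
| 1+ per week | (15) | (15) | 30 |
| 1+ per day | (25) | (25) | 50 |
| Total | 50 | 50 | 100 |

State:
$H_0$: There is no difference in the distribution of Facebook habits between ECRCHS and Granada.
$H_a$: There is some difference in the distribution of Facebook habits between ECRCHS and Granada.
$\alpha = 0.05$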

Plan: $\chi^2$ test of homogeneity. When comparing a sample distribution to another sample distribution, we use the $\chi^2$ test of homogeneity.
Random: A random sample was taken from each high school.
Large Sample Size: All expected counts are at least 5 (10, 10, 15, 15, 25, 25).
Independent: Two things to check:
1) Both samples or groups need to be independent of each other. We clearly have two independent samples, one from each school.
2) Individual observations in each sample or group have to be independent. When sampling without replacement for both samples, we must check the 10% condition for both: there must be at least 10 × 50 = 500 students at both ECRCHS and Granada.

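Do: $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(16 - 10)^2}{10} + \frac{(4 - 10)^2}{10} + \frac{(14 - 15)^2}{15} + \frac{(16 - 15)^2}{15} + \frac{(20 - 25)^2}{25} + \frac{(30 - 25)^2}{25}$$

$$\chi^2 = 3.6 + 3.6 + 0.07 + 0.07 + 1 + 1 = 9.34$$

p-value: $\chi^2$cdf(lower bound, upper bound, df) = $\chi^2$cdf(9.34, 99999, 2) = 0.0094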

Conclude: Assuming $H_0$ is true (there is no difference in the distribution of Facebook habits between ECRCHS and Granada), there is a 0.0094 probability of getting a $\chi^2$ value of 9.34 or more purely by chance. This provides strong evidence against $H_0$ and is statistically significant at the $\alpha = 0.05$ level (0.0094 < 0.05). Therefore, we reject $H_0$ and can conclude that there is some difference in Facebook habits between ECRCHS and Granada. The largest component of $\chi^2$ is 3.6, which appears twice: the number of students who don't go on Facebook much was higher than expected at ECRCHS and lower than expected at Granada.
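The Do and Conclude steps for the Facebook example can be checked with a short Python sketch. It assumes the observed counts 16, 4, 14, 16, 20, 30 reconstructed from the slides, and it relies on the fact that for df = 2 (and only for df = 2) the area to the right of x under the chi-square curve is exp(-x/2). Without rounding each contribution first, the statistic comes out near 9.33 rather than 9.34; the p-value still rounds to 0.0094.

```python
import math

# Observed and expected counts, cell by cell (row order, ECRCHS then Granada).
# The observed counts are reconstructed from the slides (assumption).
observed = [16, 4, 14, 16, 20, 30]
expected = [10, 10, 15, 15, 25, 25]

# Each cell contributes (O - E)^2 / E to the statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# Upper-tail area for df = 2 only: P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

alpha = 0.05
reject_h0 = p_value < alpha

print([round(c, 2) for c in contributions])
print(round(chi_square, 2), round(p_value, 4), reject_h0)
```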

Just by looking at the data, what do you think the p-value will be?

Expected counts (it is not appropriate to round expected counts to whole numbers):

| | Passed | Did not pass | Total |
|---|---|---|---|
| Row 1 | 29.8 | 20.2 | 50 |
| Row 2 | 59.6 | 40.4 | 100 |
| Row 3 | 59.6 | 40.4 | 100 |
| Total | 149 | 101 | 250 |

State:
$H_0$: There is no difference in the success rates for the three test preparation strategies.
$H_a$: There is a difference in the success rates for the three test preparation strategies.
$\alpha = 0.05$

Plan: $\chi^2$ test of homogeneity
Random: A random sample of 149 students who had passed the exam and a separate random sample of 101 students who did not pass the exam.
Large Sample Size: All expected counts are at least 5 (29.8, 59.6, 59.6, 20.2, 40.4, 40.4).
Independent: Independent samples were taken. There must be at least 10 × 149 = 1490 students who have passed the AP Stats exam and at least 10 × 101 = 1010 students who did not.

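Do: $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(40 - 29.8)^2}{29.8} + \frac{(10 - 20.2)^2}{20.2} + \frac{(99 - 59.6)^2}{59.6} + \frac{(1 - 40.4)^2}{40.4} + \frac{(10 - 59.6)^2}{59.6} + \frac{(90 - 40.4)^2}{40.4}$$

$$\chi^2 = 3.49 + 5.15 + 26.05 + 38.43 + 41.28 + 60.9 = 175.286$$

You can use $\chi^2$ GOF-Test on the calculator to get the contribution values quickly, but don't say you used $\chi^2$ GOF-Test for a test of homogeneity.

p-value: $\chi^2$cdf(lower bound, upper bound, df) = $\chi^2$cdf(175.286, 9999, 2) ≈ 0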

Conclude: Assuming $H_0$ is true (there is no difference in the success rates for the three test preparation strategies), there is approximately a 0 probability of getting a $\chi^2$ value of 175.286 or more purely by chance. This provides very strong evidence against $H_0$ and is statistically significant at the $\alpha = 0.05$ level (0 < 0.05). Therefore, we reject $H_0$ and can conclude that there is a difference in success rates for the three types of test preparation. The largest component of $\chi^2$ is 60.9 because the number of students who didn't pass the exam with no review was much higher than expected.
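The follow-up step in the conclusion (finding the largest component and its direction) can be sketched as below. The observed counts are assumptions reconstructed from the worked sums and the totals (40, 99, 10 passed; 10, 1, 90 did not pass). The labels for rows 1 and 2 are placeholders because the slides' strategy names are not available here; row 3 is the no-review group.

```python
# Observed counts reconstructed from the worked sums and totals (assumption).
# "row 1" and "row 2" are placeholder labels; row 3 is the no-review group.
observed = {
    "row 1": {"passed": 40, "did not pass": 10},
    "row 2": {"passed": 99, "did not pass": 1},
    "row 3 (no review)": {"passed": 10, "did not pass": 90},
}

row_totals = {row: sum(counts.values()) for row, counts in observed.items()}
col_totals = {}
for counts in observed.values():
    for col, count in counts.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

# One record per cell: (contribution, row, column, direction of the deviation).
cells = []
for row, counts in observed.items():
    for col, o in counts.items():
        e = row_totals[row] * col_totals[col] / grand_total
        direction = "higher than expected" if o > e else "lower than expected"
        cells.append(((o - e) ** 2 / e, row, col, direction))

chi_square = sum(cell[0] for cell in cells)
largest = max(cells)  # tuples compare on the contribution first

print(round(chi_square, 3))
print(largest[1], "/", largest[2], "is", largest[3])
```

Sorting `cells` in descending order instead of taking only the maximum would list every cell by how much it contributes, which is useful when several cells are far from their expected counts.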

What if we have a single random sample from a single population that's classified according to two categorical variables, and our goal is to see if the two categorical variables have a relationship/association? New test!
Why can't we use $\chi^2$ GOF? There's more than one categorical variable.
Why can't we use $\chi^2$ homogeneity? There's one population and more than one categorical variable.

Expected counts (two categorical variables: math class and sport played):

| | Column 1 | Column 2 | Column 3 | Column 4 | Total |
|---|---|---|---|---|---|
| Row 1 | 27.8 | 25.3 | 29.1 | 23.7 | 106 |
| Row 2 | 38.4 | 34.9 | 40.1 | 32.7 | 146 |
| Row 3 | 21.8 | 19.8 | 22.8 | 18.6 | 83 |
| Total | 88 | 80 | 92 | 75 | 335 |

State:
$H_0$: There is no association between math class and sport played for high school students.
$H_a$: There is some association between math class and sport played for high school students.
OR
$H_0$: Math class and sport played are independent in the population of high school students.
$H_a$: Math class and sport played are not independent in the population of high school students.
$\alpha = 0.05$

Plan: $\chi^2$ test of association/independence
Random: A random sample of 335 high school students.
Large Sample Size: All expected counts are at least 5. The lowest expected count is 18.6 (see table).
Independent: One thing to check: individual observations in the sample have to be independent. When sampling without replacement, we must check the 10% condition: there must be at least 10 × 335 = 3350 high school students in the USA who play a sport and take a math class.
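The Large Sample Size and 10% checks in this Plan step need only the table's marginal totals, as the sketch below illustrates. The variable names are chosen for this example only; the totals are the ones given in the expected-count table.

```python
# Marginal totals from the math class vs. sport table.
row_totals = [106, 146, 83]
col_totals = [88, 80, 92, 75]
n = 335

# Both sets of totals must add up to the sample size.
assert sum(row_totals) == n and sum(col_totals) == n

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Large Sample Size: every expected count should be at least 5.
smallest_expected = min(min(row) for row in expected)
large_sample_ok = smallest_expected >= 5

# 10% condition: the population must be at least 10 times the sample size.
minimum_population = 10 * n

for row in expected:
    print([round(e, 1) for e in row])
print(round(smallest_expected, 1), large_sample_ok, minimum_population)
```

The smallest expected count always sits where the smallest row total meets the smallest column total, here 83 × 75 / 335.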

Do: $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6$.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(35 - 27.8)^2}{27.8} + \frac{(42 - 38.4)^2}{38.4} + \frac{(11 - 21.8)^2}{21.8} + \cdots$$

$$\chi^2 = 1.86 + 0.34 + 5.35 + 2.99 + 2.81 + 0.07 + 10.05 + 2.44 + 2.27 + 0.07 + 0.42 + 0.31 = 28.96$$

You can use $\chi^2$ GOF-Test on the calculator to get the contribution values quickly, but don't say you used $\chi^2$ GOF-Test for a test of association/independence.

p-value: $\chi^2$cdf(lower bound, upper bound, df) = $\chi^2$cdf(28.96, 9999, 6) ≈ 0

Conclude: Assuming $H_0$ is true (there is no association between math class and sport played for high school students), there is approximately a 0 probability of getting a $\chi^2$ value of 28.96 or more purely by chance. This provides strong evidence against $H_0$ and is statistically significant at the $\alpha = 0.05$ level (0 < 0.05). Therefore, we reject $H_0$ and can conclude that there is some association between math class and sport played. The largest component of $\chi^2$ is 10.05 because the number of Geometry students who play football is much less than expected.
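A calculator displays the p-value here as 0, but it is really a very small positive number. When df is even, the area to the right of x under the chi-square curve has a closed form, exp(-x/2) multiplied by the sum of (x/2)^k / k! for k from 0 to df/2 - 1. The sketch below uses that identity with only the standard library; the function name is made up for this example.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, valid for even df only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p_sports = chi_square_upper_tail_even_df(28.96, 6)    # math class vs. sport
p_facebook = chi_square_upper_tail_even_df(9.34, 2)   # Facebook habits

print(p_sports)
print(round(p_facebook, 4))
```

For odd df this shortcut does not apply, and a calculator's $\chi^2$cdf or a statistics library is the practical choice.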

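| Test | # of categorical variables | # of populations | Example | Purpose |
|---|---|---|---|---|
| $\chi^2$ GOF | 1 | 1 | Skittles problem | Tests the null hypothesis that a categorical variable has a claimed distribution. |
| $\chi^2$ homogeneity | 1 | 2 or more | Facebook habits at ECRCHS vs Granada | Comparing the distribution of one categorical variable in two or more populations. |
| $\chi^2$ association/independence | 2 | 1 | Math class vs sport played | Investigating the relationship between two categorical variables in one population. |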

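Is there an association between resemblance and dog breed?

Expected counts:

| | Column 1 | Column 2 | Total |
|---|---|---|---|
| Row 1 | 12.78 | 12.22 | 25 |
| Row 2 | 10.22 | 9.78 | 20 |
| Total | 23 | 22 | 45 |

$\chi^2$ test of association/independence: $\chi^2 = 3.73$, p-value = 0.053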

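Does the data give convincing evidence of a difference in resemblance and an owner's choice in dog breed?

Two-proportion z test (two-sided): $\hat{p}_1 = \frac{16}{25} = 0.64$, $\hat{p}_2 = \frac{7}{20} = 0.35$, $z = 1.934$, p-value = 0.053, the same p-value as the $\chi^2$ test.

$z^2 = \chi^2$: $1.934^2 \approx 3.74$, which agrees with $\chi^2 = 3.73$ up to the rounding of the expected counts. This only works for two-sided two-proportion z tests.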
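The link between the two tests can be verified numerically. The sketch below assumes a 2×2 table rebuilt from the two sample proportions (16 of 25 and 7 of 20), a pooled standard error for the z test, and no continuity correction for the chi-square statistic. Both p-values come from `math.erfc`, using the facts that the two-sided normal tail is erfc(|z|/√2) and the chi-square upper tail with df = 1 is erfc(√(x/2)).

```python
import math

# Successes and sample sizes for the two groups (from the sample proportions).
x1, n1 = 16, 25
x2, n2 = 7, 20

# Two-proportion z test with a pooled proportion.
p1_hat, p2_hat = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / standard_error
p_value_z = math.erfc(abs(z) / math.sqrt(2))  # two-sided

# Chi-square test on the matching 2x2 table (no continuity correction).
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(2)
    for j in range(2)
)
p_value_chi = math.erfc(math.sqrt(chi_square / 2))  # df = 1 upper tail

print(round(z, 3), round(z ** 2, 2), round(chi_square, 2))
print(round(p_value_z, 3), round(p_value_chi, 3))
```

With unrounded expected counts both statistics come out near 3.74, and the two p-values coincide. A one-sided z test would halve the p-value, which is why the equivalence holds only for the two-sided alternative.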